## Archive for the ‘algorithm’ Category

### Minimum Spanning Tree

December 7, 2010

Well,

The other day I came across this problem, and Ramprasad gave a brilliant solution to it.

Problem : Given a connected weighted graph G, find a minimum spanning tree. But wait, before you jump to any conclusion: the weights of the edges are not real numbers, but quadratic functions of time. So given a time interval $[t_1,t_2]$, find the minimum spanning tree T. That is, T attains the minimum at some time $t_m$: $\forall t \in [t_1,t_2]$ and for every spanning tree $TR$ of $G$, $W(T_{t_m}) \leq W(TR_t)$. Now, find this minimum spanning tree and the time instant $t_m$ at which it attains that minimum value.

Solution : Well, the first solution that popped up in my mind was a brute-force approach. Consider all spanning trees. As the weights of the edges are quadratic functions, the weights of the spanning trees are also quadratic functions. Differentiate these functions to find their minima, and then take the absolute minimum over all of them. But the number of spanning trees of a graph can be as high as $n^{n-2}$ (Cayley’s formula; Kirchhoff’s matrix-tree theorem counts them for an arbitrary graph). So obviously, this does not fit into our perception of an “efficient algorithm”.

Ramprasad gave a nice solution which is definitely polynomial time; I am not sure whether there are better solutions or not. Note that Prim’s algorithm and Kruskal’s algorithm only take into account the relative order of the edge weights and do not really care what the actual weights are. So as long as the order of the weights remains unchanged, you can apply either of these algorithms. The time interval $[t_1,t_2]$ gives rise to infinitely many instances of the weighted graph, but as the weight functions of the edges are quadratic ( parabolas ), the parabolas corresponding to two edges intersect in at most 2 points, so there are at most $2{|E| \choose 2} = O(|E|^2)$ intersection points. These intersection points are the instants in time where the weight of one edge crosses the weight of another edge.

So, sort these intersection points by their time coordinate. Note that between two consecutive intersection points the order of the weights remains unchanged, and hence one of the standard algorithms can be applied. So we need to run Prim’s or Kruskal’s algorithm ($O(|E| \log |V|)$) for each such interval, giving rise to an $O(|E|^3 \log |V|)$ algorithm overall. Clearly, a major improvement over the brute-force method.
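To make this concrete, here is a minimal Python sketch of the scheme, under my own assumptions about the input (none of this representation is from the original discussion): each edge is a triple (u, v, (a, b, c)) with weight $w(t) = at^2 + bt + c$, and $t_1 < t_2$. Within each interval the tree’s total weight is a single quadratic, so it suffices to check the interval’s endpoints and the parabola’s vertex.

```python
import math
from itertools import combinations

def poly_at(p, t):
    a, b, c = p
    return a * t * t + b * t + c

def quad_roots(a, b, c):
    # real roots of a*x^2 + b*x + c = 0
    if abs(a) < 1e-12:
        return [] if abs(b) < 1e-12 else [-c / b]
    d = b * b - 4 * a * c
    if d < 0:
        return []
    s = math.sqrt(d)
    return [(-b - s) / (2 * a), (-b + s) / (2 * a)]

def kruskal(n, edges, t):
    # standard Kruskal at the fixed time t; returns indices of the MST edges
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for i in sorted(range(len(edges)), key=lambda i: poly_at(edges[i][2], t)):
        u, v, _ = edges[i]
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append(i)
    return tree

def min_mst(n, edges, t1, t2):
    # breakpoints: all intersections of pairs of edge parabolas inside [t1, t2]
    times = {t1, t2}
    for (_, _, p), (_, _, q) in combinations(edges, 2):
        for r in quad_roots(p[0] - q[0], p[1] - q[1], p[2] - q[2]):
            if t1 < r < t2:
                times.add(r)
    times = sorted(times)
    best = (math.inf, None, None)                 # (weight, MST edge indices, time)
    for lo, hi in zip(times, times[1:]):
        tree = kruskal(n, edges, (lo + hi) / 2)   # edge order is fixed on (lo, hi)
        A, B, C = (sum(edges[i][2][k] for i in tree) for k in range(3))
        candidates = [lo, hi]
        if A > 0 and lo < -B / (2 * A) < hi:
            candidates.append(-B / (2 * A))       # vertex of the tree's parabola
        for t in candidates:
            w = A * t * t + B * t + C
            if w < best[0]:
                best = (w, tree, t)
    return best
```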

### Shortest Path

December 3, 2010

Hi,

I am sure most of you are aware of Dijkstra’s shortest path algorithm. Here is an interesting variant of it.

Problem : You are given a weighted connected graph with all edges having positive weights. Some of the cities ( nodes ) have a petrol pump whereas some don’t. You have a car with a tank capacity of C. That is, with a full tank, the car can travel C units of distance. At any city with a petrol pump the car can fill its tank to full. Under these constraints: find the shortest distance from a given source vertex to a given destination vertex; find the shortest path between the given source and destination; and find the shortest paths and distances from the source to all vertices.

Solution : Actually, this time I am being a bit lazy, but I have this simple idea which certainly is a solution, probably with scope for improvement. Given the graph, take a vertex v without a petrol pump. Remove this vertex and all edges incident on it. Now, for all its neighbours u1,…,uk, add/update edges (ui,uj); these edges can now store the shortest distance between ui and uj that goes through v. Once the graph is converted in this manner, the standard algorithm can be applied. The conversion of the graph can be done in $O(n^3)$ or $O(n^4)$ depending upon whether we need the paths or just the distances. I know I have omitted many details, but all in all, this trick can give rise to an $O(n^4)$ algorithm ( probably $O(n^3)$, not sure 😉 ) which does the job. Let me know if you can do better 🙂
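Here is one way the idea might be fleshed out; a rough sketch under assumptions of my own, not spelled out in the post: the source and destination are treated like pump cities (the tank starts full), and after the conversion an edge is drivable only if its length is at most C. This is the distances-only $O(n^3)$ variant.

```python
import heapq

def shortest_with_fuel(n, adj, has_pump, C, src, dst):
    # adj: list of dicts, adj[u][v] = weight of the undirected edge (u, v)
    INF = float('inf')
    # Eliminate every non-pump vertex (except src and dst) by shortcutting
    # its neighbours, as described above: O(n) eliminations, O(n^2) each.
    for v in range(n):
        if has_pump[v] or v in (src, dst):
            continue
        nbrs = list(adj[v].items())
        for u, _ in nbrs:
            del adj[u][v]
        for i, (u, wu) in enumerate(nbrs):
            for w, ww in nbrs[i + 1:]:
                d = wu + ww                      # shortest detour through v
                if d < adj[u].get(w, INF):
                    adj[u][w] = adj[w][u] = d
        adj[v] = {}
    # Standard Dijkstra on the reduced graph, using only edges <= C:
    # every remaining intermediate vertex has a pump, so each leg
    # starts with a full tank.
    dist = [INF] * n
    dist[src] = 0
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u].items():
            if w <= C and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist[dst]
```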

### Tournament graph and Hamiltonian Path

November 11, 2010

Well,

Those of you who are unaware of tournament graphs and Hamiltonian paths can look here and here. Ramprasad gave me the following interesting problem to think about.

Problem : Given a tournament ( I will omit the suffix “graph” as it is obvious in the context of graph theory ), construct a Hamiltonian path in polynomial time.

Solution : When I first heard the problem, I was wondering how to even prove that every tournament indeed has a Hamiltonian path. And on top of that, we need to find it in polynomial time… Phew! But of course, if you come up with an algorithm, that itself gives you a proof!!

He gave me a hint : given that you have already explored some path $v_1,\dots,v_k$, how can you extend it with a given vertex $v$? That is, of course, if the path has not covered every vertex :-).

If there is a directed edge from $v_k$ to $v$, there is a natural extension: you just append $v$ to the path. If not, find two consecutive vertices $v_i, v_{i+1}$ in the path such that the tournament has edges $(v_i,v)$ and $(v,v_{i+1})$, in which case the modified path is $v_1,\dots , v_i, v ,v_{i+1},\dots, v_k$.

Suppose no such pair exists!!!! Let’s try to see why this can’t be true. Look at $v_1,v_2$. Either the tournament contains $(v,v_1)$, in which case $v$ can simply be added as a prefix to the path, or it contains $(v_1,v)$. In the latter case, the tournament must also contain $(v_2,v)$, because otherwise it would contain $(v,v_2)$ and we could form $\dots, v_1,v,v_2,\dots$. Similarly, we can argue for $v_3,v_4,\dots$, giving us a contradiction, as we had assumed that the edge $(v_k,v)$ is absent from the tournament.

Hence, we can always extend the path as long as there is some vertex we have not yet covered.
Ramprasad gave a simple proof of why a pair $v_i,v_{i+1}$ allowing the insertion must exist. Suppose we cannot add $v$ as a prefix or a suffix to the path. Then look at the vertex with the least index $j$ such that $(v,v_j)$ is an edge; it exists because we cannot append, so $(v,v_k)$ must be an edge. By the definition of a tournament, one of the two edges $(v_{j-1},v)$, $(v,v_{j-1})$ must be present. But because $j$ is the least index receiving an edge from $v$, the second edge cannot be there. Hence we can extend the path to contain $v_{j-1},v,v_j$.

It is easy to see that the algorithm runs in $O(n^2)$ time, as we need to scan $O(n)$ vertices to extend the path and we repeat until we have covered all vertices.
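The whole argument fits in a few lines of Python; a sketch, assuming the tournament is given as a boolean adjacency matrix beats, with beats[u][v] true iff the edge (u, v) is present:

```python
def hamiltonian_path(beats):
    """beats[u][v] is True iff the tournament has the edge (u, v)."""
    n = len(beats)
    path = [0]
    for v in range(1, n):
        if beats[path[-1]][v]:
            path.append(v)            # edge (v_k, v): natural extension
        elif beats[v][path[0]]:
            path.insert(0, v)         # edge (v, v_1): prepend
        else:
            # least j with edge (v, v_j); it exists since (v, v_k) holds here,
            # and minimality of j gives (v_{j-1}, v), so v fits in between
            j = next(i for i in range(1, len(path)) if beats[v][path[i]])
            path.insert(j, v)
    return path

# a 3-cycle tournament 0 -> 1 -> 2 -> 0:
print(hamiltonian_path([[False, True, False],
                        [False, False, True],
                        [True, False, False]]))   # [0, 1, 2]
```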

–Saurabh Joshi

### Puzzle : Power game

October 13, 2010

Well,

I like the dramatic titles :-). Anyways, here goes the problem.

Problem : Given an integer n, determine whether $n=a^b$ for some integers a and b, in poly-logarithmic time ( that is, polynomial in the number of bits of n ).

Solution : It is obvious that for certain n, there could be multiple a’s and b’s such that $n=a^b$. We only need to find one such pair (a, b), or declare that no such integers exist. Needless to say, we are ruling out b=1 :-).

Given any n, the number of bits required is $\log n$. Now, let us fix $b = 2$. For this b, a must have $\lceil \frac{\log n}{2} \rceil$ bits. So we know the range of possible a’s that might satisfy $a^2=n$, and we do a binary search on this range in $O(\log n)$ steps. If we do not find a suitable a, we fix $b=3$ and repeat. There are only logarithmically many such b’s, since $a \geq 2$ forces $b \leq \log n$. Multiplication and calculating powers of a number can both be done in polylog time. Hence, we can determine whether the given number n is of the form $a^b$ or not.
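A small sketch of this search; the names are mine, and for brevity it leans on Python’s built-in big-integer arithmetic rather than spelling out the polylog-time multiplication:

```python
def perfect_power(n):
    """Return (a, b) with n == a**b and a, b >= 2, or None if none exists."""
    assert n >= 2
    b = 2
    while 2 ** b <= n:                               # a >= 2 forces b <= log2(n)
        lo, hi = 2, 1 << (n.bit_length() // b + 1)   # a has about (log n)/b bits
        while lo <= hi:                              # binary search for a
            mid = (lo + hi) // 2
            p = mid ** b
            if p == n:
                return mid, b
            lo, hi = (mid + 1, hi) if p < n else (lo, mid - 1)
        b += 1
    return None

print(perfect_power(729))   # (27, 2); 729 is also 3**6 and 9**3
```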

Cool, isn’t it?

–Saurabh Joshi

### Puzzle : Find a duplicate

October 12, 2010

Well,

Needless to say, the puzzle was given by Ramprasad. After thinking for a while, I could figure out the solution using an old trick from one of the typical technical interview questions.

Problem : Given n elements in an array ( indexed, say, 0 to n-1 ). The elements come from the set {1,…,n-1}. Clearly, by the pigeonhole principle, there will be a duplicate element; there can be more than one element occurring twice or more. In O(n) time, find one duplicate element. The additional space you use should not exceed O(log n) bits.

Solution : At first, I thought that one could just keep a bit array storing whether we have seen an element or not. But for elements coming from {1,…,n-1}, the number of bits required is O(n). Even if we consider a number representation taking O(log n) bits, it would still require O(n/log n) numbers to be stored. So clearly, that cannot lead to a solution.

What works is as follows. Let us say you are given a singly linked list and you need to detect whether the linked list has a cycle or not, while being allowed to store only constantly many elements.

What you can do is keep two pointers, p1 and p2. At each step, you increment p1 by 1 ( by increment we mean go to the next element in the linked list ) and you increment p2 by 2. If p1 and p2 ever become equal, the linked list has a cycle; otherwise, p2 will hit the end of the linked list.

It is worth noting that in this scenario, p1 and p2 will meet before p1 can finish traversing the entire cycle. Also, for any two relatively prime numbers (n1, n2), if you increment p1 by n1 and p2 by n2, they will still meet, irrespective of the length of the cycle.

This seemed like a little diversion from our original problem, but we are going to use it :-). What we will do is look at the element at index 0; let us say this element is e1. Then we look at A[e1], then A[A[e1]], and so on. We can think of this mechanism as forming a singly linked list. Because the elements range over 1,…,n-1 while there are n array cells ( indices 0 to n-1 ), there has to be a cycle in this list. And as shown in the figure, the starting point B of the cycle is the element which is repeated. That is because B has two distinct predecessors e1 and e2 such that A[e1] = A[e2] = B.

So what you do is the following. As mentioned earlier, keep two pointers p1 and p2; p1 moves by 1 and p2 moves by 2. Say they meet at point C. Let us say that the distance AB = p, BC = q and CDB = r ( the way back around the cycle from C to B ). How much has p1 traversed? It is p + q, whereas p2 has traversed p + q + r + q = 2(p + q), double that of p1.
So, p = r.

So once we reach C, we move p1 back to A ( that is, we start from index 0 again ) and keep p2 at C. Remember that AB = p = r = CDB. Now, we move both p1 and p2 by 1. They will meet at B, and that is our duplicate element :-).
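The whole thing fits in a few lines; a sketch, assuming the array is given as a Python list A with values in {1, …, n-1}:

```python
def find_duplicate(A):
    """A has n cells (indices 0..n-1) holding values from {1, ..., n-1}."""
    p1 = p2 = 0
    while True:                 # phase 1: find the meeting point C
        p1 = A[p1]              # p1 moves by 1
        p2 = A[A[p2]]           # p2 moves by 2
        if p1 == p2:
            break
    p1 = 0                      # phase 2: restart p1 from A (index 0)
    while p1 != p2:             # both move by 1; they meet at B
        p1 = A[p1]
        p2 = A[p2]
    return p1                   # B, a duplicated value

print(find_duplicate([1, 3, 4, 2, 3]))   # 3 (since A[1] = A[4] = 3)
```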

Instead of moving p1 by 1 and p2 by 2 to find the point C, what if we move them by (n1, n2) where n1 and n2 are relatively prime? Would it work? 🙂

–Saurabh Joshi

### Puzzle : Nice puzzle on sorting!

August 11, 2010

Well,

The title is taken directly from a mail by Ramprasad, and so is the puzzle :-). He in turn gives credit to Prof Meena for this nice puzzle.

Problem : Pasting the problem verbatim here from Ramprasad’s email.

Meena had given us this puzzle during this workshop and I found it incredibly interesting.

Suppose we have 13 cards arranged in descending order: K Q J 10 9 8 7 6 5 4 3 2 1. At any move, you can take a substring of cards and move it elsewhere. For example, 10 9 8 could be moved to the right by 2 steps to get K Q J 7 6 10 9 8 5 4 3 2 1. The goal is to minimize the number of such moves.

It must be absolutely obvious that you can do it with 12 moves, but the very fact that I’m asking you this question means that you can do better. How many moves do you need? 🙂

Now the obvious extension to this problem is to generalize this technique for any n.

Solution : Well, I got the solution myself, but not without the hint. And the hint is as follows.

See the snipped mailing conversation between me and Ramprasad.

> Is it the case that the number of inversions should strictly decrease?

Actually inversions was my first guess towards proving the lower bound. It is a very good guess but I do not know if it is absolutely necessary that inversions must go down at each step. But perhaps your guess is right… maybe it must go down monotonically at each step. [For those who do not know what an inversion is, a pair of indices (i,j) is called an inversion if i < j and $a_i > a_j$.]

My first attempt towards the lower bound was that initially there are n-choose-2 inversions and finally there are none. If we can show that each move can remove only n inversions then we’d sort of have a good lower bound. But unfortunately this is not true: take n n-1 n-2 … 1 and shift the first half and put it after the 2nd half to get n/2 n/2-1 … 1 n n-1 n-2 … n/2+1. This process actually gets rid of O(n^2) inversions and my proof attempt broke.

However there is a very similar parameter that can be used to argue about the lower bound. And this parameter is called ‘descents’. These are indices i such that $a_i > a_{i+1}$. Now clearly, to begin with, there are (n-1) descents.

Claim: At every step, you can reduce the number of descents by at most 2. [prove it]

This immediately gives a (n-1)/2 lower bound. But do all steps actually reduce the number of descents by 2? No! Take for example the first move: it can reduce the descents only by 1! And similarly the last step. Thus you get the required lower bound of (n+1)/2.

And, this lower bound can be achieved!

---


Well, there you go! It was sufficient for me to get the trick after this hint. The claim is easy to prove, hence I am leaving it to interested readers ( if any 🙂 ).

The sequence for 13 cards is as follows :

K Q J 10 9 8 7 6 5 4 3 2 1
K 6 5 4 3 2 1 Q J 10 9 8 7
1 Q K 6 5 4 3 2 J 10 9 8 7
1 2 J Q K 6 5 4 3 10 9 8 7
1 2 3 10 J Q K 6 5 4 9 8 7
1 2 3 4 9 10 J Q K 6 5 8 7
1 2 3 4 5 8 9 10 J Q K 6 7
1 2 3 4 5 6 7 8 9 10 J Q K

In seven steps!! Only seven moves are required, as mentioned earlier by Ramprasad; this matches the (n+1)/2 lower bound, since (13+1)/2 = 7.
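As a sanity check on the descent argument, here is a quick sketch that counts the descents after each of the moves above ( reading J = 11, Q = 12, K = 13 ); note that the first and last moves remove one descent each, and every other move removes two:

```python
def descents(seq):
    """Number of indices i with seq[i] > seq[i+1]."""
    return sum(seq[i] > seq[i + 1] for i in range(len(seq) - 1))

steps = [
    [13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    [13, 6, 5, 4, 3, 2, 1, 12, 11, 10, 9, 8, 7],
    [1, 12, 13, 6, 5, 4, 3, 2, 11, 10, 9, 8, 7],
    [1, 2, 11, 12, 13, 6, 5, 4, 3, 10, 9, 8, 7],
    [1, 2, 3, 10, 11, 12, 13, 6, 5, 4, 9, 8, 7],
    [1, 2, 3, 4, 9, 10, 11, 12, 13, 6, 5, 8, 7],
    [1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 6, 7],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
]
print([descents(s) for s in steps])   # [12, 11, 9, 7, 5, 3, 1, 0]
```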

In general, for any decreasing sequence

n n-1 n-2 … 2 1

the first move is to bring the second half of the array just after n:

n (n/2)-1 (n/2)-2 … 1 n-1 n-2 … n/2

From then on, we maintain the invariant that the subarray up to and including n has no descent. To the right of n, find two consecutive elements $a_i, a_{i+1}$ such that $a_i < a_{i+1}$ ( i.e., not a descent ), and move these two elements to the left of n so that the invariant is maintained. In fact, one can put $a_i$ at the $a_i$-th place from the left. Again, it is easy to prove the correctness of this strategy.
Copying verbatim some interesting notes from Ramprasad:

Some general notes on this problem. The question of finding the shortest set of moves to sort any given permutation is not known to be in P, and not known to be NP-hard or anything either. There are approximation algorithms known for this, and that is how Meena came to ask me this question. One of the participants of the workshop had worked on this problem and we started discussing it when we spoke about his problem.

One question I asked Meena was: is the decreasing sequence the worst sequence? And it turns out, surprisingly, that the answer is no! People ran some computer analysis and they found a permutation on 15 and another on 17 which took more than 8 and 9 moves respectively! But people conjecture that for sufficiently large n, the worst sequence is the decreasing sequence [and some also believe that ‘large n’ is any n > 17 🙂 ].

— Saurabh Joshi

### Puzzle : Goof ups by GOD

September 25, 2009

Well,

Coming directly to the point: today’s puzzle was posted by Manoj and the answer was given by Saket. I am just copying it verbatim, assuming that things are under a Creative Commons license :D.

Problem :

You are given n balls. There are two types of information God has given to you:
Same(i,j) if balls “i” and “j” are the same.
Different(i,j) if balls “i” and “j” are different.

Since the world is imperfect, obviously God has goofed up somewhere. For example,
Same(1,2), Same(2,3), Different(1,3)
is obviously not consistent. But
Same(1,2), Different(2,3)
is consistent, as you “believe” that God will never assert Same(1,3).

But as I said, God has goofed up somewhere and you have to pay the price. And that too a computing price. Given m such assertions, can you find out whether they are consistent?

Can you do it in $O(m^2)$ time?

Can you do it in $O(m \log n)$ time?
Can you do it in O(m) time?

Solution :

I think the $O(m^2)$ solution is quite easy to achieve for this problem.

Every ball is represented by a vertex; when we process a statement Same(i,j), we add an edge between the two corresponding vertices. We process all Same(x,y) statements like this. Now, for the set of assertions to be consistent, there should not be a statement Different(x’,y’) such that x’ and y’ are connected. There can be O(m) such statements, and we can check each by a simple DFS, which again takes O(m) time, giving us an $O(m^2)$ solution.

For the $O(m \log n)$ solution we can use the same edge structure, but additionally make use of the well-known union-find data structure, which supports union/find in $O(\log n)$ time. So we can insert the O(m) edges for all Same(x,y) statements in $O(m \log n)$ time, and for each Different(x’,y’) use find to check whether x’ and y’ belong to the same component ( there can be O(m) such statements, making the total time $O(m \log n)$ ).

For the O(m) solution: I think this problem is in fact the same as the one addressed by union-find, because an O(m) solution here would give a union-find solution with O(1) amortized time per operation. But since union-find has a proven lower bound of the inverse Ackermann function per operation ( which, although not constant, grows extremely slowly ), this problem also has a lower bound of $\Omega(m\,\alpha(n))$.
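To make the near-linear version concrete, here is a sketch of the union-find check, using union by rank with path compression ( which brings the total running time down to $O(m\,\alpha(n))$ ); the ball indexing and function names are my own:

```python
def consistent(n, same, different):
    """same / different: lists of (i, j) pairs over balls 0..n-1."""
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return
        if rank[rx] < rank[ry]:             # union by rank
            rx, ry = ry, rx
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1

    for i, j in same:                       # merge all Same(i, j) classes
        union(i, j)
    # consistent iff no Different(i, j) lands inside a single class
    return all(find(i) != find(j) for i, j in different)

print(consistent(3, [(0, 1), (1, 2)], [(0, 2)]))   # False
print(consistent(3, [(0, 1)], [(1, 2)]))           # True
```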

### Largest sum sub-sequence and sub-matrix!

November 15, 2008

Well,

I happened to bump into two old problems which I had solved some time back. However, I never bothered to write them down anywhere, which I intend to do now.

Problem 1 : Given a sequence of $n$ integers ( positive and/or negative ), find a contiguous subsequence with the largest sum. By contiguous subsequence I mean: if the i-th element and the j-th element of the original sequence are in the subsequence ( for $i \leq j$ ), then $\forall k$ with $i \leq k \leq j$, the k-th element is also in the subsequence.

Example :  sequence :  1 5 -9 2 4 3 8

desired subsequence : 2 4 3 8

Problem 2 : Given an m x n matrix A, find a submatrix of A such that the sum of all the elements of the submatrix is the largest amongst all possible submatrices.

Example :

1 2 -1
-3 -1 4
1 -5 -2

Desired sub matrix :

2 -1
-1 4

Solution 1 : Well, actually, the solution to this one is very easy and many people know about it. Still, for those who don’t, let me describe it over here.

Initialize max_sum = 0.

Start calculating a cumulative ( running ) sum from the first element. At each step, store the running sum in max_sum if it is greater than max_sum’s current value. If at any point the running sum goes below zero, reset it to zero. At the end, max_sum will hold the largest sum. What about the subsequence itself? Well, you can keep start_index and end_index to track which contiguous subsequence contributed to max_sum.

The running time is $O(n)$, and I guess this is the best possible, since every element has to be examined at least once.
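A short sketch of this scan ( this is the well-known Kadane’s algorithm ); the exact return shape and the half-open indices are my own choices:

```python
def max_subarray(a):
    """Largest sum of a contiguous subsequence; the empty one counts as 0."""
    max_sum = cur = 0
    best = (0, 0)                  # (start_index, end_index), half-open
    start = 0
    for i, x in enumerate(a):
        cur += x
        if cur > max_sum:
            max_sum, best = cur, (start, i + 1)
        if cur < 0:                # no prefix worth keeping: reset
            cur, start = 0, i + 1
    return max_sum, best

print(max_subarray([1, 5, -9, 2, 4, 3, 8]))   # (17, (3, 7)) i.e. 2 4 3 8
```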

Solution 2 : The second one is a bit tricky but not too tough. In fact, we will use Solution 1 to solve it.

First, for each row, calculate a cumulative-sum array: array[i][j] will contain the sum of elements 1 to j in the i-th row, with array[i][0] = 0 for all i. Time required: $O(mn)$.

Next, let us pick any two columns, say p and q ( with $p \leq q$ ). Now form a new array sum_array[k] = array[k][q] - array[k][p-1]; that is, sum_array[k] will contain the sum of all elements from column p to column q in row k. Now apply Solution 1 to find max_sum. Remember that the indices now give you start_row and end_row, while p and q are start_column and end_column respectively. But we did it only for two particular columns; to find the submatrix having the largest sum, we have to apply it to all ${n \choose 2}$ column combinations. The running time is $O(n^2 m)$.
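Putting the two solutions together ( this sketch reuses max_subarray from above and 0-based indexing, so the prefix array carries an extra leading zero per row ):

```python
def max_submatrix(M):
    m, n = len(M), len(M[0])
    # pre[i][j] = M[i][0] + ... + M[i][j-1]  (row-wise cumulative sums)
    pre = [[0] * (n + 1) for _ in range(m)]
    for i in range(m):
        for j in range(n):
            pre[i][j + 1] = pre[i][j] + M[i][j]
    best, where = 0, None
    for p in range(n):                           # all column pairs p <= q
        for q in range(p, n):
            col = [pre[k][q + 1] - pre[k][p] for k in range(m)]
            s, (r1, r2) = max_subarray(col)      # Solution 1 on the column sums
            if s > best:
                best, where = s, (r1, r2 - 1, p, q)
    return best, where    # where = (top_row, bottom_row, left_col, right_col)

A = [[ 1,  2, -1],
     [-3, -1,  4],
     [ 1, -5, -2]]
print(max_submatrix(A))   # (4, (0, 1, 1, 2)) -> the submatrix [[2, -1], [-1, 4]]
```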

I don’t know whether this is the best possible solution, or whether there exists a solution which can achieve this in complexity better than $O(n^2 m)$.

Do let me know if someone have a better idea.

— Saurabh Joshi

### Cake Cutting Algorithms – 7 : Divide and Conquer

September 13, 2008

Well,

It has been long since I have written anything about cake-cutting. Anyways, in this post I am going to describe the best known deterministic algorithm for cake-cutting.

Let’s say we want to divide the cake among 4 people. We can let p1, p2, p3 each mark a line which divides the cake into two equal halves according to their own perception ( captured by the evaluation function $f_i$ ).

Say, for example, p1, p2 and p3 mark lines as shown in the figure. Because an odd number of people have marked the cake, there is a middle line, and we cut the cake at this line. Ask p4 whether the piece on the left or the right is worth at least half according to his perception. If p4 selects the left half, then play cut-and-choose between p4 and p1 for the left piece and between p2 and p3 for the right piece. It is easy to work out the solution in case p4 selects the right piece.

What if there is an odd number of people, say 5? In this case, we will ask p1 to p4 to mark the lines which cut the cake in the ratio 2:3 from the left. Ask p5 whether he thinks the left piece is worth at least 40% or the right piece is worth at least 60%. Arguing as above, we can divide the group into bunches of 2 and 3 and recurse.

So in general: if n is even, ask the first n-1 people to mark cuts in the ratio 1:1; if n is odd, ask the first n-1 people to mark cuts in the ratio (n-1)/2 : (n+1)/2. Then ask the n-th person which piece he thinks is worth at least its declared share.

It is easy to see that this algorithm works in time $O(n \log n)$. This is the best known deterministic algorithm for simple fair division.
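In terms of queries to the evaluation functions: each round asks every participating player for one mark plus one comparison question, and the two groups then recurse independently, so the query count follows the familiar divide-and-conquer recurrence

$T(n) = T(\lceil n/2 \rceil) + T(\lfloor n/2 \rfloor) + O(n) = O(n \log n)$.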

It has been conjectured that simple fair division cannot be achieved in $O(n)$ with a deterministic discrete ( unlike moving-knife ) algorithm. Many believe that the lower bound for simple fair division, where queries to the evaluation functions are not free, is super-linear, i.e., $\omega(n)$. However, it is an open question that remains to be answered.

— Saurabh Joshi

June 5, 2008

Note : For those who saw a flawed solution to “Puzzle : Noodles“, the post has been updated and now contains the correct solution.

This post is about a problem I solved more than two years back. Actually, it is a simple problem which was asked to my friend Vaibhav Gupta during a technical interview conducted by Amazon.

Problem : You are given a singly linked list, and the length of the list is not known to you. Write a function which returns an element from the linked list uniformly at random. You are allowed to traverse the linked list only once.

Solution : Had we been allowed to traverse the list twice, the job would have been easier, as we could just calculate the length n and then select each element with probability 1/n.

As we do not know the length of the list, we maintain the invariant that, at every point, each element seen so far is the current winner with probability exactly one over the number of elements traversed so far.

For example, we can initially compare the first two elements and select one as the winner with probability 1/2. Now we make this winner contend with the third element. Of course, the winner should be given a bit more weightage, as it has struggled to come this far: the winner is retained with probability 2/3 and the 3rd element is selected with probability 1/3. Had the length of the list been only 3, you can see that each of the first two elements ends up selected with probability (1/2)(2/3) = 1/3, and the third element with probability 1/3. Similarly, if there is a fourth element, we keep the winner so far with probability 3/4 and pick the fourth element with probability 1/4.

This way we can traverse the entire list once. At the end, each element is guaranteed to be the winner with probability 1/n, where n is the length of the list.
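This is the classic reservoir-sampling idea with a reservoir of size one. A minimal sketch, assuming list nodes with .val and .next fields ( names mine ):

```python
import random

def uniform_element(head):
    """Return a uniformly random element of a singly linked list in one pass."""
    winner, n, node = None, 0, head
    while node is not None:
        n += 1
        if random.randrange(n) == 0:   # keep the n-th element with prob. 1/n
            winner = node.val          # otherwise the old winner survives
        node = node.next
    return winner
```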

–Saurabh Joshi