## Wednesday, February 20, 2013

### [leetcode] Implement strStr() | String matching algorithms

Problem:
Returns a pointer to the first occurrence of needle in haystack, or null if needle is not part of haystack.
e.g. haystack=>"mississippi", needle=>"issip".  return "issippi"

This is not a difficult problem. A first thought of brute force algorithm is easy to come up with. However, it gets very tricky if we want to improve the efficiency.

Knuth-Morris-Pratt(KMP) Algorithm
See details at wiki and here
```public class StrStr {

public String strKMP(String haystack, String needle){
if(needle.length()==0)
return haystack;
//get next array
int[] next = generateNext(needle.toCharArray());
int len = 0;     // Number of characters matched

for( int i=0; i<haystack.length(); i++ ){
while( len>0 && needle.charAt(len)!= haystack.charAt(i) )
len = next[len];
if( needle.charAt(len) == haystack.charAt(i) )
len++;
if( len == needle.length() )
return haystack.substring(i-needle.length()+1);
}

return null;
}
/**
* KeyPoint. Get next array.
* @param needle
* @return
*/
private int[] generateNext(char[] needle){
//next array. next[i] means when match length = i, the common prefix and real suffix length = next[i]
int[] next = new int[needle.length+1];
//the longest common length of  prefix and real suffix.
int presufix = 0;
//next[1] = 0, starting from 2. i = match length.
for(int i=2;i<next.length;i++){
while(presufix>0 && needle[presufix]!=needle[i-1])//trickiest part
presufix=next[presufix];
if(needle[presufix]==needle[i-1])
presufix++;
next[i]=presufix;
}
return next;
}

}
```

Rolling Hash Algorithm
See details at wiki. An important application of rolling hash is Rabin-Karp Algorithm. While Rabin-Karp algorithm is a good choice for multiple pattern search , I borrowed the idea to solve this problem. Although it is slower than KMP algorithm in worst case in terms of time complexity, it is much easier to understand than KMP.
The hash function H is:

Where a is constant and c1..ck are input chars.
Here's a simple example of rolling hash.
Input: char array A[1...10]. Length k=5.
H[1..5] = A[1]*a^4 + A[2]*a^3 + A[3]*a^2 + A[4]*a^1 + A[5]*a^0
Then how can we compute H[2..6] ?
Instead of computing it from scratch, we can do as following based on H[1..5]
1. Remove first item of H[1..5] :  H[2..6] = H[1..5] - A[1]*a^4
2. Times constant a                  :  H[2..6] *= a
3. Add  last item A[6]              :  H[2..6] += A[6]

Based on this idea, I coded a very naive implementation to solve the problem. Note that to avoid manipulate huge H values, all maths are done by mod in reality.
```public class StrStrRollingHash {
public static final int A = 3;

public String strStr(String haystack, String needle) {
int n = haystack.length();
int m = needle.length();
if(m==0)
return haystack;
if(n<m)
return null;

char[] input = haystack.toCharArray();
int needleHash = getHash(needle.toCharArray(),0,m-1);
int haystackHash = getHash(input,0,m-1);

for(int i=0;i<n-m;i++){
if(haystackHash==needleHash){
if(haystack.substring(i, i+m).equals(needle))
return haystack.substring(i);
}
//rolling hash
haystackHash-=(int)(input[i]*Math.pow(A, m-1));//remove i th item
haystackHash*=A;//times A
}

if(haystackHash==needleHash){
if(haystack.substring(n-m, n).equals(needle))
return haystack.substring(n-m);
}

return null;

}

private static int getHash(char[] array,int start,int end){
int k=end-start;
int sum=0;
for(int i=start;i<=end;i++){
int item = (int)(array[i]*Math.pow(A,k--));
sum+=item;
}
return sum;
}
}

```

You can try this problem on leetcode.
There's another related Boyer-Moore Algorithm, I will study it when get free.

## Monday, February 18, 2013

### 交换使两个序列差最小| Two Equal-sized Subsets Partition Problem

1. 穷举法。

2. DP 算法。

```public class PartitionArray {

/**
* @param array:
*        assuming that the input array is the combination of two n-sized array.
* @return int :
*        minimal difference of the two n-sized subsets.
*/
public static int minimalSum (int[] array){
int total = sum(array);
int sum = total/2;
int n=array.length/2;

//dp[i][v] = true means that there's i elements added up to value v.
boolean[][] dp = new boolean[n+1][sum+1];
dp[0][0]=true;

for(int k=1;k<=2*n;k++){
int val = array[k-1];
for(int i=1;i<=k&&i<=n;i++){
for(int v=1;v<=sum;v++){
//try to take k as i th element
if(v>=val && dp[i-1][v-val])
dp[i][v]=true;
}
}
}
//find the answer. we need n-sized subset, so just check dp[n][*].
for(int i=sum;i>0;i--){
if(dp[n][i])
/**
* we find a subset with size=n and sum=i,another subset will have sum=total-i
* so the difference will be (total-i)-i=total-2*i
*/
}
return -1;
}

private static int sum(int[] array){
int sum=0;
for(int i:array)
sum+=i;
return sum;
}
}
```

## Tuesday, February 5, 2013

#### Talk is cheap. Show me the code.

--Linus Torvalds in Linux-kernel mail list

## Monday, February 4, 2013

### Notes on LCS, LIS, and related problems

LCS (Longest Common Subsequence) is a well-known dynamic programming(DP) problem. I learned it several times, but sadly I forgot it several times as well. These kinds of problems are not that straightforward for someone first encounters it. While I'm learning(to be more specific, re-learning) these related problems recently, I still haven't found any short path to master DP algorithms. Personally, I think completely understanding more and more problems will enlarge our knowledge to solve other similar problems in the future. And of course, coding is inevitable and essential.

Circus Tower Sorting Problem
There are many interesting problems related to LCS problem. For example:
A circus is designing a tower routine consisting of people standing atop one another’s shoulders. For practical and aesthetic reasons, each person must be both shorter and lighter than the person below him or her. Given the heights and weights of each person in the circus, write a method to compute the largest possible number of people in such a tower.
EXAMPLE:
Input (ht, wt): (65, 100) (70, 150) (56, 90) (75, 190) (60, 95) (68, 110)
Output: The longest tower is length 6 and includes from top to bottom: (56, 90) (60,95) (65,100) (68,110) (70,150) (75,190)

This problem can be easily converted to LCS problem and therefore solution is quite simple:
1. Sort the data according to height. Denote it as D1. That is O(nlogn).
2. Sort the data according to weight. Denote it as D2. That is O(nlogn).
3. Find the length of the LCS of D1 and D2. Using dynamic programming, it’s O(n^2).
Given a good understanding of LCS problem, this problem can be solved quickly, though it's tricky.
1. Sort the Sequence in increasing order. Denote it as S'. That is O(nlogn).
2. Find the length of the LCS of original sequence S and S'. Using dynamic programming, it’s O(n^2).
This reveals the overall O(n^2) time complexity solution. Another  O(n^2) DP solution is shown here. There's exists O(nlogn) solution to solve it and that's not hard either. They can be found on wiki and a good example is shown on a Chinese blog.

The boxes in this particular problem can be rotated and multiple instance of same box is allowed. This problem can be solved based on the above LIS problem.

There's another box stacking problem which is slightly different from the above one in Cracking the Coding Interview 5th Edition Chapter 9.10 :

You have a stack of n boxes, with width w(i), depth d(i), and height h(i). The boxes cannot be rotated and can only be stacked on top of one another if each box in the stack is strictly larger than the box above it in width, height, and depth. Implement a method to build the tallest stack possible, where the height of a stack is the sum of the heights of each box.

My initial thought is to adapt a similar approach as the Circus Tower problem. However, sorting the boxes in width, depth and height separately and find their LCS is kind of 3-Dimension DP, which is much more complicated. Actually I don't if it will truly work.

While answer in the book is taking the similar idea of solving LIS problem, I don't understand it well, and I don't like the code either. So I just put my idea here:

1. Sort the boxes on increasing order of height.
2. Let:
H(i) = Maximum possible Stack Height with box i at bottom of stack
H(i) = { Max ( H(j) ) + height(i) }
where j is in [1..i-1] and width(j) < width(i) and depth(j) < depth(i) and height(j) < height(i)
If there is no such j then H(i) = height(i).
3. Scan H(1) to H(n). Find the max as the answer.

I may put the code later.