Tuesday, March 23, 2021

The vision for software products (well, for anything really) is ok to be vague, ok to be far from reality, but it needs to be actually EXCITING - something similar to "we are building a flying car" or "I have a dream" (MLK) 

If none of the customer or developer is excited about the vision, then it's just BS.

Thursday, March 21, 2013

[leetcode] Palindrome Partition II

Problem is stated here .
It is easy to find out that it's a dynamic programming problem. Certainly we can use recursion to solve it, but we will see that there're a lot of repeated computations, and therefore recursion is inefficient in this kind of situation.
DP approach isn't difficult, a good and brief explanation can be found here.

Let dp[i] = min cut for substring [i...n], where n is the length of input string.
Then dp[i] = 0, if substring[i..n] itself is a palindrome
    or dp[i] = 1+dp[j], if substring(i, j) is a palindrome ( i<j<n )

Here comes the interesting part, how do we know if substring(i, j) is a palindrome or not ?
If we check it directly by going though characters from i to j-1, it reveals a overall runtime of O(n^3) of this algorithm. The trick is using another DP:

substring(i, j) is palindrome if charAt(i) == charAt(j-1) && subtring(i+1,j-1) is palindrome.

So, this is a interesting mixed DP problem and can be solved in O(n^2).
Note that this post made a good point that it can be regarded as a variant of Matrix Chain Multiplication Problem. They gave a neat code for this problem using 2D dynamic programming algorithm which is very similar with the one to solve Matrix Chain Multiplication Problem. It makes perfect sense, but the time complexity is O(n^3).

As I talked it before, this problem can be solved in runtime of O(n^2). So what's the difference between it and Matrix Chain Multiplication Problem?  Well, by using anoter DP, we can find out if substring(i, j) is palindrome in O(1) time while we can not directly get the min multiplications of sub matrices(i,j) if we applied the same strategy to it.

Another related problem is problem 17.14 in the book Cracking the Coding Interview:
Given a text without spaces, e.g."helloworldniceday",
Find the minimal words it can be divided into. e.g. "hello world nice day" -> 4 words.
It occurs to me that this is another variant of Matrix Chain Multiplication Problem with 2-D DP at first, but actually it's much easier than I thought. 1-D DP is enough, just like the Palindrome Partition problem I talked first.
//-1 means this text can't be divided to valid words.
public static int getMinSeperateWords(String text){
		int len = text.length();
		int[] dp = new int[len];
		for(int i=len-1;i>=0;i--){
			if(Dict.isValid(text.substring(i))){
				dp[i]=1;
				continue;
			}
			int min=Integer.MAX_VALUE;
			for(int j=i+1;j<len;j++){
				if(dp[j]!=-1 && Dict.isValid(text.substring(i,j))){
					if(dp[j]+1<min)
						min=dp[j]+1;
				}
			}
			dp[i]= min==Integer.MAX_VALUE ? -1:min;
		}
		return dp[0];
}
In conclusion, problems can be related while they can also be harder or easier than the similar ones. Practice makes perfect.

Wednesday, February 20, 2013

[leetcode] Implement strStr() | String matching algorithms

Problem:
Returns a pointer to the first occurrence of needle in haystack, or null if needle is not part of haystack.
e.g. haystack=>"mississippi", needle=>"issip".  return "issippi"

This is not a difficult problem. A first thought of brute force algorithm is easy to come up with. However, it gets very tricky if we want to improve the efficiency.

Knuth-Morris-Pratt(KMP) Algorithm
See details at wiki and here
public class StrStr {

	public String strKMP(String haystack, String needle){
		if(needle.length()==0)
			return haystack;
		//get next array
		int[] next = generateNext(needle.toCharArray());
		int len = 0;     // Number of characters matched            
        
        for( int i=0; i<haystack.length(); i++ ){            
        	while( len>0 && needle.charAt(len)!= haystack.charAt(i) )            
        		len = next[len];            
            if( needle.charAt(len) == haystack.charAt(i) )            
                len++;            
            if( len == needle.length() )            
                return haystack.substring(i-needle.length()+1);
        }      
        
        return null;
	}
	/**
	 * KeyPoint. Get next array.
	 * @param needle
	 * @return
	 */
	private int[] generateNext(char[] needle){
		//next array. next[i] means when match length = i, the common prefix and real suffix length = next[i] 
		int[] next = new int[needle.length+1];
		//the longest common length of  prefix and real suffix.
		int presufix = 0;
		//next[1] = 0, starting from 2. i = match length.
		for(int i=2;i<next.length;i++){
			while(presufix>0 && needle[presufix]!=needle[i-1])//trickiest part
				presufix=next[presufix];
			if(needle[presufix]==needle[i-1])
				presufix++;
			next[i]=presufix;
		}
		return next;
	}

}

Rolling Hash Algorithm
See details at wiki. An important application of rolling hash is Rabin-Karp Algorithm. While Rabin-Karp algorithm is a good choice for multiple pattern search , I borrowed the idea to solve this problem. Although it is slower than KMP algorithm in worst case in terms of time complexity, it is much easier to understand than KMP.
The hash function H is:


Where a is constant and c1..ck are input chars.
Here's a simple example of rolling hash.
Input: char array A[1...10]. Length k=5.
H[1..5] = A[1]*a^4 + A[2]*a^3 + A[3]*a^2 + A[4]*a^1 + A[5]*a^0
Then how can we compute H[2..6] ?
Instead of computing it from scratch, we can do as following based on H[1..5]
1. Remove first item of H[1..5] :  H[2..6] = H[1..5] - A[1]*a^4
2. Times constant a                  :  H[2..6] *= a
3. Add  last item A[6]              :  H[2..6] += A[6]

Based on this idea, I coded a very naive implementation to solve the problem. Note that to avoid manipulate huge H values, all maths are done by mod in reality.
public class StrStrRollingHash {
	public static final int A = 3;
	
	public String strStr(String haystack, String needle) {
		int n = haystack.length();
		int m = needle.length();
		if(m==0)
			return haystack;
        if(n<m)
            return null;
		
		char[] input = haystack.toCharArray();
		int needleHash = getHash(needle.toCharArray(),0,m-1);
		int haystackHash = getHash(input,0,m-1);
		
		for(int i=0;i<n-m;i++){
			if(haystackHash==needleHash){
				if(haystack.substring(i, i+m).equals(needle))
					return haystack.substring(i);
			}
			//rolling hash
			haystackHash-=(int)(input[i]*Math.pow(A, m-1));//remove i th item
			haystackHash*=A;//times A
			haystackHash+=input[i+m];//add i+m th item	
		}
		
		if(haystackHash==needleHash){
			if(haystack.substring(n-m, n).equals(needle))
				return haystack.substring(n-m);
		}
		
		return null;
        
    }
	
	private static int getHash(char[] array,int start,int end){
		int k=end-start;
		int sum=0;
		for(int i=start;i<=end;i++){
			int item = (int)(array[i]*Math.pow(A,k--));
			sum+=item;
		}
		return sum;
	}
}


You can try this problem on leetcode.
There's another related Boyer-Moore Algorithm, I will study it when get free.

Monday, February 18, 2013

交换使两个序列差最小| Two Equal-sized Subsets Partition Problem

题目: 有两个序列a,b,大小都为n,序列元素的值任意整数,无序;
要求:通过交换a,b中的元素,使[序列a元素的和]与[序列b元素的和]之间的差最小。


这是微软的一道面试题,也在July整理的面试100题之中。本来我想了一下,没有什么特别好的方案,于是百度之。结果百度第一页出来的答案,都一样就算了,还是错的。吐槽无力, 然后就有了这篇中文文章。

首先很容易想到,这问题的本质,就是将一个长度为2n的数组,分成长度为n的两个子数组,使它们和的差值最小。这是数组分隔问题的一个特例。见Partition Problem。在原问题中,原数组允许被分为两个任意长度的子数组(只要它们合起来为2n)。然而有了子数组长度必须为n这个条件后,这个问题难度并没有降低--它还是一个NP问题。证明见这里

网上搜出来的很多解法,其实不能说全错。但它们不能保证始终得到最优解。大多数方法用的都是贪婪算法的思想,然而贪婪算法是以局部最优解去揣测全局最优解--可能成功,也可能失败。贪婪算法获取全局最优解的典型例子比如Dijkstra最短路径算法以及求最小生成树的kruskal算法。 但其有效性需要严格的证明,当然,证明其失败的话,一个反例即可。网上的大多数算法,不能保证总是最优解,但有一些确实是优秀的渐近算法,能保证结果于最优解的差距在一定范围以内。关于这些算法可以参考这里

那么,如何得到最优解?
1. 穷举法。
天下武功,唯快不破。算法么,只要够暴力,总是能解的。- -!
问题本质是从2n个元素中取出n个元素,使其和最接近于Sum/2,其中Sum为2n个元素的总和。
所以只要枚举出2n个元素中n个元素的所有组合,迭代更新最接近的值,总和得到最好的解。当然,这个算法的时间复杂度就是阶乘级别的了,不过对于一个NP问题,这也不丢脸呵。

2. DP 算法。
详见编程之美209页。
public class PartitionArray {
 
 /**
  * @param array:
  *        assuming that the input array is the combination of two n-sized array.
  * @return int :
  *        minimal difference of the two n-sized subsets.
  */
 public static int minimalSum (int[] array){
  int total = sum(array);
  int sum = total/2;
  int n=array.length/2;
  
  //dp[i][v] = true means that there's i elements added up to value v.
  boolean[][] dp = new boolean[n+1][sum+1];
  dp[0][0]=true;
  
  for(int k=1;k<=2*n;k++){
   int val = array[k-1];
   for(int i=1;i<=k&&i<=n;i++){
    for(int v=1;v<=sum;v++){
     //try to take k as i th element
     if(v>=val && dp[i-1][v-val])
      dp[i][v]=true;
    }
   }
  }
  //find the answer. we need n-sized subset, so just check dp[n][*].
  for(int i=sum;i>0;i--){
   if(dp[n][i])
    /**
     * we find a subset with size=n and sum=i,another subset will have sum=total-i
     * so the difference will be (total-i)-i=total-2*i
     */
    return total-2*i;
  }
  return -1; 
 }
 
 private static int sum(int[] array){
  int sum=0;
  for(int i:array)
   sum+=i;
  return sum;
 }
}
当数组之和非常大的时候,DP算法的数组就会非常大,就不那么高效了。
对于一些极端例子,比如A=[ 12345,12332,14098], B=[ 32213,12321,23132]. DP还不如暴力破解。C(6,3) = 120。 总共才120种可能。

Tuesday, February 5, 2013



     

   Talk is cheap. Show me the code.


                                   --Linus Torvalds in Linux-kernel mail list                          

Monday, February 4, 2013

Notes on LCS, LIS, and related problems

LCS (Longest Common Subsequence) is a well-known dynamic programming(DP) problem. I learned it several times, but sadly I forgot it several times as well. These kinds of problems are not that straightforward for someone first encounters it. While I'm learning(to be more specific, re-learning) these related problems recently, I still haven't found any short path to master DP algorithms. Personally, I think completely understanding more and more problems will enlarge our knowledge to solve other similar problems in the future. And of course, coding is inevitable and essential.

Circus Tower Sorting Problem
There are many interesting problems related to LCS problem. For example:
A circus is designing a tower routine consisting of people standing atop one another’s shoulders. For practical and aesthetic reasons, each person must be both shorter and lighter than the person below him or her. Given the heights and weights of each person in the circus, write a method to compute the largest possible number of people in such a tower.
EXAMPLE:
Input (ht, wt): (65, 100) (70, 150) (56, 90) (75, 190) (60, 95) (68, 110)
Output: The longest tower is length 6 and includes from top to bottom: (56, 90) (60,95) (65,100) (68,110) (70,150) (75,190)

This problem can be easily converted to LCS problem and therefore solution is quite simple:
  1. Sort the data according to height. Denote it as D1. That is O(nlogn).
  2. Sort the data according to weight. Denote it as D2. That is O(nlogn).
  3. Find the length of the LCS of D1 and D2. Using dynamic programming, it’s O(n^2).
Given a good understanding of LCS problem, this problem can be solved quickly, though it's tricky.
  1. Sort the Sequence in increasing order. Denote it as S'. That is O(nlogn).
  2. Find the length of the LCS of original sequence S and S'. Using dynamic programming, it’s O(n^2).
This reveals the overall O(n^2) time complexity solution. Another  O(n^2) DP solution is shown here. There's exists O(nlogn) solution to solve it and that's not hard either. They can be found on wiki and a good example is shown on a Chinese blog.

The boxes in this particular problem can be rotated and multiple instance of same box is allowed. This problem can be solved based on the above LIS problem.

There's another box stacking problem which is slightly different from the above one in Cracking the Coding Interview 5th Edition Chapter 9.10 :

You have a stack of n boxes, with width w(i), depth d(i), and height h(i). The boxes cannot be rotated and can only be stacked on top of one another if each box in the stack is strictly larger than the box above it in width, height, and depth. Implement a method to build the tallest stack possible, where the height of a stack is the sum of the heights of each box.

My initial thought is to adapt a similar approach as the Circus Tower problem. However, sorting the boxes in width, depth and height separately and find their LCS is kind of 3-Dimension DP, which is much more complicated. Actually I don't if it will truly work.

While answer in the book is taking the similar idea of solving LIS problem, I don't understand it well, and I don't like the code either. So I just put my idea here:

1. Sort the boxes on increasing order of height.
2. Let:
H(i) = Maximum possible Stack Height with box i at bottom of stack
H(i) = { Max ( H(j) ) + height(i) }
           where j is in [1..i-1] and width(j) < width(i) and depth(j) < depth(i) and height(j) < height(i)
           If there is no such j then H(i) = height(i).
3. Scan H(1) to H(n). Find the max as the answer.

I may put the code later.

Tuesday, January 29, 2013

[leetcode] Generate Parentheses

Problem:
Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses.
For example:
n=1: "()"
n=2: "()()","(())"
n=3: "((()))","()(())","()()()","(()())","(())()"
...
While this problem seems familiar to me, I didn't have good solutions at first glance. Then my first step was to observe more examples and tried to find some patterns.

Let [n] indicates all well-formed combinations of number n. Then it was not difficult to find out that we had the following patters:
[n] = { "(" + [n-1] + ")"  , [1][n-1], [2][n-2], ...., [n-2][2], [n-1][1] }
where [m][n] indicates the cartesian product of two sets.
    e.g. [2][2] =  { "()()","(())" } x { "()()","(())" }
= { "()()()()","()()(())","(())()()","()()()()" }
As we can see, there will be duplicates of the product, so I use the Set container in java to avoid duplicates.

Then the main idea is to iterate from 1 to n. When calculate k, we can use the previous [1..k-1] results to generate the results of k.
public class GenerateParentheses {
  public ArrayList<String> generateParenthesis(int n) {
         
         ArrayList<Set<String>> list = new ArrayList<Set<String>>();
         
         int number=1;
         
         while (number<=n){
             Set<String> parentheses = new HashSet();
             //generate combinations of pattern "("+[number-1]+")"
             generateParentheses(number,list,parentheses);
             //generate combinations of pattern [1][number-1],[2][number-2],...[number-1][1]
             for(int i=1;i<number;i++){
                 int leftIndex=i-1;
                 int rightIndex=number-i-1;
                 for(String left:list.get(leftIndex)){
                     for(String right:list.get(rightIndex)){
                         parentheses.add(left+right);
                     }
                 }
             }
             //add results of this number and go on.
             list.add(parentheses);
             number++;
         }
         //add the elements to list required by OJ.
         ArrayList<String> result = new ArrayList<String>();
         Iterator<String> it = list.get(list.size()-1).iterator();
         while(it.hasNext())
            result.add(it.next());
         return result;
     }
     
     private void generateParentheses(int number,ArrayList<Set<String>> list,Set<String> parentheses){
         if(number==1){
             parentheses.add("()");
             return;
         }
         StringBuilder sb = new StringBuilder();
         for(String s:list.get(number-2)){
              sb.append("(");
              sb.append(s);
              sb.append(")");
              parentheses.add(sb.toString());
              sb.setLength(0);
         }
     }
}

I know...The code looks messy and kind of crazy with 4-nested loops.
After surfing online and discussed with others, I have a much better solution with the idea that is much easier to illustrate.

When the parameter is n, we are going to have n left parenthesis and n right parenthesis. We can add either "(" or ")" to current constructed string (initially empty string), but with some constrains:
1. No. of left parenthesis "(" much be less or equal to n.
2. Certainly, right parenthesis ")" need to be <= n, but also need to be <= no. of  "(".
Based this point, we can easily get a recursive approach :
public class GenerateParentheis {
 ArrayList<String> list;
    
    public ArrayList<String> generateParenthesis(int n) {
        list = new ArrayList<String>();
        generateParenthesis("",0,0,n);
        return list;
    }
    
    private void generateParenthesis(String s,int left,int right,int n){
        if(left==n && right==n){
            list.add(s);
            return;
        }
        if(left<n)
            generateParenthesis(s+"(",left+1,right,n);
        if(right<left)
            generateParenthesis(s+")",left,right+1,n);
    }
}

Even More
There're many problems that looks totally different at first glance, but are the same in nature.
The key in called Catalan Number. The n th Catalan number is 1/(n+1) * C(2n, n).
So when n=3, we can get Catalan number = 5, which is the total number of well-formed parenthesis combinations when n=3.