Thousand Sunny: 2013

Thursday, March 21, 2013

[leetcode] Palindrome Partition II

Problem is stated here .
It is easy to find out that it's a dynamic programming problem. Certainly we can use recursion to solve it, but we will see that there're a lot of repeated computations, and therefore recursion is inefficient in this kind of situation.
DP approach isn't difficult, a good and brief explanation can be found here.

Let dp[i] = min cut for substring [i...n], where n is the length of input string.
Then dp[i] = 0, if substring[i..n] itself is a palindrome
or dp[i] = 1+dp[j], if substring(i, j) is a palindrome ( i<j<n )

Here comes the interesting part, how do we know if substring(i, j) is a palindrome or not ?
If we check it directly by going though characters from i to j-1, it reveals a overall runtime of O(n^3) of this algorithm. The trick is using another DP:

substring(i, j) is palindrome if charAt(i) == charAt(j-1) && subtring(i+1,j-1) is palindrome.

So, this is a interesting mixed DP problem and can be solved in O(n^2).
Note that this post made a good point that it can be regarded as a variant of Matrix Chain Multiplication Problem. They gave a neat code for this problem using 2D dynamic programming algorithm which is very similar with the one to solve Matrix Chain Multiplication Problem. It makes perfect sense, but the time complexity is O(n^3).

As I talked it before, this problem can be solved in runtime of O(n^2). So what's the difference between it and Matrix Chain Multiplication Problem? Well, by using anoter DP, we can find out if substring(i, j) is palindrome in O(1) time while we can not directly get the min multiplications of sub matrices(i,j) if we applied the same strategy to it.

Another related problem is problem 17.14 in the book Cracking the Coding Interview:
Given a text without spaces, e.g."helloworldniceday",
Find the minimal words it can be divided into. e.g. "hello world nice day" -> 4 words.
It occurs to me that this is another variant of Matrix Chain Multiplication Problem with 2-D DP at first, but actually it's much easier than I thought. 1-D DP is enough, just like the Palindrome Partition problem I talked first.

//-1 means this text can't be divided to valid words.
public static int getMinSeperateWords(String text){
		int len = text.length();
		int[] dp = new int[len];
		for(int i=len-1;i>=0;i--){
			if(Dict.isValid(text.substring(i))){
				dp[i]=1;
				continue;
			}
			int min=Integer.MAX_VALUE;
			for(int j=i+1;j<len;j++){
				if(dp[j]!=-1 && Dict.isValid(text.substring(i,j))){
					if(dp[j]+1<min)
						min=dp[j]+1;
				}
			}
			dp[i]= min==Integer.MAX_VALUE ? -1:min;
		}
		return dp[0];
}

In conclusion, problems can be related while they can also be harder or easier than the similar ones. Practice makes perfect.

Wednesday, February 20, 2013

[leetcode] Implement strStr() | String matching algorithms

Problem:
Returns a pointer to the first occurrence of needle in haystack, or null if needle is not part of haystack.
e.g. haystack=>"mississippi", needle=>"issip". return "issippi"

This is not a difficult problem. A first thought of brute force algorithm is easy to come up with. However, it gets very tricky if we want to improve the efficiency.

Knuth-Morris-Pratt(KMP) Algorithm
See details at wiki and here

public class StrStr {

	public String strKMP(String haystack, String needle){
		if(needle.length()==0)
			return haystack;
		//get next array
		int[] next = generateNext(needle.toCharArray());
		int len = 0;     // Number of characters matched            
        
        for( int i=0; i<haystack.length(); i++ ){            
        	while( len>0 && needle.charAt(len)!= haystack.charAt(i) )            
        		len = next[len];            
            if( needle.charAt(len) == haystack.charAt(i) )            
                len++;            
            if( len == needle.length() )            
                return haystack.substring(i-needle.length()+1);
        }      
        
        return null;
	}
	/**
	 * KeyPoint. Get next array.
	 * @param needle
	 * @return
	 */
	private int[] generateNext(char[] needle){
		//next array. next[i] means when match length = i, the common prefix and real suffix length = next[i] 
		int[] next = new int[needle.length+1];
		//the longest common length of  prefix and real suffix.
		int presufix = 0;
		//next[1] = 0, starting from 2. i = match length.
		for(int i=2;i<next.length;i++){
			while(presufix>0 && needle[presufix]!=needle[i-1])//trickiest part
				presufix=next[presufix];
			if(needle[presufix]==needle[i-1])
				presufix++;
			next[i]=presufix;
		}
		return next;
	}

}

Rolling Hash Algorithm
See details at wiki. An important application of rolling hash is Rabin-Karp Algorithm. While Rabin-Karp algorithm is a good choice for multiple pattern search , I borrowed the idea to solve this problem. Although it is slower than KMP algorithm in worst case in terms of time complexity, it is much easier to understand than KMP.
The hash function H is:

Where a is constant and c1..ck are input chars.
Here's a simple example of rolling hash.
Input: char array A[1...10]. Length k=5.
H[1..5] = A[1]*a^4 + A[2]*a^3 + A[3]*a^2 + A[4]*a^1 + A[5]*a^0
Then how can we compute H[2..6] ?
Instead of computing it from scratch, we can do as following based on H[1..5]
1. Remove first item of H[1..5] : H[2..6] = H[1..5] - A[1]*a^4
2. Times constant a : H[2..6] *= a
3. Add last item A[6] : H[2..6] += A[6]

Based on this idea, I coded a very naive implementation to solve the problem. Note that to avoid manipulate huge H values, all maths are done by mod in reality.

public class StrStrRollingHash {
	public static final int A = 3;
	
	public String strStr(String haystack, String needle) {
		int n = haystack.length();
		int m = needle.length();
		if(m==0)
			return haystack;
        if(n<m)
            return null;
		
		char[] input = haystack.toCharArray();
		int needleHash = getHash(needle.toCharArray(),0,m-1);
		int haystackHash = getHash(input,0,m-1);
		
		for(int i=0;i<n-m;i++){
			if(haystackHash==needleHash){
				if(haystack.substring(i, i+m).equals(needle))
					return haystack.substring(i);
			}
			//rolling hash
			haystackHash-=(int)(input[i]*Math.pow(A, m-1));//remove i th item
			haystackHash*=A;//times A
			haystackHash+=input[i+m];//add i+m th item	
		}
		
		if(haystackHash==needleHash){
			if(haystack.substring(n-m, n).equals(needle))
				return haystack.substring(n-m);
		}
		
		return null;
        
    }
	
	private static int getHash(char[] array,int start,int end){
		int k=end-start;
		int sum=0;
		for(int i=start;i<=end;i++){
			int item = (int)(array[i]*Math.pow(A,k--));
			sum+=item;
		}
		return sum;
	}
}

You can try this problem on leetcode.
There's another related Boyer-Moore Algorithm, I will study it when get free.

Monday, February 18, 2013

交换使两个序列差最小| Two Equal-sized Subsets Partition Problem

题目：有两个序列a,b，大小都为n,序列元素的值任意整数，无序；
要求：通过交换a,b中的元素，使[序列a元素的和]与[序列b元素的和]之间的差最小。

这是微软的一道面试题，也在July整理的面试100题之中。本来我想了一下，没有什么特别好的方案，于是百度之。结果百度第一页出来的答案，都一样就算了，还是错的。吐槽无力，然后就有了这篇中文文章。

首先很容易想到，这问题的本质，就是将一个长度为2n的数组，分成长度为n的两个子数组，使它们和的差值最小。这是数组分隔问题的一个特例。见Partition Problem。在原问题中，原数组允许被分为两个任意长度的子数组（只要它们合起来为2n）。然而有了子数组长度必须为n这个条件后，这个问题难度并没有降低--它还是一个NP问题。证明见这里。

网上搜出来的很多解法，其实不能说全错。但它们不能保证始终得到最优解。大多数方法用的都是贪婪算法的思想，然而贪婪算法是以局部最优解去揣测全局最优解--可能成功，也可能失败。贪婪算法获取全局最优解的典型例子比如Dijkstra最短路径算法以及求最小生成树的kruskal算法。但其有效性需要严格的证明，当然，证明其失败的话，一个反例即可。网上的大多数算法，不能保证总是最优解，但有一些确实是优秀的渐近算法，能保证结果于最优解的差距在一定范围以内。关于这些算法可以参考这里。

那么，如何得到最优解？
1. 穷举法。
天下武功，唯快不破。算法么，只要够暴力，总是能解的。- -！
问题本质是从2n个元素中取出n个元素，使其和最接近于Sum/2，其中Sum为2n个元素的总和。
所以只要枚举出2n个元素中n个元素的所有组合，迭代更新最接近的值，总和得到最好的解。当然，这个算法的时间复杂度就是阶乘级别的了，不过对于一个NP问题，这也不丢脸呵。

2. DP 算法。
详见编程之美209页。

public class PartitionArray {
 
 /**
  * @param array:
  *        assuming that the input array is the combination of two n-sized array.
  * @return int :
  *        minimal difference of the two n-sized subsets.
  */
 public static int minimalSum (int[] array){
  int total = sum(array);
  int sum = total/2;
  int n=array.length/2;
  
  //dp[i][v] = true means that there's i elements added up to value v.
  boolean[][] dp = new boolean[n+1][sum+1];
  dp[0][0]=true;
  
  for(int k=1;k<=2*n;k++){
   int val = array[k-1];
   for(int i=1;i<=k&&i<=n;i++){
    for(int v=1;v<=sum;v++){
     //try to take k as i th element
     if(v>=val && dp[i-1][v-val])
      dp[i][v]=true;
    }
   }
  }
  //find the answer. we need n-sized subset, so just check dp[n][*].
  for(int i=sum;i>0;i--){
   if(dp[n][i])
    /**
     * we find a subset with size=n and sum=i,another subset will have sum=total-i
     * so the difference will be (total-i)-i=total-2*i
     */
    return total-2*i;
  }
  return -1; 
 }
 
 private static int sum(int[] array){
  int sum=0;
  for(int i:array)
   sum+=i;
  return sum;
 }
}

当数组之和非常大的时候，DP算法的数组就会非常大，就不那么高效了。
对于一些极端例子，比如A=[ 12345,12332,14098], B=[ 32213,12321,23132]. DP还不如暴力破解。C(6,3) = 120。总共才120种可能。

Tuesday, February 5, 2013

Talk is cheap. Show me the code.

--Linus Torvalds in Linux-kernel mail list

Monday, February 4, 2013

Notes on LCS, LIS, and related problems

LCS (Longest Common Subsequence) is a well-known dynamic programming(DP) problem. I learned it several times, but sadly I forgot it several times as well. These kinds of problems are not that straightforward for someone first encounters it. While I'm learning(to be more specific, re-learning) these related problems recently, I still haven't found any short path to master DP algorithms. Personally, I think completely understanding more and more problems will enlarge our knowledge to solve other similar problems in the future. And of course, coding is inevitable and essential.

Circus Tower Sorting Problem
There are many interesting problems related to LCS problem. For example:
A circus is designing a tower routine consisting of people standing atop one another’s shoulders. For practical and aesthetic reasons, each person must be both shorter and lighter than the person below him or her. Given the heights and weights of each person in the circus, write a method to compute the largest possible number of people in such a tower.
EXAMPLE:
Input (ht, wt): (65, 100) (70, 150) (56, 90) (75, 190) (60, 95) (68, 110)
Output: The longest tower is length 6 and includes from top to bottom: (56, 90) (60,95) (65,100) (68,110) (70,150) (75,190)

This problem can be easily converted to LCS problem and therefore solution is quite simple:

Sort the data according to height. Denote it as D1. That is O(nlogn).
Sort the data according to weight. Denote it as D2. That is O(nlogn).
Find the length of the LCS of D1 and D2. Using dynamic programming, it’s O(n^2).

LIS (Longest Increasing Subsequence) Problem

Given a good understanding of LCS problem, this problem can be solved quickly, though it's tricky.

Sort the Sequence in increasing order. Denote it as S'. That is O(nlogn).
Find the length of the LCS of original sequence S and S'. Using dynamic programming, it’s O(n^2).

This reveals the overall O(n^2) time complexity solution. Another O(n^2) DP solution is shown here. There's exists O(nlogn) solution to solve it and that's not hard either. They can be found on wiki and a good example is shown on a Chinese blog.

Box Stacking Problem

The boxes in this particular problem can be rotated and multiple instance of same box is allowed. This problem can be solved based on the above LIS problem.

There's another box stacking problem which is slightly different from the above one in Cracking the Coding Interview 5th Edition Chapter 9.10 :

You have a stack of n boxes, with width w(i), depth d(i), and height h(i). The boxes cannot be rotated and can only be stacked on top of one another if each box in the stack is strictly larger than the box above it in width, height, and depth. Implement a method to build the tallest stack possible, where the height of a stack is the sum of the heights of each box.

My initial thought is to adapt a similar approach as the Circus Tower problem. However, sorting the boxes in width, depth and height separately and find their LCS is kind of 3-Dimension DP, which is much more complicated. Actually I don't if it will truly work.

While answer in the book is taking the similar idea of solving LIS problem, I don't understand it well, and I don't like the code either. So I just put my idea here:

1. Sort the boxes on increasing order of height.
2. Let:
H(i) = Maximum possible Stack Height with box i at bottom of stack
H(i) = { Max ( H(j) ) + height(i) }
where j is in [1..i-1] and width(j) < width(i) and depth(j) < depth(i) and height(j) < height(i)
If there is no such j then H(i) = height(i).
3. Scan H(1) to H(n). Find the max as the answer.

I may put the code later.

Tuesday, January 29, 2013

[leetcode] Generate Parentheses

Problem:
Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses.
For example:
n=1: "()"
n=2: "()()","(())"
n=3: "((()))","()(())","()()()","(()())","(())()"
...
While this problem seems familiar to me, I didn't have good solutions at first glance. Then my first step was to observe more examples and tried to find some patterns.

Let [n] indicates all well-formed combinations of number n. Then it was not difficult to find out that we had the following patters:
[n] = { "(" + [n-1] + ")" , [1][n-1], [2][n-2], ...., [n-2][2], [n-1][1] }
where [m][n] indicates the cartesian product of two sets.
e.g. [2][2] = { "()()","(())" } x { "()()","(())" }
= { "()()()()","()()(())","(())()()","()()()()" }
As we can see, there will be duplicates of the product, so I use the Set container in java to avoid duplicates.

Then the main idea is to iterate from 1 to n. When calculate k, we can use the previous [1..k-1] results to generate the results of k.

public class GenerateParentheses {
  public ArrayList<String> generateParenthesis(int n) {
         
         ArrayList<Set<String>> list = new ArrayList<Set<String>>();
         
         int number=1;
         
         while (number<=n){
             Set<String> parentheses = new HashSet();
             //generate combinations of pattern "("+[number-1]+")"
             generateParentheses(number,list,parentheses);
             //generate combinations of pattern [1][number-1],[2][number-2],...[number-1][1]
             for(int i=1;i<number;i++){
                 int leftIndex=i-1;
                 int rightIndex=number-i-1;
                 for(String left:list.get(leftIndex)){
                     for(String right:list.get(rightIndex)){
                         parentheses.add(left+right);
                     }
                 }
             }
             //add results of this number and go on.
             list.add(parentheses);
             number++;
         }
         //add the elements to list required by OJ.
         ArrayList<String> result = new ArrayList<String>();
         Iterator<String> it = list.get(list.size()-1).iterator();
         while(it.hasNext())
            result.add(it.next());
         return result;
     }
     
     private void generateParentheses(int number,ArrayList<Set<String>> list,Set<String> parentheses){
         if(number==1){
             parentheses.add("()");
             return;
         }
         StringBuilder sb = new StringBuilder();
         for(String s:list.get(number-2)){
              sb.append("(");
              sb.append(s);
              sb.append(")");
              parentheses.add(sb.toString());
              sb.setLength(0);
         }
     }
}

I know...The code looks messy and kind of crazy with 4-nested loops.
After surfing online and discussed with others, I have a much better solution with the idea that is much easier to illustrate.

When the parameter is n, we are going to have n left parenthesis and n right parenthesis. We can add either "(" or ")" to current constructed string (initially empty string), but with some constrains:
1. No. of left parenthesis "(" much be less or equal to n.
2. Certainly, right parenthesis ")" need to be <= n, but also need to be <= no. of "(".
Based this point, we can easily get a recursive approach :

public class GenerateParentheis {
 ArrayList<String> list;
    
    public ArrayList<String> generateParenthesis(int n) {
        list = new ArrayList<String>();
        generateParenthesis("",0,0,n);
        return list;
    }
    
    private void generateParenthesis(String s,int left,int right,int n){
        if(left==n && right==n){
            list.add(s);
            return;
        }
        if(left<n)
            generateParenthesis(s+"(",left+1,right,n);
        if(right<left)
            generateParenthesis(s+")",left,right+1,n);
    }
}

Even More
There're many problems that looks totally different at first glance, but are the same in nature.
The key in called Catalan Number. The n th Catalan number is 1/(n+1) * C(2n, n).
So when n=3, we can get Catalan number = 5, which is the total number of well-formed parenthesis combinations when n=3.

Saturday, January 26, 2013

Some Powerful Shell Commands

Original article at coolshell

!$ !$是一个特殊的环境变量，它代表了上一个命令的最后一个字符串。如：你可能会这样： $mkdir mydir $mv mydir yourdir $cd yourdir 可以改成： $mkdir mydir $mv !$ yourdir $cd !$

sudo !!
以root的身份执行上一条命令。
场景举例：比如Ubuntu里用apt-get安装软件包的时候是需要root身份的，我们经常会忘记在apt-get前加sudo。每次不得不加上sudo再重新键入这行命令，这时可以很方便的用sudo !!完事。
（注：在shell下，有时候你会输入很长的命令，你可以使用!xxx来重复最近的一次命令，比如，你以前输入过，vi /where/the/file/is, 下次你可以使用 !vi 重得上次最近一次的vi命令。）

cd –
回到上一次的目录。
场景举例：当前目录为/home/a，用cd ../b切换到/home/b。这时可以通过反复执行cd –命令在/home/a和/home/b之间来回方便的切换。

‘ALT+.’ or ‘<ESC> .’
热建alt+. 或 esc+. 可以把上次命令行的参数给重复出来。

^old^new
替换前一条命令里的部分字符串。
场景：echo "wanderful"，其实是想输出echo "wonderful"。只需要^a^o就行了，对很长的命令的错误拼写有很大的帮助。（也可以使用 !!:gs/old/new）

du -s * | sort -n | tail
列出当前目录里最大的10个文件。

:w !sudo tee %
在vi中保存一个只有root可以写的文件

date -d@1234567890
时间截转时间

> file.txt
创建一个空文件，比touch短。

mtr coolshell.cn
mtr命令比traceroute要好。

在命令行前加空格，该命令不会进入history里。

echo “ls -l” | at midnight
在某个时间运行某个命令。

curl -u user:pass -d status=”Tweeting from the shell” http://twitter.com/statuses/update.xml
命令行的方式更新twitter。

curl -u username –silent “https://mail.google.com/mail/feed/atom” | perl -ne ‘print “\t” if /<name>/; print “$2\n” if /<(title|name)>(.*)<\/\1>/;’
检查你的gmail未读邮件

ps aux | sort -nk +4 | tail
列出头十个最耗内存的进程

man ascii
显示ascii码表。
场景：忘记ascii码表的时候还需要google么?尤其在天朝网络如此“顺畅”的情况下，就更麻烦在GWF多应用一次规则了，直接用本地的man ascii吧。

ctrl-x e
快速启动你的默认编辑器（由变量$EDITOR设置）。

netstat –tlnp
列出本机进程监听的端口号。（netstat -anop 可以显示侦听在这个端口号的进程）

tail -f /path/to/file.log | sed '/^Finished: SUCCESS$/ q'
当file.log里出现Finished: SUCCESS时候就退出tail，这个命令用于实时监控并过滤log是否出现了某条记录。

ssh user@server bash < /path/to/local/script.sh
在远程机器上运行一段脚本。这条命令最大的好处就是不用把脚本拷到远程机器上。

ssh user@host cat /path/to/remotefile | diff /path/to/localfile -
比较一个远程文件和一个本地文件

net rpc shutdown -I ipAddressOfWindowsPC -U username%password
远程关闭一台Windows的机器

screen -d -m -S some_name ping my_router
后台运行一段不终止的程序，并可以随时查看它的状态。-d -m参数启动“分离”模式，-S指定了一个session的标识。可以通过-R命令来重新“挂载”一个标识的session。更多细节请参考screen用法 man screen。

wget --random-wait -r -p -e robots=off -U mozilla http://www.example.com
下载整个www.example.com网站。（注：别太过分，大部分网站都有防爬功能了：））

curl ifconfig.me
当你的机器在内网的时候，可以通过这个命令查看外网的IP。

convert input.png -gravity NorthWest -background transparent -extent 720×200 output.png
改一下图片的大小尺寸

lsof –i
实时查看本机网络服务的活动状态。

vim scp://username@host//path/to/somefile
vim一个远程文件

python -m SimpleHTTPServer
一句话实现一个HTTP服务，把当前目录设为HTTP服务目录，可以通过http://localhost:8000访问这也许是这个星球上最简单的HTTP服务器的实现了。

history | awk '{CMD[$2]++;count++;} END { for (a in CMD )print CMD[a] " " CMD[a]/count*100 "% " a }' | grep -v "./" | column -c3 -s " " -t | sort -nr | nl | head -n10
(有点复杂了，history|awk ‘{print $2}’|awk ‘BEGIN {FS=”|”} {print $1}’|sort|uniq -c|sort -rn|head -10)
这行脚本能输出你最常用的十条命令，由此甚至可以洞察你是一个什么类型的程序员。

tr -c “[:digit:]” ” ” < /dev/urandom | dd cbs=$COLUMNS conv=unblock | GREP_COLOR=”1;32″ grep –color “[^ ]“
想看看Marix的屏幕效果吗？（不是很像，但也很Cool!）

“Where there is a shell，there is a way!”

Friday, January 25, 2013

From 韩寒

你知道你能做到，别人觉得你也许可以做到，那么，少废话，做到再说，其他的怨气都是虚妄。自己没有展露光芒，就不应该怪别人没有眼光。
-- 韩寒《春萍，我做到了》

Wednesday, January 23, 2013

Majority Vote Algorithm with Linear Time & Constant Space

I saw a problem on some certain forum days ago:
Design an algorithm that, given a list of N elements in an array, finds all the elements that appear more that n/3 times in linear time and using constant space only.

Well, another constant space problem. Without this constrain, we can easily achieve an linear time algorithm by using a hashmap or just another array.
I didn't have good solutions. And then I searched online, and found that there was a particular name for this problem: Majority Vote. And there's a neat algorithm here .

While this algorithm seemed not that straightforward to me at first, the idea behind it was quite simple:
The majority ( if there exists ) will remain the same when we delete 3 different elements from list each time, taking the problem I stated at first as the example.

Untitled

After all these years, there's only you still being on my side.
Thank you, Amanda.