Thousand Sunny

Friday, April 16, 2021

Make the token count

I have been thinking of the purpose of life since my late 20's. Not sure I got the perfect answer, but I'm reaching a point that I feel comfortable using it to guide my daily life and share with others.

Instead of purpose, there was a mental shift for me that told me maybe the purpose is not the goal. There is no endgame. From a grand scheme of the universe, time is meaningless. From the recent rapid development of Artificial intelligence, we may all just be in a big simulation. Nothing is real. So then what is the purpose?

When I was little, I used to play those slot machines with my friends. We saved a little bit of money from not buying the snack, and bought the tokens to play instead. If life is similar to a game, then what's important is really the experience and the journey itself.

So here are my goals for this journey.

1. Build and experience good and deep relationships.

That's really a big part of why we are born in this world. The relationships with parents, friends, coworkers, lovers, and even a pet, give us comfort and courage when we need them. In Ray Dalio's book Principles, he mentioned what he pursues in life were good work and good relationships. I thought that was too little to ask while there are so many other nice things in life. Now I understand it a little bit better, it can be just this simple.

2. Maintain the independence of time and space for me to think and live.

This is very important to me. I get desperate and anxious if I feel confined, both in life and work. I know I just talked about family and relationships, but that doesn't mean one has to go with the crowd. For a very long time in my life, saying No to others was always hard for me. I worry about disappointing and being disliked by others. But one has to be an independent individual to make himself and others around healthy. For example, in Adler's theory of Individual Psychology, a lot of troubles come from the relationship with others, especially when they are unbalanced.

3. Contribute something to this world, even just a tiny bit.

Unless one is a Nobel Prize winner or similar, no one would ever remember a person 100 years after he/she dies (who-will-remember-you-in-100-years). Once my life goal was to create some brilliant and perfect programs such that they can still be running on thousands of servers even after I die. That was another form of immortality. Later on, I realized even that cannot last for, say, more than 1000 years. Most of us mean absolutely nothing if we look at things from a slightly bigger picture of the entire human history. But keep in mind life is a journey, and to me, it's always important to do something good. My code might be overwritten by other more talented software engineers in the future, and this is ok. I have done my part of the contribution.

Nietzsche wrote in one of his books "He who has a 'why' to live can bear almost any 'how' ". If life is a big simulation, then we all paid our token for this round. Make the token count, go out and enjoy.

Tuesday, March 23, 2021

The vision for software products (well, for anything really) is ok to be vague, ok to be far from reality, but it needs to be actually EXCITING - something similar to "we are building a flying car" or "I have a dream" (MLK)

If none of the customer or developer is excited about the vision, then it's just BS.

Thursday, March 21, 2013

[leetcode] Palindrome Partition II

Problem is stated here .
It is easy to find out that it's a dynamic programming problem. Certainly we can use recursion to solve it, but we will see that there're a lot of repeated computations, and therefore recursion is inefficient in this kind of situation.
DP approach isn't difficult, a good and brief explanation can be found here.

Let dp[i] = min cut for substring [i...n], where n is the length of input string.
Then dp[i] = 0, if substring[i..n] itself is a palindrome
or dp[i] = 1+dp[j], if substring(i, j) is a palindrome ( i<j<n )

Here comes the interesting part, how do we know if substring(i, j) is a palindrome or not ?
If we check it directly by going though characters from i to j-1, it reveals a overall runtime of O(n^3) of this algorithm. The trick is using another DP:

substring(i, j) is palindrome if charAt(i) == charAt(j-1) && subtring(i+1,j-1) is palindrome.

So, this is a interesting mixed DP problem and can be solved in O(n^2).
Note that this post made a good point that it can be regarded as a variant of Matrix Chain Multiplication Problem. They gave a neat code for this problem using 2D dynamic programming algorithm which is very similar with the one to solve Matrix Chain Multiplication Problem. It makes perfect sense, but the time complexity is O(n^3).

As I talked it before, this problem can be solved in runtime of O(n^2). So what's the difference between it and Matrix Chain Multiplication Problem? Well, by using anoter DP, we can find out if substring(i, j) is palindrome in O(1) time while we can not directly get the min multiplications of sub matrices(i,j) if we applied the same strategy to it.

Another related problem is problem 17.14 in the book Cracking the Coding Interview:
Given a text without spaces, e.g."helloworldniceday",
Find the minimal words it can be divided into. e.g. "hello world nice day" -> 4 words.
It occurs to me that this is another variant of Matrix Chain Multiplication Problem with 2-D DP at first, but actually it's much easier than I thought. 1-D DP is enough, just like the Palindrome Partition problem I talked first.

//-1 means this text can't be divided to valid words.
public static int getMinSeperateWords(String text){
		int len = text.length();
		int[] dp = new int[len];
		for(int i=len-1;i>=0;i--){
			if(Dict.isValid(text.substring(i))){
				dp[i]=1;
				continue;
			}
			int min=Integer.MAX_VALUE;
			for(int j=i+1;j<len;j++){
				if(dp[j]!=-1 && Dict.isValid(text.substring(i,j))){
					if(dp[j]+1<min)
						min=dp[j]+1;
				}
			}
			dp[i]= min==Integer.MAX_VALUE ? -1:min;
		}
		return dp[0];
}

In conclusion, problems can be related while they can also be harder or easier than the similar ones. Practice makes perfect.

Wednesday, February 20, 2013

[leetcode] Implement strStr() | String matching algorithms

Problem:
Returns a pointer to the first occurrence of needle in haystack, or null if needle is not part of haystack.
e.g. haystack=>"mississippi", needle=>"issip". return "issippi"

This is not a difficult problem. A first thought of brute force algorithm is easy to come up with. However, it gets very tricky if we want to improve the efficiency.

Knuth-Morris-Pratt(KMP) Algorithm
See details at wiki and here

public class StrStr {

	public String strKMP(String haystack, String needle){
		if(needle.length()==0)
			return haystack;
		//get next array
		int[] next = generateNext(needle.toCharArray());
		int len = 0;     // Number of characters matched            
        
        for( int i=0; i<haystack.length(); i++ ){            
        	while( len>0 && needle.charAt(len)!= haystack.charAt(i) )            
        		len = next[len];            
            if( needle.charAt(len) == haystack.charAt(i) )            
                len++;            
            if( len == needle.length() )            
                return haystack.substring(i-needle.length()+1);
        }      
        
        return null;
	}
	/**
	 * KeyPoint. Get next array.
	 * @param needle
	 * @return
	 */
	private int[] generateNext(char[] needle){
		//next array. next[i] means when match length = i, the common prefix and real suffix length = next[i] 
		int[] next = new int[needle.length+1];
		//the longest common length of  prefix and real suffix.
		int presufix = 0;
		//next[1] = 0, starting from 2. i = match length.
		for(int i=2;i<next.length;i++){
			while(presufix>0 && needle[presufix]!=needle[i-1])//trickiest part
				presufix=next[presufix];
			if(needle[presufix]==needle[i-1])
				presufix++;
			next[i]=presufix;
		}
		return next;
	}

}

Rolling Hash Algorithm
See details at wiki. An important application of rolling hash is Rabin-Karp Algorithm. While Rabin-Karp algorithm is a good choice for multiple pattern search , I borrowed the idea to solve this problem. Although it is slower than KMP algorithm in worst case in terms of time complexity, it is much easier to understand than KMP.
The hash function H is:

Where a is constant and c1..ck are input chars.
Here's a simple example of rolling hash.
Input: char array A[1...10]. Length k=5.
H[1..5] = A[1]*a^4 + A[2]*a^3 + A[3]*a^2 + A[4]*a^1 + A[5]*a^0
Then how can we compute H[2..6] ?
Instead of computing it from scratch, we can do as following based on H[1..5]
1. Remove first item of H[1..5] : H[2..6] = H[1..5] - A[1]*a^4
2. Times constant a : H[2..6] *= a
3. Add last item A[6] : H[2..6] += A[6]

Based on this idea, I coded a very naive implementation to solve the problem. Note that to avoid manipulate huge H values, all maths are done by mod in reality.

public class StrStrRollingHash {
	public static final int A = 3;
	
	public String strStr(String haystack, String needle) {
		int n = haystack.length();
		int m = needle.length();
		if(m==0)
			return haystack;
        if(n<m)
            return null;
		
		char[] input = haystack.toCharArray();
		int needleHash = getHash(needle.toCharArray(),0,m-1);
		int haystackHash = getHash(input,0,m-1);
		
		for(int i=0;i<n-m;i++){
			if(haystackHash==needleHash){
				if(haystack.substring(i, i+m).equals(needle))
					return haystack.substring(i);
			}
			//rolling hash
			haystackHash-=(int)(input[i]*Math.pow(A, m-1));//remove i th item
			haystackHash*=A;//times A
			haystackHash+=input[i+m];//add i+m th item	
		}
		
		if(haystackHash==needleHash){
			if(haystack.substring(n-m, n).equals(needle))
				return haystack.substring(n-m);
		}
		
		return null;
        
    }
	
	private static int getHash(char[] array,int start,int end){
		int k=end-start;
		int sum=0;
		for(int i=start;i<=end;i++){
			int item = (int)(array[i]*Math.pow(A,k--));
			sum+=item;
		}
		return sum;
	}
}

You can try this problem on leetcode.
There's another related Boyer-Moore Algorithm, I will study it when get free.

Monday, February 18, 2013

交换使两个序列差最小| Two Equal-sized Subsets Partition Problem

题目：有两个序列a,b，大小都为n,序列元素的值任意整数，无序；
要求：通过交换a,b中的元素，使[序列a元素的和]与[序列b元素的和]之间的差最小。

这是微软的一道面试题，也在July整理的面试100题之中。本来我想了一下，没有什么特别好的方案，于是百度之。结果百度第一页出来的答案，都一样就算了，还是错的。吐槽无力，然后就有了这篇中文文章。

首先很容易想到，这问题的本质，就是将一个长度为2n的数组，分成长度为n的两个子数组，使它们和的差值最小。这是数组分隔问题的一个特例。见Partition Problem。在原问题中，原数组允许被分为两个任意长度的子数组（只要它们合起来为2n）。然而有了子数组长度必须为n这个条件后，这个问题难度并没有降低--它还是一个NP问题。证明见这里。

网上搜出来的很多解法，其实不能说全错。但它们不能保证始终得到最优解。大多数方法用的都是贪婪算法的思想，然而贪婪算法是以局部最优解去揣测全局最优解--可能成功，也可能失败。贪婪算法获取全局最优解的典型例子比如Dijkstra最短路径算法以及求最小生成树的kruskal算法。但其有效性需要严格的证明，当然，证明其失败的话，一个反例即可。网上的大多数算法，不能保证总是最优解，但有一些确实是优秀的渐近算法，能保证结果于最优解的差距在一定范围以内。关于这些算法可以参考这里。

那么，如何得到最优解？
1. 穷举法。
天下武功，唯快不破。算法么，只要够暴力，总是能解的。- -！
问题本质是从2n个元素中取出n个元素，使其和最接近于Sum/2，其中Sum为2n个元素的总和。
所以只要枚举出2n个元素中n个元素的所有组合，迭代更新最接近的值，总和得到最好的解。当然，这个算法的时间复杂度就是阶乘级别的了，不过对于一个NP问题，这也不丢脸呵。

2. DP 算法。
详见编程之美209页。

public class PartitionArray {
 
 /**
  * @param array:
  *        assuming that the input array is the combination of two n-sized array.
  * @return int :
  *        minimal difference of the two n-sized subsets.
  */
 public static int minimalSum (int[] array){
  int total = sum(array);
  int sum = total/2;
  int n=array.length/2;
  
  //dp[i][v] = true means that there's i elements added up to value v.
  boolean[][] dp = new boolean[n+1][sum+1];
  dp[0][0]=true;
  
  for(int k=1;k<=2*n;k++){
   int val = array[k-1];
   for(int i=1;i<=k&&i<=n;i++){
    for(int v=1;v<=sum;v++){
     //try to take k as i th element
     if(v>=val && dp[i-1][v-val])
      dp[i][v]=true;
    }
   }
  }
  //find the answer. we need n-sized subset, so just check dp[n][*].
  for(int i=sum;i>0;i--){
   if(dp[n][i])
    /**
     * we find a subset with size=n and sum=i,another subset will have sum=total-i
     * so the difference will be (total-i)-i=total-2*i
     */
    return total-2*i;
  }
  return -1; 
 }
 
 private static int sum(int[] array){
  int sum=0;
  for(int i:array)
   sum+=i;
  return sum;
 }
}

当数组之和非常大的时候，DP算法的数组就会非常大，就不那么高效了。
对于一些极端例子，比如A=[ 12345,12332,14098], B=[ 32213,12321,23132]. DP还不如暴力破解。C(6,3) = 120。总共才120种可能。

Tuesday, February 5, 2013

Talk is cheap. Show me the code.

--Linus Torvalds in Linux-kernel mail list

Monday, February 4, 2013

Notes on LCS, LIS, and related problems

LCS (Longest Common Subsequence) is a well-known dynamic programming(DP) problem. I learned it several times, but sadly I forgot it several times as well. These kinds of problems are not that straightforward for someone first encounters it. While I'm learning(to be more specific, re-learning) these related problems recently, I still haven't found any short path to master DP algorithms. Personally, I think completely understanding more and more problems will enlarge our knowledge to solve other similar problems in the future. And of course, coding is inevitable and essential.

Circus Tower Sorting Problem
There are many interesting problems related to LCS problem. For example:
A circus is designing a tower routine consisting of people standing atop one another’s shoulders. For practical and aesthetic reasons, each person must be both shorter and lighter than the person below him or her. Given the heights and weights of each person in the circus, write a method to compute the largest possible number of people in such a tower.
EXAMPLE:
Input (ht, wt): (65, 100) (70, 150) (56, 90) (75, 190) (60, 95) (68, 110)
Output: The longest tower is length 6 and includes from top to bottom: (56, 90) (60,95) (65,100) (68,110) (70,150) (75,190)

This problem can be easily converted to LCS problem and therefore solution is quite simple:

Sort the data according to height. Denote it as D1. That is O(nlogn).
Sort the data according to weight. Denote it as D2. That is O(nlogn).
Find the length of the LCS of D1 and D2. Using dynamic programming, it’s O(n^2).

LIS (Longest Increasing Subsequence) Problem

Given a good understanding of LCS problem, this problem can be solved quickly, though it's tricky.

Sort the Sequence in increasing order. Denote it as S'. That is O(nlogn).
Find the length of the LCS of original sequence S and S'. Using dynamic programming, it’s O(n^2).

This reveals the overall O(n^2) time complexity solution. Another O(n^2) DP solution is shown here. There's exists O(nlogn) solution to solve it and that's not hard either. They can be found on wiki and a good example is shown on a Chinese blog.

Box Stacking Problem

The boxes in this particular problem can be rotated and multiple instance of same box is allowed. This problem can be solved based on the above LIS problem.

There's another box stacking problem which is slightly different from the above one in Cracking the Coding Interview 5th Edition Chapter 9.10 :

You have a stack of n boxes, with width w(i), depth d(i), and height h(i). The boxes cannot be rotated and can only be stacked on top of one another if each box in the stack is strictly larger than the box above it in width, height, and depth. Implement a method to build the tallest stack possible, where the height of a stack is the sum of the heights of each box.

My initial thought is to adapt a similar approach as the Circus Tower problem. However, sorting the boxes in width, depth and height separately and find their LCS is kind of 3-Dimension DP, which is much more complicated. Actually I don't if it will truly work.

While answer in the book is taking the similar idea of solving LIS problem, I don't understand it well, and I don't like the code either. So I just put my idea here:

1. Sort the boxes on increasing order of height.
2. Let:
H(i) = Maximum possible Stack Height with box i at bottom of stack
H(i) = { Max ( H(j) ) + height(i) }
where j is in [1..i-1] and width(j) < width(i) and depth(j) < depth(i) and height(j) < height(i)
If there is no such j then H(i) = height(i).
3. Scan H(1) to H(n). Find the max as the answer.

I may put the code later.