
Classic Exercise | Maximum Subarray Sum Problem

2017-02-03 老王 来Offer网



Overview


Recursion is one of the most powerful algorithm design techniques: it allows you to solve a complex problem with programs that are extremely concise and easy to reason about. Basically, you reduce the original problem to one or more simpler instances of the same problem. The subdivision continues until the instances we get can be solved directly.


The maximum subarray sum problem is one of the classical algorithm problems, and it is stated as follows:

Given an array that contains n integers, find the contiguous subarray that has the largest sum and return this sum.


Though many people are already very familiar with the optimal solution, we will solve this problem the hard way, from a pure recursion perspective; only by doing so can we perceive the beauty and elegance of recursion and gain a deeper appreciation of it.


1. The “brute force” recursive definition

If we think about the location of the subarray with the maximum sum, there are only two possible cases:

1) Either it is the current array itself.

2) Or it is completely contained within one of the two parts produced by partitioning the current array. Since we do not know which part, the simplest way to make sure we find the desired answer is to try every possible partition, solve the problem on each resulting part, and pick the largest answer.


Based on this analysis, let’s recursively define the maximum subarray sum of the input array A[1..n], which we denote by MaxSubarraySum(A[1..n]). We will have

MaxSubarraySum(A[1..n]) = max( Sum(A[1..n]), max over 1 <= k < n of max( MaxSubarraySum(A[1..k]), MaxSubarraySum(A[k+1..n]) ) )

with the base case MaxSubarraySum(A[1..1]) = A[1].

The Python code is as follows:
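The original code image has not survived; the following is a sketch of what it most likely contained (the function name is my own):

```python
def max_subarray_sum(A):
    # Base case: a single-element array is its own best subarray.
    if len(A) == 1:
        return A[0]
    # Case 1: the answer is the current array itself.
    best = sum(A)
    # Case 2: the answer lies entirely inside one part of some partition,
    # so try every partition point k and recurse on both parts.
    for k in range(1, len(A)):
        best = max(best, max_subarray_sum(A[:k]), max_subarray_sum(A[k:]))
    return best
```

For example, max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]) returns 6, the sum of the subarray [4, -1, 2, 1].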



What will the time complexity be, then? If T(n) is the total number of comparisons and additions needed to solve the maximum subarray sum problem on n integers, we have T(1) = 0 and

T(n) = ( T(1) + T(n-1) ) + ( T(2) + T(n-2) ) + ... + ( T(n-1) + T(1) ) + 3(n - 1)

To get the total number of comparisons and additions, we take the cost of the reduction stage after the recursion (n - 1 comparisons for the inner max operators, n - 1 comparisons for the outer max operator, and n - 1 additions for computing the sum) and add the comparisons and additions used on the subarrays after partitioning. When the partitioning point is the kth element, the subarrays after partitioning have sizes k and n - k.


You can see that our analysis now reduces to the equation above, which no longer depends on the details of the algorithm. First, by symmetry, the two terms inside each pair contribute equally, so we can rewrite the right-hand side as:

T(n) = 2·( T(1) + T(2) + ... + T(n-1) ) + 3(n - 1)

Then subtract this formula from the same formula written for n + 1 to eliminate the sum, and rearrange terms to get:

T(n+1) = 3·T(n) + 3

This can be solved by adding 3/2 to both sides:

T(n+1) + 3/2 = 3·( T(n) + 3/2 )

Iterating, and using T(1) = 0, we obtain our final result:

T(n) + 3/2 = 3^(n-1) · ( T(1) + 3/2 ) = (3/2) · 3^(n-1)

Thus,

T(n) = ( 3^n - 3 ) / 2 = Θ(3^n)

As the above analysis shows, the running time is exponential.


2. Can we improve on top of this? Definitely.

The obvious reason why it is so slow is that it recomputes the result for the same subarray from scratch over and over again. To speed up our recursive algorithm, we can record our recursive results and look them up when we need them later. This is called memoization.


For the Python implementation, all we need to do is add a cache that stores the results of our recursive calls, and perform the actual calculation on an instance of the problem only if we have not encountered it before. Using decorator syntax in Python allows us to alter the functionality of a function without actually changing its source code. Also, the code is more readable:
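The code image is missing here as well; as a sketch, the standard library decorator functools.lru_cache can play the role of the memoizing cache described above (the original article presumably used a hand-written decorator):

```python
from functools import lru_cache

def max_subarray_sum(A):
    A = tuple(A)  # tuples are hashable, which the cache requires

    @lru_cache(maxsize=None)  # the memoizing decorator: caches each (i, j)
    def solve(i, j):
        # Maximum subarray sum of A[i:j] (starts at i, ends before j).
        if j - i == 1:
            return A[i]
        best = sum(A[i:j])  # case 1: the whole subarray
        for k in range(i + 1, j):  # case 2: every partition point
            best = max(best, solve(i, k), solve(k, j))
        return best

    return solve(0, len(A))
```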



Memoization clearly helps to reduce the time complexity, but by how much? Pay attention to the way we fill our cache: each entry (i, j) is filled only after all of its predecessors (i, k) and (k, j) with i + 1 <= k < j are filled. If we ignore the time spent in recursive calls, evaluating the recurrence for each problem instance (the subarray of the original input that starts at index i and ends before index j) takes linear time. Since the recurrence for cache(i, j) is evaluated once for each pair, we can conclude that the new implementation performs O(n^3) operations, an exponential improvement over the naive recursive algorithm.


A further improvement: once we see how the cache is filled, we can replace the recursion with an explicit computation order that fills the cache directly. I will omit the code and let you figure it out :-)



3. Can we still improve on top of this? Definitely.

Do we really need to try every possible partition? If we can avoid this, then we can improve our algorithm significantly.


Let us think again about the semantics of MaxSubarraySum(A[1..n]): we find the subarray with the largest sum among all subarrays whose length is less than or equal to n. If we focus on the length, there are only two possibilities for the subarray that is the desired answer:

1. Either the subarray of length n, which is the original array itself, has the largest sum

2. Or a subarray of length less than n is the answer. But this case is covered by the union of all the subarrays of A[1..n-1] and all the subarrays of A[2..n], because this set contains every subarray whose length is less than or equal to n - 1.


Based on the analysis above, we will have

MaxSubarraySum(A[1..n]) = max( Sum(A[1..n]), MaxSubarraySum(A[1..n-1]), MaxSubarraySum(A[2..n]) )

When we implement this algorithm, in order to get a good time complexity, we can use the same memoization technique to avoid recomputation on the same instance, and precompute the sums of all possible subarrays. (In the following code, I intentionally calculate the sum the old way to keep the code succinct.):
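The original code is not preserved; here is a sketch under the same assumptions (naive summation kept for succinctness, as the author notes):

```python
from functools import lru_cache

def max_subarray_sum(A):
    A = tuple(A)  # hashable, so results can be cached per (i, j)

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Maximum subarray sum of A[i:j].
        if j - i == 1:
            return A[i]
        # Either A[i:j] itself, or the answer hides in A[i:j-1] or A[i+1:j].
        # sum(A[i:j]) is the "old way"; precomputed sums would make it O(1).
        return max(sum(A[i:j]), solve(i, j - 1), solve(i + 1, j))

    return solve(0, len(A))
```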



By doing this, we have O(n^2) different instances of the problem, and each instance takes O(1) time to evaluate: with precomputed sums we do not spend O(n) on the summation, each instance depends on only two other instances, and each instance is evaluated only once. So the total time complexity is O(n^2).


4. Can we do even better than this? Definitely.

Let us look at the first recursive method: we solve the maximum subarray sum problem for every subarray produced by every possible partition, and the final answer is the larger of the best among them and the sum of the input array itself. The reason we enumerate all possible partitions is that this guarantees the subarray with the maximum sum is contained in one of the partition results whenever the input array itself is not the answer. But the question is, do we really need this enumeration?


For a given partition, suppose the subarray with the maximum sum is not contained in either of the two subarrays; then there is only one remaining possibility:


As the graph above shows, the red subarray is the subarray with the largest sum, and it is not completely contained in either the left or the right subarray after the partition (denoted by the black vertical line). It actually consists of two parts: the suffix subarray of the left part with the largest sum and the prefix subarray of the right part with the largest sum.


That means we do not need to waste computation on the results of every possible partition. We only have to look at one particular partition and handle the third case specially. Let’s say the two subarrays after the partition are almost equal in size, splitting at t = floor(n / 2). Then we have the following new recursive rule:

MaxSubarraySum(A[1..n]) = max( MaxSubarraySum(A[1..t]), MaxSubarraySum(A[t+1..n]), MaxSuffixSum(A[1..t]) + MaxPrefixSum(A[t+1..n]) )

For MaxSuffixSum(A[1..t]), the easiest way is to enumerate all possible suffix subarrays of A[1..t] and pick the maximum. The process for computing MaxPrefixSum(A[t + 1..n]) is basically the same. The following is the Python implementation of the rule above (in this implementation, I use a generator to avoid list slicing):
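The original implementation (with its generator trick) is lost; this sketch passes index bounds instead of slicing, which serves the same purpose:

```python
def max_subarray_sum(A, lo=0, hi=None):
    # Maximum subarray sum of A[lo:hi] by divide and conquer.
    if hi is None:
        hi = len(A)
    if hi - lo == 1:
        return A[lo]
    t = (lo + hi) // 2  # split into A[lo:t] and A[t:hi]
    # MaxSuffixSum of the left part: best sum of a suffix ending at t - 1.
    best_suffix, running = float('-inf'), 0
    for i in range(t - 1, lo - 1, -1):
        running += A[i]
        best_suffix = max(best_suffix, running)
    # MaxPrefixSum of the right part: best sum of a prefix starting at t.
    best_prefix, running = float('-inf'), 0
    for i in range(t, hi):
        running += A[i]
        best_prefix = max(best_prefix, running)
    # The answer is in the left part, in the right part, or straddles t.
    return max(max_subarray_sum(A, lo, t),
               max_subarray_sum(A, t, hi),
               best_suffix + best_prefix)
```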



For the time complexity: we first recursively solve two subproblems, each half the size, and then spend O(n) time on the remaining part, so the recurrence is T(n) = 2·T(n/2) + O(n), and we can easily see that the final answer is O(n log n).


Looks good? But we haven’t fully utilized the power of recursion. Notice that we still spend O(n) time on the calculation for the third case; is there any way to reduce this? What happens if we delegate that computation to the recursion?


Let’s define MaxMetrics(A[1..n]) to be the tuple (MaxSubarraySum(A[1..n]), MaxSuffixSum(A[1..n]), MaxPrefixSum(A[1..n])). Basically, solving MaxMetrics(A[1..n]) gives us all three results we care about. Now we need to work out the recursion rule for MaxMetrics(A[1..n]).


Since we already have a working recursion, let’s try to improve on top of it. For MaxSubarraySum(A[1..n]), we do not have to do anything special; everything remains the same. But for MaxSuffixSum(A[1..n]) and MaxPrefixSum(A[1..n]), how can we find the answers from the results of the two half-size subproblems?


Let’s look at MaxPrefixSum(A[1..n]) first, since the two are symmetric. Now, if we think about the relationship between a specific prefix subarray of A[1..n] and the left half of A[1..n], there are only two possibilities:

1. Either the prefix subarray with the largest sum is completely contained in the left half

2. Or the prefix subarray with the largest sum contains the left half completely. Essentially, it consists of the whole left half plus the prefix subarray of the right half with the largest sum


Clearly, that means:

MaxPrefixSum(A[1..n]) = max( MaxPrefixSum(A[1..t]), Sum(A[1..t]) + MaxPrefixSum(A[t+1..n]) )

And MaxSuffixSum(A[1..n]) is as follows:

MaxSuffixSum(A[1..n]) = max( MaxSuffixSum(A[t+1..n]), Sum(A[t+1..n]) + MaxSuffixSum(A[1..t]) )

Notice that if we still compute the sums in a naive way, the total time for calculating MaxPrefixSum(A[1..n]) is still O(n). So what if we also delegate the summation to our recursion? It is easy to calculate Sum(A[1..n]) from the results of the two subproblems because:

Sum(A[1..n]) = Sum(A[1..t]) + Sum(A[t+1..n])

So let’s refine our recursive definition of MaxMetrics(A[1..n]): it should consist of MaxSubarraySum(A[1..n]), MaxSuffixSum(A[1..n]), MaxPrefixSum(A[1..n]) and Sum(A[1..n]). The Python implementation is as follows (I use a named tuple to represent the four values we calculate for MaxMetrics(A[1..n])):
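The original code image is missing; here is a sketch using collections.namedtuple for the four metrics (the field names are my own):

```python
from collections import namedtuple

# The four targets of MaxMetrics(A[1..n]).
Metrics = namedtuple('Metrics', ['max_sub', 'max_suffix', 'max_prefix', 'total'])

def max_metrics(A, lo=0, hi=None):
    if hi is None:
        hi = len(A)
    if hi - lo == 1:
        return Metrics(A[lo], A[lo], A[lo], A[lo])
    t = (lo + hi) // 2
    left = max_metrics(A, lo, t)
    right = max_metrics(A, t, hi)
    return Metrics(
        # Best subarray: inside the left, inside the right, or straddling t.
        max(left.max_sub, right.max_sub, left.max_suffix + right.max_prefix),
        # Best suffix: within the right half, or the whole right half
        # plus the best suffix of the left half.
        max(right.max_suffix, right.total + left.max_suffix),
        # Best prefix: within the left half, or the whole left half
        # plus the best prefix of the right half.
        max(left.max_prefix, left.total + right.max_prefix),
        left.total + right.total,
    )

def max_subarray_sum(A):
    return max_metrics(A).max_sub
```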



What is the time complexity this time? Well, since we have reduced the O(n) workload in each recursive call to O(1), using the same notation as in the time complexity analysis from the first part of this article, we essentially have:

T(n) = 2·T(n/2) + O(1)

Expanding this equation, or applying the master theorem, gives the final answer for T(n): O(n).


We can take another route, since the previous improvements were all based on the very first “brute force” recursive rule. Next time, we will look at our second recursive rule and see where that recursion leads. Stay tuned!








