In this topic I'll present a few algorithms which can be used to determine the maximum sum of a succeeing subsequence. First I'll define the problem more exactly. We have a sequence of n integer numbers. I'll note them
| CODE |
| {a(1);a(2);a(3);...;a(n)} |
These numbers may be positiv, negativ or 0. I'll note the sum of a sub sequence the following way
| CODE |
| S(i,j) = a(i) + a(i+1) + a(i+2) + ... + a(j-1) |
Note that it is not a typo that I end at j-1 and not j. This allows S(x,x) to be 0, meaning the sum of an empty sub sequence. If all numbers are negative then S(x,x) will actually be the maximum. S(1, n+1) will be the sum of the whole sequence.
I'll present a few algorithms to maximize S. I'll not show how to determine i and j for the max S. This although isn't hard to add so I'll let it as an exercie for the reader.
First I'll start with a example. Our sequence is
Here the max will be S(3, 7) = 7+5-1+8 = 19. I'll show how to determine that 19.
The first algorithm is the obvious and straight forward implementation. We loop through all possibilities and use another loop to determine the sum.
| CODE |
#include<fstream> #include<algorithm> #include<vector> #include<iterator> #include<iostream> using namespace std;
int main() { ifstream in("in.txt"); vector<int>seq; copy( istream_iterator<int>(in), istream_iterator<int>(), back_inserter(seq) ); size_t size = seq.size(); int max_sum = 0; for(size_t i=0; i<size; ++i) { for(size_t j=i; j<size; ++j) { int sum = 0; for(size_t k = i; k<=j; ++k) sum += seq[k]; max_sum = max(max_sum, sum); } } cout<<max_sum; } |
This algorithm has a runtime of O(n^3). For small input this is alright but very quickly it gets to slow.
To speed things up we can make use of the observation that S(i,j) = S(1, j) - S(1, i-1). We don't store the actually store the numbers but the sum up to that number. With this trick we can eliminate the internal loop and get the runtime down to O(n^2).
| CODE |
#include<fstream> #include<algorithm> #include<vector> #include<iterator> #include<iostream> using namespace std;
int main() { ifstream in("in.txt"); vector<int>seq; copy( istream_iterator<int>(in), istream_iterator<int>(), back_inserter(seq) ); size_t size = seq.size(); int sum = 0; for(size_t i=0; i<size; ++i) { sum += seq[i]; seq[i] = sum; } int max_sum = 0; for(size_t i=0; i<size; ++i) { for(size_t j=i; j<size; ++j) { if(i != 0) max_sum = max(max_sum, seq[j] - seq[i-1]); else max_sum = max(max_sum, seq[j]); } } cout<<max_sum; } |
We can get the runtime down to O(N*log(N)) using a divide and conquer approach. We divide our sequence into 2 equally big parts a and b (or nearly equal if there is an uneven amount of numbers). Now we can say that the maximal sum must be in
- a
- b
- Going over the point where have cut the sequence into 2.
To determine the maximal sum we need to determine the maximum of the 3 above. The maximum of a and b can be determined recursivly. At some point we will have divided our sequence so often that only sequences of the size 1 are left. Here it is trivial to determine the maximum. Either the only element or 0. 0 because an empty sequence is a subsequence.
The third maximum we need to determine can calculated by adding the maximum of a including the last element of a and the maximum of b including the first element of b.
| CODE |
#include<fstream> #include<algorithm> #include<vector> #include<iterator> #include<iostream> using namespace std;
// See www.boost.org for a more detailed description // on tuples and a general implementation. struct Tuple { int a; int b; int c; Tuple(int a, int b, int c): a(a), b(b), c(c){} };
namespace hidden{ struct TieDummy { int&a; int&b; int&c; void operator=(const Tuple&tulpe){ a = tulpe.a; b = tulpe.b; c = tulpe.c; } TieDummy(int&a, int&b, int&c): a(a), b(b), c(c){} }; }
int unused;
hidden::TieDummy tie(int&a, int&b, int&c){ return hidden::TieDummy(a,b,c); }
int max(int a, int b, int c) { return max(a, max(b, c)); }
int max_starting_left(const vector<int>&seq, size_t i, size_t j) { int sum =0, lmax = 0; for(size_t k=i; k<j; ++k) { sum += seq[k]; lmax = max(lmax, sum); } return lmax; }
int max_starting_right(const vector<int>&seq, size_t i, size_t j) { int sum = 0, rmax = 0; for(size_t k=j; k>i; --k) { sum += seq[k-1]; rmax = max(rmax, sum); } return rmax; }
Tulpe max_sum(const vector<int>&seq, size_t i, size_t j) { if(j - i == 1){ int sum = max(seq[i], 0); return Tulpe(sum, sum, sum); }else{ int lmax, rmax, lmax_over, rmax_over; size_t m = (j - i)/2 + i; tie(unused, lmax, lmax_over) = max_sum(seq, i, m); tie(rmax_over, rmax, unused) = max_sum(seq, m, j); return Tulpe(max_starting_left(seq, i, j), max(lmax, lmax_over + rmax_over, rmax), max_starting_right(seq, i, j)); } }
int main() { ifstream in("in.txt"); vector<int>seq; copy( istream_iterator<int>(in), istream_iterator<int>(), back_inserter(seq) ); int sum; tie(unused, sum, unused) = max_sum(seq, 0, seq.size()); cout<<sum; } |
This code looks very complex and when correctly implemented it also is rather fast. One is likly to believe that we have found the best algorithm but this is not the case. There is a simpler one which has a runtime of O(n).
If we know the maximal sum of the n-1 first elements of a sequence then we can say that the maximal sum of n first elements is either
- the maximal sum of the n-1 first elements
- or the maximal sum including the nth element.
This means that we can simply loop over the sequence and only need to keep track of the maximal sum so far and the maximal sum including the last element. How to determine the first should be obvious but the second requiers a bit of reasoning. When moving on to the next element then the maximal sum is either
- the old maximal sum + the element we are considering
- the element itself
- 0
| CODE |
#include<fstream> #include<algorithm> #include<vector> #include<iterator> #include<iostream> using namespace std;
int max(int a, int b, int c) { return max(a, max(b, c)); }
int main() { ifstream in("in.txt"); vector<int>seq; copy( istream_iterator<int>(in), istream_iterator<int>(), back_inserter(seq) ); size_t size = seq.size(); int max_sum = 0, sum = 0; for(size_t i=0; i<size; ++i) { sum = max(sum + seq[i], seq[i], 0); max_sum = max(max_sum, sum); } cout<<max_sum; } |
dr_voodoo,
how exactly the test data should be presented in in.txt file?
I've tried {2;-3;7;5;-1;8}, then simply: 2 -3 7 5 -1 8 ... and in both cases answer was 0.
PS
Yes :) I'm an absolute beginner in C/C++.
My "working" languages are VBA, VBS, SQL, Pascal.
| QUOTE |
| how exactly the test data should be presented in in.txt file? |
Just having the numbers in a file call "in.txt" should work. Make sure the file is in the same directory as the program, and space or newline seperated should both be okay.
KTC,
Now it works!
It is a kind of bug in Dev-C++4!
I pressed F9 and got answer 0, but when I clicked the *.exe itself
the answer turned into right one - 19!
Btw & funnily, but for me it was absolutely not obviously that it can
be done with a linear algorithm! Thanks, dr voodoo!
And I feel ashamed for myself... taking into account my current rank on
this nicest online contester:
http://spoj.sphere.pl/users/zzz/
Dr. Voodoo,
I found this post very interesting, good stuff... Now, I've been developing software for a long, long time, but to be honest, I'm not really as familiar as I should be with O notation for measuring algorithm performance... Its been a long time since I've looked at that stuff. Would you care to maybe explain the basics of that?
Actually you don't measure performance for an algorithm with the O notation. It measures the complexity meaning how well it scales for different sorts of input.
It expresses the time of the algorithm in function of one or more attributes of the input. Often you only use n as attribute which simply means the "size" of the input. Most of the time the "size" is defined more exactly somewhere in a descriptive text.
The unit used is the time that the algorithm needs when all attributes are 1. As this unit is different for each algorithm you don't express the speed of an algorithm but it's scalability. Constant factors are also dropped most of the time because of this reason.
For example the exact complexity of selection sort is n*(n+1)/2 where n is the number of elements to sort. Normally you although say that the complexity is n^2.
O(n^2) means that for an input of size 2 the time needed is 4 times the time needed for an input of size 1.
Common complexities are:
- O(1)
- O(log(n))
- O(n)
- O(n*log(n))
- O(n^2)