Stack and Heap
Moderator: MaxCoderz Staff
Stack and Heap
I won't go into too much detail of what I'm doing right now for the sake of time.
I was wondering if anyone has any good ideas on memory usage regarding arrays. I am implementing the Needleman-Wunsch algorithm for comparing two strings. The first and most easy idea to implement is that of a 2D array storing values, then travelling back to recover the best string comparison.
My problem is that I have to create an array of size string1.length+1 by string2.length+1; which is massive and requires a lot of memory.
I was thinking of travelling with a 2*2 array to find the edit distance between two strings and then create a "Cell" object for each cell and have a pointer to where the "Cell" object can travel back to recover the best string comparison.
Any ideas? Any questions?
I was wondering if anyone has any good ideas on memory usage regarding arrays. I am implementing the Needleman-Wunsch algorithm for comparing two strings. The first and most easy idea to implement is that of a 2D array storing values, then travelling back to recover the best string comparison.
My problem is that I have to create an array of size string1.length+1 by string2.length+1; which is massive and requires a lot of memory.
I was thinking of travelling with a 2*2 array to find the edit distance between two strings and then create a "Cell" object for each cell and have a pointer to where the "Cell" object can travel back to recover the best string comparison.
Any ideas? Any questions?
Re: Stack and Heap
Could you elaborate on this a little more? I'm not sure I understand your plan of attackWesley wrote:I was thinking of travelling with a 2*2 array to find the edit distance between two strings and then create a "Cell" object for each cell and have a pointer to where the "Cell" object can travel back to recover the best string comparison.
-
- Calc King
- Posts: 1513
- Joined: Sat 05 Aug, 2006 7:22 am
Re: Stack and Heap
How long are the strings?
What language will this be in?
And where/how do stacks an heaps come into this - do you mean The Heap and The Stack? Since your array is dynamically allocated (dynamic size, so.., well you could use alloca.. but The Stack is typically only about 1MB) it would be on the heap but that shouldn't be a problem..
How about a sparse matrix?
What language will this be in?
And where/how do stacks an heaps come into this - do you mean The Heap and The Stack? Since your array is dynamically allocated (dynamic size, so.., well you could use alloca.. but The Stack is typically only about 1MB) it would be on the heap but that shouldn't be a problem..
How about a sparse matrix?
Re: Stack and Heap
Okay, a more detailed description of the goal is here.
1) The strings can be as long as 500,000 characters long (or longer). My goal is to make my program work for 500,000 character-long strings.
2) I am coding in Java. That was a good question, since Java handles memory quite differently from other languages (such as C++ and C).
3) I am unsure what you mean by
"stacks an heaps come into this - do you mean The Heap and The Stack?"
You see, my problem is this. Whenever I have to compare two strings I need a grid of integers: opt[N+1][M+1] where N and M are the sizes of the two strings being compared. Do you see where this can get really ugly?
When I want to compare a string of size N or M (assuming the two are the same) that is even 100,000 characters long, I need a 100,000*100,000 grid of int values. That means I need 100,000*100,000*(4 bytes) = 40,000,000,000 bytes! That's roughly 40,000 MB.
So, my goal is NOT to use a grid in this manner. So, I'm thinking of using "Cell" objects which have references to their previous cell. I'm not sure if this will work, because I would need to have separate "Cell" objects so they don't get collected by the garbage collector.
1) The strings can be as long as 500,000 characters long (or longer). My goal is to make my program work for 500,000 character-long strings.
2) I am coding in Java. That was a good question, since Java handles memory quite differently from other languages (such as C++ and C).
3) I am unsure what you mean by
"stacks an heaps come into this - do you mean The Heap and The Stack?"
You see, my problem is this. Whenever I have to compare two strings I need a grid of integers: opt[N+1][M+1] where N and M are the sizes of the two strings being compared. Do you see where this can get really ugly?
When I want to compare a string of size N or M (assuming the two are the same) that is even 100,000 characters long, I need a 100,000*100,000 grid of int values. That means I need 100,000*100,000*(4 bytes) = 40,000,000,000 bytes! That's roughly 40,000 MB.
So, my goal is NOT to use a grid in this manner. So, I'm thinking of using "Cell" objects which have references to their previous cell. I'm not sure if this will work, because I would need to have separate "Cell" objects so they don't get collected by the garbage collector.
-
- New Member
- Posts: 67
- Joined: Sun 09 Nov, 2008 1:56 pm
Re: Stack and Heap
cant you just lz77 compress the entire.. thing?
modified lz77 seems like the perfect algorithm for comparing strings strings
(then again, im far too lazy to find out what proper string alignement is )
modified lz77 seems like the perfect algorithm for comparing strings strings
(then again, im far too lazy to find out what proper string alignement is )
-
- Calc King
- Posts: 1513
- Joined: Sat 05 Aug, 2006 7:22 am
Re: Stack and Heap
Well I don't see how lz77 would come in.. You'd still need to store those grid entries "somewhere", and that "somewhere" had better not get too big..
Anyway this is DP, and with DP you can always cheat - for example, instead of just filling a grid, you could do it the "old fashioned" way: recursion with memoization (and then treat the memoization "database" as a limited-size cache)
That's possible..
Anyway, this algorithm uses 2 passes, 1 to fill a matrix and 1 to go back and "follow the optimal path". But that doesn't mean that you actually have to do that, what's wrong with building the optimal path while you're filling that matrix? Nothing, right? And if you do that, you may find that you only need to store row i-1 and row i itself (and a little something extra to remember which "way you went" at every "crossroad")
Then again it is rather late, so I may just be spewing nonsense.
This trick applies to many DP algorithms though
Anyway this is DP, and with DP you can always cheat - for example, instead of just filling a grid, you could do it the "old fashioned" way: recursion with memoization (and then treat the memoization "database" as a limited-size cache)
That's possible..
Anyway, this algorithm uses 2 passes, 1 to fill a matrix and 1 to go back and "follow the optimal path". But that doesn't mean that you actually have to do that, what's wrong with building the optimal path while you're filling that matrix? Nothing, right? And if you do that, you may find that you only need to store row i-1 and row i itself (and a little something extra to remember which "way you went" at every "crossroad")
Then again it is rather late, so I may just be spewing nonsense.
This trick applies to many DP algorithms though
Re: Stack and Heap
King Harold, you are now on the same track as me. Wow. We are thinking the exact same thing!
I am trying to implement dynamic programming. That's where my idea of the "Cell" objects comes in. Where each cell progressively has a path back to the beginning (or in this case the end of the strings).
One of my questions lies in this cell communication. When does the Java garbage collector actually scoop up everything (it's after)? Is it after a method is finished? If that is so, this plan won't be too difficult.
I am trying to implement dynamic programming. That's where my idea of the "Cell" objects comes in. Where each cell progressively has a path back to the beginning (or in this case the end of the strings).
One of my questions lies in this cell communication. When does the Java garbage collector actually scoop up everything (it's after)? Is it after a method is finished? If that is so, this plan won't be too difficult.
Re: Stack and Heap
The Java garbage collector will only reclaim allocated data if there is no longer any reference to the data. For example:
After the third statement, both s1 and s2 are pointing to the "blah blah" string, and nothing is pointing to the "blah" string, so the garbage collector will free up the "blah" string since it is unreachable to the programmer after the third line. Now imagine that the following code is executed after the code above:
Remember that s1 still points to "blah blah", so the "blah blah" string will not be deallocated.
As for whether the garbage collector will reclaim an object after a method ends, again it depends on whether or not a reference variable still points to the object at that point in the execution of the program.
Code: Select all
String s1 = new String("blah");
String s2 = new String("blah blah");
s1 = s2;
Code: Select all
s2 = new String("flippy floppy");
As for whether the garbage collector will reclaim an object after a method ends, again it depends on whether or not a reference variable still points to the object at that point in the execution of the program.
Re: Stack and Heap
Hmm... From what I'm finding, the Needleman-Wunsch algorithm seems to only compute in O(MN) (M and N being the size of the two strings being compared) time and takes up at least MN memory (depending on the type of objects being used).
I'm not sure what more I can do.
I guess I could use shorts or even bytes instead of integers. That would reduce the size some, but not enough for a 500,000 character string.
I think the cell idea is the only way, but I'm quite sketchy on how I'd go about implementing it.
I'm not sure what more I can do.
I guess I could use shorts or even bytes instead of integers. That would reduce the size some, but not enough for a 500,000 character string.
I think the cell idea is the only way, but I'm quite sketchy on how I'd go about implementing it.
Re: Stack and Heap
Perhaps Hirschberg will come to the rescue.
Re: Stack and Heap
That's awesome and crazy all at the same time!
I originally had the idea presented in the Hirschberg algorithm. I didn't think it would work. It's basically implementing the same thing as my second idea, the "Cell" objects, but with only the arrays and no "Cell"s.
Great! I will get working on research again. Thanks again!
I originally had the idea presented in the Hirschberg algorithm. I didn't think it would work. It's basically implementing the same thing as my second idea, the "Cell" objects, but with only the arrays and no "Cell"s.
Great! I will get working on research again. Thanks again!
Re: Stack and Heap
So much for working ahead! My professor just posted Hirschberg's paper on his algorithm for computing the LCS. LCS is somewhat different from the Edit Distance problem, but the algorithm can be used in the same fashion.
Why, oh why, must the diligent suffer!?
See the article here.
I'm kind of confused on how he slices the grid in twos...
Why, oh why, must the diligent suffer!?
See the article here.
I'm kind of confused on how he slices the grid in twos...
Last edited by Wesley on Fri 09 Oct, 2009 12:57 pm, edited 1 time in total.
-
- Calc King
- Posts: 1513
- Joined: Sat 05 Aug, 2006 7:22 am
Re: Stack and Heap
So.. you go to a Chinese college?
Re: Stack and Heap
I'm not sure what you mean... I was simply looking around for how Hirschberg solved the problem. I spent an awful long time reading other articles and trying to figure out the algorithm, but to no avail. Until, I found Hirschberg's paper. But, by the time I found it, my professor announced that he'd post a paper about the Edit Distance problem, and he posted the very same article!King Harold wrote:So.. you go to a Chinese college?
lol. I see what you mean now. That link isn't from my school. My professor used Blackboard to post the paper. I only posted a link to a public copy. I go to Boise State University. It's not saying much though. I'm finding more and more how uncompetitive BSU is... We are so behind in our computer science classes compared to other universities. I should've been learning about data structures and algorithms my freshman year, not my sophomore year.
EDIT: At least I have my own side web page! I just haven't had much time to add to it... I've never done anything with web development, but I like what I've done to it so far; at least it looks better than my CS professor's page.
-
- Calc King
- Posts: 1513
- Joined: Sat 05 Aug, 2006 7:22 am
Re: Stack and Heap
Oh I see, sorry
btw, you're allowed to learn in your own time
That's what I did, in fact all of the stuff I've posted about here is stuff I learned on my own
And of course it's the only way to get to be actually better than the rest..
Good luck
Oh and edit distance is nice, it easily allows you to do the "2 rows"-trick, so that's cool
btw, you're allowed to learn in your own time
That's what I did, in fact all of the stuff I've posted about here is stuff I learned on my own
And of course it's the only way to get to be actually better than the rest..
Good luck
Oh and edit distance is nice, it easily allows you to do the "2 rows"-trick, so that's cool