I'm developing a Rails application that deals with huge amounts of
data, and it grinds to a halt because it consumes all of my computer's
memory due to a memory leak (allocated objects that are never released).
In my application, data is organized hierarchically, as a tree,
where each node at level "X" holds the sum of the data of its children
at level "X+1". For example, if level "X+1" holds the number of
people per city, then level "X" holds the number of people per
state. In other words, level "X"'s data is obtained by summing up the
data at level "X+1" (in this case, people).
For the sake of this question, consider a tree with four levels:
Country, State, City and Neighbourhood, where each level is mapped
to an ActiveRecord table (countries, states, cities, neighbourhoods).
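Schematically, the models look like this (simplified; the association names are assumed, and each table has an integer "data" column that holds the aggregated amount):

class Country < ActiveRecord::Base
  has_many :states
end

class State < ActiveRecord::Base
  belongs_to :country
  has_many :cities
end

class City < ActiveRecord::Base
  belongs_to :state
  has_many :neighbourhoods
end

class Neighbourhood < ActiveRecord::Base
  belongs_to :city
end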
Data is read from a CSV file that fills the leaves of the tree, that is,
the neighbourhoods table.
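The import itself is only schematic here; it does something like the following (file and column names are placeholders, not the real ones):

require "csv"

CSV.foreach("neighbourhoods.csv", headers: true) do |row|
  city = City.find_by!(name: row["city"])
  city.neighbourhoods.create!(name: row["name"], data: row["data"].to_i)
end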
After that, data flows from the bottom (neighbourhoods) to the top (countries) in the following sequence:
1) Neighbourhood data is summed into Cities;
2) once step 1 is complete, City data is summed into States;
3) once step 2 is complete, State data is summed into the Country.
The schematic code I'm using for step 1 is as follows (steps 2 and 3 follow the same pattern, sketched further below):
1 cities = City.all
2 cities.each do |city|
3   city.data = 0
4   city.neighbourhoods.each do |neighbourhood|
5     city.data = city.data + neighbourhood.data
6   end
7   city.save
8 end
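Steps 2 and 3 follow exactly the same pattern one level up (sketched here for completeness, assuming the association names mirror the ones above):

# step 2: sum city data into states
State.all.each do |state|
  state.data = 0
  state.cities.each { |city| state.data += city.data }
  state.save
end

# step 3: sum state data into the country
Country.all.each do |country|
  country.data = 0
  country.states.each { |state| country.data += state.data }
  country.save
end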
The lowest level of the tree contains 3.8M records. Each iteration of
the step-1 loop (lines 2-8) sums up one city, and once line 8 is
reached that city's subtree is no longer needed, but it is never
released (memory leak). After summing about 50% of the cities, all
8 GB of my RAM is gone.
My question is: what can I do? Buying better hardware is not the
answer, since I'm only working with a "small" prototype.
I know one way to make it work: restart the application for each City,
but I hope someone has a better idea. The "simplest" fix would be to
force the garbage collector to free specific objects, but there seems
to be no way to do that
(https://www.ruby-forum.com/t/how-do-i-force-ruby-to-release-memory/195515).
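As far as I can tell, the closest thing available is forcing a full collection by hand with GC.start, which is not the same as releasing a specific subtree; schematically:

cities.each do |city|
  # ... sum the neighbourhoods as in the code above ...
  city.save
  GC.start  # runs a full GC cycle, but cannot target the subtree I just finished
end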
From the articles below I understood that the developer should
organize the data in a way that "suggests" to the garbage collector
what can be freed. Maybe another approach will do the trick; the only
alternative I see is a depth-first traversal instead of the reversed
breadth-first one I'm using, but I don't see why it would work.
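By depth-first I mean something like the following, where each subtree is finished before the next one is touched, instead of going level by level (same assumed association names as above; just a sketch of the idea, not tested):

Country.all.each do |country|
  country.data = 0
  country.states.each do |state|
    state.data = 0
    state.cities.each do |city|
      city.data = 0
      city.neighbourhoods.each { |n| city.data += n.data }
      city.save
      state.data += city.data
    end
    state.save
    country.data += state.data
  end
  country.save
end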
What I read so far:
https://stackify.com/how-does-ruby-garbage-collection-work-a-simple-tutorial/
https://www.toptal.com/ruby/hunting-ruby-memory-issues
https://scoutapm.com/blog/ruby-garbage-collection
https://scoutapm.com/blog/manage-ruby-memory-usage
Thanks