Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python 3.x - Python3 Out of Memory Error but Linux not using full RAM

I have a python3 program (that uses pytorch) that is receiving an "Out of Memory" error. When looking at free -t and htop I see that there is still RAM and swap available. On htop I see that my Virtual Memory is larger than my available RAM + swap.
I have looked around for answers, as it seems to be a problem that can be common, and I found some things that helped me understand more and look further, but ultimately I remained stuck.

My System

I am running 64-bit Ubuntu 18.04.2 LTS with kernel 4.15.0-45-generic on a cloud VM service.
My program uses Python 3.6.9, pytorch 1.6 (with torch.jit) and tornado 6.1 (not sure if that is relevant at all...).
My python3 program is a tornado server that receives requests intended to run inference of a neural network.

My Problem

Everything works well for some hours, replying to multiple requests, but then, seemingly at random, the program hits an "Out Of Memory" error. Every request fails this way for anywhere from a few minutes to a few hours, after which new requests are processed properly again, and the cycle repeats. The specific error I get is generated by SOX:

ERROR:sox:OSError: SoX failed! [Errno 12] Cannot allocate memory

But I know that SOX itself doesn't use much memory at all, and this happens to multiple requests in a row so it seems like my general python program is just out of memory (and not that SOX has a memory spike that causes this).
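One way to capture the state at the moment of failure would be to wrap the failing call and log extra details whenever errno 12 (ENOMEM) appears. The following is a minimal sketch, assuming the failure surfaces as a plain OSError; if the SoX wrapper raises its own exception type, this would need adapting. The names call_with_enomem_logging and snapshot are hypothetical and not part of any library.

```python
import errno
import logging

log = logging.getLogger("memwatch")  # hypothetical logger name


def call_with_enomem_logging(fn, *args, snapshot=None, **kwargs):
    """Call fn(*args, **kwargs); on OSError with ENOMEM, log and re-raise.

    snapshot is an optional zero-argument callable (an assumption of this
    sketch) that returns any diagnostic data worth logging at failure time.
    """
    try:
        return fn(*args, **kwargs)
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            details = snapshot() if snapshot is not None else {}
            log.error("ENOMEM from %r: %s; state=%r", fn, exc, details)
        # Always re-raise so the caller's error handling is unchanged.
        raise
```

The wrapper does not change behaviour: the exception still propagates, it only adds a log line for the one error of interest.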

Things I noticed

Looking at htop I get:
[image: htop result]
Looking at free -t I get:
[image: free -t result]
Looking at df -h I get:
[image: df -h result]

From what I can tell, the Virtual Memory (3003M) far exceeds my RAM + swap (~2236M). At the same time, I still have at least 330M free between my RAM and my swap. I also still have space available on my HDD, so the Virtual Memory should still have room to allocate more memory if it wanted to (and should not be running into "Out Of Memory" issues).
From what I have read, it is normal for the Virtual Memory to exceed RAM + swap, but I still don't really understand how the virtual memory decides which resource to use, or where it stores data when it is not using RAM or swap.
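To compare the htop and free -t figures with the kernel's own accounting, one could read /proc/meminfo directly, which on Linux includes the CommitLimit and Committed_AS fields. The following is a minimal sketch, assuming a Linux system; the function names are hypothetical. Note that CommitLimit is only enforced when vm.overcommit_memory is set to 2, so in the default mode the difference is informational rather than a hard limit.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts and carry no unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kib(fields):
    """Return CommitLimit - Committed_AS in kB, or None if either is missing."""
    if "CommitLimit" not in fields or "Committed_AS" not in fields:
        return None
    return fields["CommitLimit"] - fields["Committed_AS"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as fh:
        return parse_meminfo(fh.read())
```

Calling commit_headroom_kib(read_meminfo()) at the time of a failure shows how much address space has been promised to processes relative to the commit limit, which is a different quantity from the free RAM and swap shown by free -t.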

What I have tried

  • I tried testing the garbage collector in python to see if maybe it was not freeing up memory but from my testing it seems like the garbage collector is doing its job as intended and there isn't some build-up of things to be collected that aren't. On average the garbage collector resets itself (frees the memory) after 2 requests to the server.
  • I have checked whether automatic system updates were taking up memory and creating a momentary system memory shortage. As far as I can tell from the system logs, this is not the case.
  • I monitored htop and my logs in real time while receiving requests and did not notice any memory usage spikes.
  • Restarting the python program fixes the memory problem when it occurs (until it pops up again), but this is not a good solution for me.
  • Reproducing the problem manually has proved difficult. I tried resending the same requests that the server received between two occurrences of the memory error, but this did not reproduce it. This has made the problem a lot harder to solve: in order to look at it I have to wait for it to happen organically and hope it doesn't go away before I figure things out.
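Since the error cannot be triggered on demand, one option is to log the process's own virtual and resident sizes on every request, so that the values are already on record when the error next occurs. The following is a minimal sketch, assuming Linux, where /proc/self/status exposes fields such as VmSize and VmRSS; the function names are hypothetical.

```python
def parse_vm_fields(status_text):
    """Return {name: kB} for the Vm* lines of /proc/<pid>/status-style text."""
    out = {}
    for line in status_text.splitlines():
        if not line.startswith("Vm"):
            continue  # only the memory-related fields are of interest here
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            out[name] = int(parts[0])
    return out


def current_vm_fields(path="/proc/self/status"):
    """Read the Vm* fields of the current process (Linux only)."""
    with open(path) as fh:
        return parse_vm_fields(fh.read())
```

Logging current_vm_fields() at the start and end of each request handler would show whether VmSize grows steadily over the hours before a failure or stays flat.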

Questions

My main question: does this memory issue look like a python3 problem in my code (pytorch? torch.jit? a memory leak, even if that is uncommon in python?) or a Linux problem? How do I go about figuring out what the problem actually is and solving it?
Other questions that came up along the way while trying to solve this problem:

  • Why aren't my RAM and swap fully utilized before the virtual memory allocates memory to my program elsewhere?
  • Where does the virtual memory store the memory that it allocates? If I can find this out, I can figure out where the memory shortage is, because as it stands I cannot find any part of my system that has 100% memory usage.
  • I read that maybe my program is creating many "empty memory pages" or things of the sort that may have allocated memory to the program that it no longer needs. Is there a way to clean these things in python3 other than using the garbage collector?

Thank you for your time and any help you may be able to provide. If you need more information from me please let me know.

question from:https://stackoverflow.com/questions/65844447/python3-out-of-memory-error-but-linux-not-using-full-ram
