If your "numbers" are simple-enough ones (signed or unsigned integers of up to 4 bytes each, or floats of 4 or 8 bytes each), I recommend the standard library array module as the best way to keep a few millions of them in memory (the "tip" of your "virtual array") with a binary file (open for binary R/W) backing the rest of the structure on disk. array.array
has very fast fromfile
and tofile
methods to facilitate the moving of data back and forth.
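For instance, a quick round trip might look like this (chunk.dat is just an illustrative name, not anything the class below depends on):

```python
import array

# a million unsigned longs written out and read back
a = array.array('L', range(1000000))
with open('chunk.dat', 'wb') as f:   # illustrative file name
    a.tofile(f)

b = array.array('L')
with open('chunk.dat', 'rb') as f:
    b.fromfile(f, 1000000)

assert a == b
```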
So, basically, assuming for example unsigned long numbers, something like:
```python
import array
import os

# keep no more than 100 million items in memory at a time
MAXINMEM = int(1e8)

class bigarray(object):
    def __init__(self):
        # binary read/write file backing the bulk of the structure on disk
        self.f = open('afile.dat', 'w+b')
        # in-memory "tip" of the structure
        self.a = array.array('L')

    def append(self, n):
        self.a.append(n)
        if len(self.a) > MAXINMEM:
            # spill the in-memory tip to disk and start afresh
            self.a.tofile(self.f)
            del self.a[:]
    def pop(self):
        if not len(self.a):
            # in-memory tip is empty: reload the last items stored on disk
            try:
                self.f.seek(-self.a.itemsize * MAXINMEM, os.SEEK_END)
            except IOError:
                # fewer than MAXINMEM items left on disk: reload them all
                self.f.seek(0, os.SEEK_SET)
            pos = self.f.tell()
            try:
                self.a.fromfile(self.f, MAXINMEM)
            except EOFError:
                pass
            # drop the reloaded items from the file
            self.f.seek(pos)
            self.f.truncate()
        return self.a.pop()  # an empty structure raises the normal IndexError
```
Of course you can add other methods as necessary (e.g. keep track of the overall length, add extend, whatever), but if pop and append are indeed all you need, this should serve.
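For instance, a quick sanity check of the sketch above (with a MAXINMEM this large, a few thousand items never actually touch the disk, but the interface is the same either way):

```python
numbers = bigarray()
for i in range(1000):
    numbers.append(i)

# items come back in LIFO order; popping an empty structure raises IndexError
assert [numbers.pop() for _ in range(1000)] == list(range(999, -1, -1))
```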