Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
365 views
in Technique[技术] by (71.8m points)

python - Resizing numpy.memmap arrays

I'm working with a bunch of large numpy arrays, and as these started to chew up too much memory lately, I wanted to replace them with numpy.memmap instances. The problem is, now and then I have to resize the arrays, and I'd preferably do that inplace. This worked quite well with ordinary arrays, but trying that on memmaps complains, that the data might be shared, and even disabling the refcheck does not help.

a = np.arange(10)
a.resize(20)
a
>>> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

a = np.memmap('bla.bin', dtype=int)
a
>>> memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

a.resize(20, refcheck=False)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-f1546111a7a1> in <module>()
----> 1 a.resize(20, refcheck=False)

ValueError: cannot resize this array: it does not own its data

Resizing the underlying mmap buffer works perfectly fine. The problem is how to reflect these changes to the array object. I've seen this workaround, but unfortunately it doesn't resize the array in place. There is also some numpy documentation about resizing mmaps, but it's clearly not working, at least with version 1.8.0. Any other ideas, how to override the inbuilt resizing checks?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The issue is that the flag OWNDATA is False when you create your array. You can change that by requiring the flag to be True when you create the array:

>>> a = np.require(np.memmap('bla.bin', dtype=int), requirements=['O'])
>>> a.shape
(10,)
>>> a.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> a.resize(20, refcheck=False)
>>> a.shape
(20,)

The only caveat is that it may create the array and make a copy to be sure the requirements are met.

Edit to address saving:

If you want to save the re-sized array to disk, you can save the memmap as a .npy formatted file and open as a numpy.memmap when you need to re-open it and use as a memmap:

>>> a[9] = 1
>>> np.save('bla.npy',a)
>>> b = np.lib.format.open_memmap('bla.npy', dtype=int, mode='r+')
>>> b
memmap([0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Edit to offer another method:

You may get close to what you're looking for by re-sizing the base mmap (a.base or a._mmap, stored in uint8 format) and "reloading" the memmap:

>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> a[3] = 7
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
>>> a.flush()
>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
>>> a.base.resize(20*8)
>>> a.flush()
>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...