I use multiprocessing.sharedctypes.RawArray
to share large numpy arrays between multiple processes. I've noticed that when the array is large (more than 1 or 2 GB) it becomes very slow to initialize and also much slower to read from and write to (and the read/write time is not predictable: sometimes it's fairly fast, sometimes very slow).
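For context, the way I share the array between processes looks roughly like the following (a minimal sketch with a hypothetical worker function, not my actual code; it assumes Linux with the default fork start method):

import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy as np

def worker(raw, shape):
    # Reattach a numpy view onto the shared buffer inside the child process.
    arr = np.ctypeslib.as_array(raw).reshape(shape)
    arr[0, :] = 42  # writes are visible to the parent

if __name__ == '__main__':
    shape = (1000, 1000)
    raw = mpsc.RawArray(ctypes.c_uint16, shape[0] * shape[1])
    p = mp.Process(target=worker, args=(raw, shape))
    p.start()
    p.join()
    parent_view = np.ctypeslib.as_array(raw).reshape(shape)
    print(parent_view[0, 0])  # prints 42, written by the child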
To narrow the problem down, I've made a small sample script that uses just one process, initializes a shared array, writes to it several times, and measures the time taken by these operations.
import argparse
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy as np
import time


def main():
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('-c', '--block-count', type=int, default=1,
                        help='Number of blocks to write')
    parser.add_argument('-w', '--block-width', type=int, default=20000,
                        help='Block width')
    parser.add_argument('-d', '--block-depth', type=int, default=15000,
                        help='Block depth')
    args = parser.parse_args()
    blocks = args.block_count
    blockwidth = args.block_width
    depth = args.block_depth

    start = time.perf_counter()
    shared_array = mpsc.RawArray(ctypes.c_uint16, blocks*blockwidth*depth)
    finish = time.perf_counter()
    print('Init shared array of size {:.2f} Gb: {:.2f} s'.format(
        blocks*blockwidth*depth*ctypes.sizeof(ctypes.c_uint16)/1024/1024/1024, (finish-start)))

    numpy_array = np.ctypeslib.as_array(shared_array).reshape(blocks*blockwidth, depth)
    start = time.perf_counter()
    for i in range(blocks):
        begin = time.perf_counter()
        numpy_array[i*blockwidth:(i+1)*blockwidth, :] = np.ones((blockwidth, depth), dtype=np.uint16)
        end = time.perf_counter()
        print('Write = %.2f s' % (end-begin))
    finish = time.perf_counter()
    print('Total time = %.2f s' % (finish-start))


if __name__ == '__main__':
    main()
When I run this code I get the following on my PC:
$ python shared-minimal.py -c 1
Init shared array of size 0.56 Gb: 0.36 s
Write = 0.13 s
Total time = 0.13 s
$ python shared-minimal.py -c 2
Init shared array of size 1.12 Gb: 0.72 s
Write = 0.12 s
Write = 0.13 s
Total time = 0.25 s
$ python shared-minimal.py -c 4
Init shared array of size 2.24 Gb: 5.40 s
Write = 1.17 s
Write = 1.17 s
Write = 1.17 s
Write = 1.57 s
Total time = 5.08 s
In the last case, when the array size exceeds 2 GB, initialization time no longer scales linearly with array size, and assigning same-size slices to the array becomes roughly nine times slower (about 1.17 s per block versus about 0.13 s).
I wonder why that happens. I'm running the script on Ubuntu 16.04 with Python 3.5. Using iotop I also noticed that while initializing and writing to the array there is disk write activity roughly equal in size to the shared array, but I'm not sure whether a real file is created or it's an in-memory-only operation (I suppose it should be the latter). In general, my system also becomes less responsive when the shared array is large. There is no swapping, which I checked with top, ipcs -mu and vmstat.
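As a quick check of where any temporary backing files would end up, I can run a small diagnostic like the one below (a sketch separate from the script above; it only reports the temporary directories that tempfile and multiprocessing use and the filesystem type of the mount each one sits on, via /proc/mounts on Linux):

import multiprocessing.util as mputil
import os
import tempfile

def fs_type(path):
    # Find the longest mount point containing `path` and return its filesystem type.
    path = os.path.realpath(path)
    best = ('', 'unknown')
    with open('/proc/mounts') as mounts:
        for line in mounts:
            _, mount_point, fstype = line.split()[:3]
            if path == mount_point or path.startswith(mount_point.rstrip('/') + '/'):
                if len(mount_point) > len(best[0]):
                    best = (mount_point, fstype)
    return best

if __name__ == '__main__':
    for label, path in [('tempfile.gettempdir()', tempfile.gettempdir()),
                        ('multiprocessing temp dir', mputil.get_temp_dir())]:
        mount_point, fstype = fs_type(path)
        print('{}: {} (mounted on {}, fstype {})'.format(label, path, mount_point, fstype))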