Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to write ndarray to .npy file iteratively with batches

I am generating a large dataset for a machine learning application. It is a NumPy array with shape (N,X,Y), where N is the number of samples, X is the input of a sample and Y is the target of a sample. I want to save this array in the .npy format. N is very large, so the final dataset is more than 10 GB. This means I cannot create the whole dataset in memory and then save it, because it would exhaust my RAM.

Is it possible to write batches of n samples to this file iteratively instead? For example, I want to append batches of 256 samples, each with shape (256,X,Y), to the file one at a time.

Question from: https://stackoverflow.com/questions/65882709/how-to-write-ndarray-to-npy-file-iteratively-with-batches

1 Answer

Here is a solution based on NumPy's implementation of save. It writes a standard .npy file, including the shape and type information:

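import numpy as np
import numpy.lib as npl

# Example data, split into two batches along the first axis.
a = np.random.random((30, 3, 2))
a1 = a[:10]
a2 = a[10:]

filename = 'out.npy'
with open(filename, 'wb+') as f:
    # Write a provisional header that describes only the first batch.
    header = npl.format.header_data_from_array_1_0(a1)
    npl.format.write_array_header_1_0(f, header)
    # Append the raw data of each batch.
    a1.tofile(f)
    a2.tofile(f)
    # Go back to the start and overwrite the header with the final shape.
    f.seek(0)
    header['shape'] = (len(a1) + len(a2), *header['shape'][1:])
    npl.format.write_array_header_1_0(f, header)

# The result loads as one ordinary array.
assert (np.load(filename) == a).all()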

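This works for C_CONTIGUOUS arrays without Python objects: tofile always writes the raw data in C order and cannot write object arrays. It also relies on the rewritten header taking up exactly as many bytes as the provisional one, otherwise the data would no longer start where the header says it does.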
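
The same idea can be extended to any number of batches. The following is only a minimal sketch, not part of the original answer: the function name write_npy_in_batches and its arguments are made up for illustration, and it assumes that every batch has the same dtype and the same trailing shape. As far as I know, recent NumPy releases reserve spare room in the header for a growing first axis, while older ones may not, so the sketch checks that the rewritten header ends exactly where the data starts and raises an error otherwise.

```python
import numpy as np
from numpy.lib import format as npy_format


def write_npy_in_batches(filename, batches):
    """Stream batches into one .npy file and return the total row count.

    Hypothetical helper: `batches` is any iterable of arrays that share
    a dtype and a trailing shape (everything after the first axis).
    """
    header = None
    first_dtype = None
    data_start = None
    total = 0
    with open(filename, 'wb+') as f:
        for batch in batches:
            # tofile writes raw bytes in C order, so force a C-contiguous copy if needed.
            batch = np.ascontiguousarray(batch)
            if header is None:
                # Provisional header taken from the first batch; the shape is fixed up below.
                header = npy_format.header_data_from_array_1_0(batch)
                npy_format.write_array_header_1_0(f, header)
                data_start = f.tell()
                first_dtype = batch.dtype
            elif batch.dtype != first_dtype or batch.shape[1:] != header['shape'][1:]:
                raise ValueError('all batches must share dtype and trailing shape')
            batch.tofile(f)
            total += len(batch)

        if header is None:
            raise ValueError('no batches were given')

        # Overwrite the header in place with the final number of rows.
        f.seek(0)
        header['shape'] = (total, *header['shape'][1:])
        npy_format.write_array_header_1_0(f, header)
        if f.tell() != data_start:
            # The header changed size, so the data offset is wrong and the file is unusable.
            raise RuntimeError('rewritten header does not match the reserved space')
    return total
```

It could be called with a generator, for example write_npy_in_batches('out.npy', (make_batch(i) for i in range(k))), where make_batch is whatever produces your (256,X,Y) blocks, so that only one batch is held in memory at a time.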

