I am using a Python notebook to run the following function, which I would like to map over a pandas Series.
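from skimage import io

def get_number_of_activated_pixels(image_path):
    # Load the image from disk as a numpy array
    im = io.imread(image_path)
    # Count the pixels whose value is greater than zero
    n_activated = (im > 0).sum()
    return n_activated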
The function simply reads the image at the given path into a numpy array using skimage's io module, and then returns the number of nonzero pixels.
When I use Series.map to apply the function to the Series containing the paths, I get drastically different performance the second time I run the same cell.
I am using the snippet below in the cell:
from timeit import default_timer as timer

start = timer()
test = test_df.map(get_number_of_activated_pixels)
end = timer()
print(end - start)  # Time in seconds
The first time I run the cell it takes about 100 seconds, but when I run the same cell a second time it finishes in only 18 seconds.
What can I attribute this huge difference in performance to? Is Python doing some caching behind the scenes? If so, can someone please explain what is going on?
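One way to narrow this down would be to time several passes in a single run, and to time a plain byte read separately from the image decoding. The sketch below is only an illustration: time_passes and read_raw_bytes are hypothetical helper names that are not part of the original code, and they use nothing beyond the standard library.

```python
from timeit import default_timer as timer


def time_passes(func, items, n_passes=3):
    """Apply func to every item n_passes times; return the seconds taken by each pass."""
    durations = []
    for _ in range(n_passes):
        start = timer()
        for item in items:
            func(item)
        durations.append(timer() - start)
    return durations


def read_raw_bytes(path):
    """Read a file without decoding it, so that only the file read is measured."""
    with open(path, "rb") as f:
        return len(f.read())
```

For example, time_passes(read_raw_bytes, test_df.tolist()) could be compared with time_passes(get_number_of_activated_pixels, test_df.tolist()). If the first pass of the raw read is also much slower than the later ones, that would suggest the difference comes from the files being cached after the first read rather than from anything Python or pandas does, though this is an assumption to verify rather than a confirmed explanation.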
question from:
https://stackoverflow.com/questions/65646633/performance-difference-when-running-same-cell-twice-in-python-notebook-pandas-df