Performance difference when running same cell twice in python notebook pandas df.map

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I am using a Python notebook to run the following function, which I want to map over a pandas Series.

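from skimage import io

def get_number_of_activated_pixels(image_path):
    # Read the image file into a NumPy array.
    im = io.imread(image_path)
    # Count the pixels whose value is greater than zero.
    n_activated = (im > 0).sum()
    return n_activated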

The function reads the image at the given path into a NumPy array using skimage's io module, and then returns the number of non-zero pixels.
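To make the counting step concrete, here is a minimal sketch on a small synthetic array. The array values are made up for illustration; in the question the array comes from io.imread.

```python
import numpy as np

# A tiny synthetic "image" (hypothetical values, not from the question).
im = np.array([[0, 3, 0],
               [7, 0, 0],
               [0, 0, 255]], dtype=np.uint8)

mask = im > 0                    # boolean array, True where the pixel is above zero
n_activated = int(mask.sum())    # each True counts as 1

# For an unsigned image like this one, np.count_nonzero gives the same count
# without building the intermediate boolean mask.
n_nonzero = int(np.count_nonzero(im))

print(n_activated, n_nonzero)
```

Note that `im > 0` and `np.count_nonzero(im)` only agree when the array has no negative values, which holds for typical unsigned image data.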

When I use map to apply the function to the Series containing the paths, I get drastically different performance the second time I run the same cell.

I am using the snippet below in the cell:

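# Assumed source of timer(); the question does not show its import.
from timeit import default_timer as timer

start = timer()
test = test_df.map(get_number_of_activated_pixels)
end = timer()
print(end - start)  # Elapsed time in seconds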

The first time I run the cell it takes about 100 seconds. When I run the same cell a second time, it takes only 18 seconds.

What can I attribute this huge difference in performance to? Is Python doing some caching behind the scenes? If so, can someone please explain what is going on?
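One way to narrow this down is to time the raw file read on its own, twice in a row, separately from the pixel counting. The sketch below assumes (the question does not establish this) that the gap might come from reading the files rather than from Python or pandas. The helper name time_read and the temporary stand-in file are hypothetical; in the notebook you would pass one of the real image paths instead.

```python
import os
import tempfile
from timeit import default_timer as timer

def time_read(path):
    """Return (seconds, byte_count) for reading the whole file once."""
    start = timer()
    with open(path, "rb") as f:
        data = f.read()
    return timer() - start, len(data)

# Stand-in file so the sketch runs anywhere; replace with a real image path.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(os.urandom(1_000_000))
    path = tmp.name

first_seconds, first_size = time_read(path)
second_seconds, second_size = time_read(path)
print(f"first read:  {first_seconds:.6f} s")
print(f"second read: {second_seconds:.6f} s")

os.remove(path)  # clean up the stand-in file
```

Because the stand-in file has just been written, both reads here are likely to be fast. The comparison is only informative on files that have not been touched recently, such as the images in test_df on the first run of the cell.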

Question from: https://stackoverflow.com/questions/65646633/performance-difference-when-running-same-cell-twice-in-python-notebook-pandas-df
