Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Best and quickest way to get the top N elements from a huge list in Python

I am trying to work out the best way to get the N largest elements from a huge list containing a few billion numbers. So far, I have come up with this idea:

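A = sorted(huge_list[:N], reverse=True)  # first N elements, descending (list A)
for i in range(N, len(huge_list)):       # element N+1 to the last element
    x = huge_list[i]
    smallest = A[-1]                     # the Nth element of A is the current minimum
    if x > smallest:
        A.append(x)                      # insert it into list A and re-sort
        A.sort(reverse=True)
        A.pop()                          # remove the last element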

In practice, this doesn't seem to consume much memory, and it is faster than calling list.sort on the entire huge list and then taking the top N elements.

However, this sorting doesn't use the full capacity of a multi-core CPU. Is there a built-in function, or any other approach, that would do the job with multiple processes, or otherwise fully utilize the available computing power and run much faster?

question from:https://stackoverflow.com/questions/65839988/best-and-quickest-way-to-get-top-n-elements-from-a-huge-list-in-python


1 Answer


If you are looking to parallelize the work, you could use a Python library such as Ray.

Using Ray, you could parallelize the search by partitioning the data into k subsets and having each worker find the largest N numbers in its subset. Afterwards, you have k lists of N candidates each (k × N numbers in total), and the overall top N must be among them, so a final pass over those candidates yields the largest N numbers.
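The partition-and-merge idea can be sketched with the standard library alone. This is only an illustration, not Ray code: the function names `chunk_top_n` and `partitioned_top_n` and the `map_fn` parameter are hypothetical, and `heapq.nlargest` is swapped in for the per-subset search. By default the chunks are processed serially, and a parallel map can be passed in instead.

```python
import heapq


def chunk_top_n(args):
    """Return the n largest values of one chunk; args is a (chunk, n) tuple."""
    chunk, n = args
    return heapq.nlargest(n, chunk)


def partitioned_top_n(data, n, k, map_fn=map):
    """Split data into about k chunks, take each chunk's top n, then merge.

    map_fn is an assumption of this sketch: any map-like callable works,
    for example the map method of a process pool.
    """
    if n <= 0 or not data:
        return []
    k = max(1, k)
    size = -(-len(data) // k)  # ceiling division: chunk length
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    candidates = map_fn(chunk_top_n, [(chunk, n) for chunk in chunks])
    # At most k * n candidates remain, so this final pass is cheap.
    return heapq.nlargest(n, (x for part in candidates for x in part))


if __name__ == "__main__":
    sample = [5, 1, 9, 3, 7, 8, 2, 6, 4, 0]
    print(partitioned_top_n(sample, 3, 4))
```

To run the chunks in separate processes, one option is to pass `map_fn=pool.map` where `pool` is a `concurrent.futures.ProcessPoolExecutor`. With Ray, the per-chunk function would presumably become a remote task instead. Either way, shipping each chunk to a worker has a serialization cost, so whether this beats a single `heapq.nlargest` call depends on the data size and how the data is stored.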

To learn more about Ray, check out its documentation.

Documentation: https://docs.ray.io/en/latest/

