Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
421 views
in Technique[技术] by (71.8m points)

python - Is there anything faster than dict()?

I need a faster way to store and access around 3GB of k:v pairs. Where k is a string or an integer and v is an np.array() that can be of different shapes. Is there any object, that is faster than the standard python dict in storing and accessing such a table? For example, a pandas.DataFrame?

As far I have understood python dict is a quite fast implementation of a hashtable, is there anything better than that for my specific case?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

No, there is nothing faster than a dictionary for this task and that’s because the complexity of its indexing (getting and setting item) and even membership checking is O(1) in average. (check the complexity of the rest of functionalities on Python doc https://wiki.python.org/moin/TimeComplexity )

Once you saved your items in a dictionary, you can have access to them in constant time which means that it's unlikely for your performance problem to have anything to do with dictionary indexing. That being said, you still might be able to make this process slightly faster by making some changes in your objects and their types that may result in some optimizations at under the hood operations.

e.g. If your strings (keys) are not very large you can intern the lookup key and your dictionary's keys. Interning is caching the objects in memory --or as in Python, table of "interned" strings-- rather than creating them as a separate object.

Python has provided an intern() function within the sys module that you can use for this.

Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup...

also ...

If the keys in a dictionary are interned and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer comparison instead of comparing the string values themselves which in consequence reduces the access time to the object.

Here is an example:

In [49]: d = {'mystr{}'.format(i): i for i in range(30)}

In [50]: %timeit d['mystr25']
10000000 loops, best of 3: 46.9 ns per loop

In [51]: d = {sys.intern('mystr{}'.format(i)): i for i in range(30)}

In [52]: %timeit d['mystr25']
10000000 loops, best of 3: 38.8 ns per loop

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...