Most embedded datastores other than SQLite have no optimizations for concurrent access. I was also curious about SQLite's concurrent performance, so I ran a benchmark:
import time
import sqlite3
import random
import multiprocessing


class Store():
    """A minimal key-value store on top of a single SQLite file."""

    def __init__(self, filename='kv.db'):
        # timeout=60 makes a writer wait up to 60s for the write lock
        # instead of failing immediately with 'database is locked'.
        self.conn = sqlite3.connect(filename, timeout=60)
        # WAL mode lets readers proceed concurrently with one writer.
        self.conn.execute('pragma journal_mode=wal')
        self.conn.execute('create table if not exists "kv" '
                          '(key integer primary key, value integer) without rowid')
        self.conn.commit()

    def get(self, key):
        item = self.conn.execute('select value from "kv" where key=?',
                                 (key,)).fetchone()
        if item is not None:
            return item[0]

    def set(self, key, value):
        self.conn.execute('replace into "kv" (key, value) values (?,?)',
                          (key, value))
        self.conn.commit()


def worker(n):
    # Each process opens its own connection, writes n random keys,
    # then reads them all back in shuffled order.
    d = [random.randint(0, 1 << 31) for _ in range(n)]
    s = Store()
    for i in d:
        s.set(i, i)
    random.shuffle(d)
    for i in d:
        s.get(i)


def test(c):
    n = 5000
    start = time.time()
    ps = []
    for _ in range(c):
        p = multiprocessing.Process(target=worker, args=(n,))
        p.start()
        ps.append(p)
    for p in ps:
        p.join()
    cost = time.time() - start
    print(f'{c:<12d}{cost:<10.2f}{n / cost:<22.2f}{n * c / cost:<16.2f}')


def main():
    print(f'{"concurrency":<12}{"time(s)":<10}'
          f'{"per-process TPS(r/s)":<22}{"total TPS(r/s)":<16}')
    for c in range(1, 9):
        test(c)


if __name__ == '__main__':
    main()
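If you want to confirm WAL mode actually took effect (just a sanity check, not part of the benchmark), the pragma itself returns the journal mode now in effect:

import sqlite3

conn = sqlite3.connect('kv.db')
# 'pragma journal_mode=wal' returns one row containing the resulting
# mode; WAL is persistent, so it sticks for later connections too.
mode = conn.execute('pragma journal_mode=wal').fetchone()[0]
print(mode)  # 'wal' on success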
Results on my 4-core macOS box (SSD volume):

concurrency time(s)   per-process TPS(r/s)  total TPS(r/s)
1           0.65      7638.43               7638.43
2           1.30      3854.69               7709.38
3           1.83      2729.32               8187.97
4           2.43      2055.25               8221.01
5           3.07      1629.35               8146.74
6           3.87      1290.63               7743.78
7           4.80      1041.73               7292.13
8           5.37      931.27                7450.15
Results on an 8-core Windows Server 2012 cloud server (SSD volume):

concurrency time(s)   per-process TPS(r/s)  total TPS(r/s)
1           4.12      1212.14               1212.14
2           7.87      634.93                1269.87
3           14.06     355.56                1066.69
4           15.84     315.59                1262.35
5           20.19     247.68                1238.41
6           24.52     203.96                1223.73
7           29.94     167.02                1169.12
8           34.98     142.92                1143.39
It turns out that overall throughput stays roughly constant regardless of concurrency, and that SQLite is much slower on Windows than on macOS. I hope this is helpful.
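One detail worth noting about the benchmark: the timeout=60 passed to sqlite3.connect is what keeps concurrent writers from erroring out, because SQLite allows only one writer at a time per database file. A minimal sketch (my own illustration, not part of the benchmark) of what happens without a busy timeout:

import sqlite3

# Assumes kv.db from the benchmark above already exists.
w1 = sqlite3.connect('kv.db', timeout=60)
w1.execute('begin immediate')  # take the write lock and hold it open

w2 = sqlite3.connect('kv.db', timeout=0)  # give up immediately when busy
try:
    w2.execute('replace into "kv" (key, value) values (?, ?)', (1, 1))
except sqlite3.OperationalError as e:
    print(e)  # 'database is locked'
finally:
    w1.rollback()  # release the lock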
Since SQLite's write lock is per database file, one way to get more TPS is to partition the data across multiple database files:
class MultiDBStore():
    """Partitions keys across several SQLite files to spread the write lock."""

    def __init__(self, buckets=5):
        self.buckets = buckets
        self.conns = []
        for n in range(buckets):
            conn = sqlite3.connect(f'kv_{n}.db', timeout=60)
            conn.execute('pragma journal_mode=wal')
            conn.execute('create table if not exists "kv" '
                         '(key integer primary key, value integer) without rowid')
            conn.commit()
            self.conns.append(conn)

    def _get_conn(self, key):
        # Route each key to a fixed bucket so reads find what writes stored.
        assert isinstance(key, int)
        return self.conns[key % self.buckets]

    def get(self, key):
        item = self._get_conn(key).execute(
            'select value from "kv" where key=?', (key,)).fetchone()
        if item is not None:
            return item[0]

    def set(self, key, value):
        conn = self._get_conn(key)
        conn.execute('replace into "kv" (key, value) values (?,?)',
                     (key, value))
        conn.commit()
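The benchmark's worker() then needs to be pointed at the partitioned store. That adaptation isn't shown above, so this is my assumed version (with 20 buckets, matching the results below):

def worker(n):
    # Same workload as before, but backed by 20 partitions
    # instead of a single database file.
    d = [random.randint(0, 1 << 31) for _ in range(n)]
    s = MultiDBStore(buckets=20)
    for i in d:
        s.set(i, i)
    random.shuffle(d)
    for i in d:
        s.get(i)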
Results on my Mac with 20 partitions:

concurrency time(s)   per-process TPS(r/s)  total TPS(r/s)
1           2.07      4837.17               4837.17
2           2.51      3980.58               7961.17
3           3.28      3047.68               9143.03
4           4.02      2486.76               9947.04
5           4.44      2249.94               11249.71
6           4.76      2101.26               12607.58
7           5.25      1903.69               13325.82
8           5.71      1752.46               14019.70
Total TPS is higher than with a single database file, and it keeps scaling as concurrency grows.
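One design note: _get_conn assumes integer keys. If your keys are strings, you could derive the bucket from a stable hash instead; a sketch of my own (with a hypothetical bucket_for helper), since Python's built-in hash() is randomized per process for strings and hashlib is the safer choice:

import hashlib

def bucket_for(key, buckets=20):
    # Hypothetical helper: map a string key to a stable bucket index.
    # hashlib yields the same digest in every process, unlike hash().
    digest = hashlib.md5(str(key).encode('utf-8')).digest()
    return int.from_bytes(digest[:4], 'big') % buckets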