Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
469 views
in Technique[技术] by (71.8m points)

python - How to speed up loading data from oracle sql to pandas df

My code looks like this, i use pd.DataFrame.from_records to fill data into the dataframe, but it takes Wall time: 1h 40min 30s to process the request and load data from the sql table with 22 mln rows into df.

# I skipped some of the code, since there are no problems with the extract of the query, it's fast
cur = con.cursor()

def db_select(query): # takes the request text and sends it to the data_frame
    cur.execute(query)
    col = [column[0].lower() for column in cur.description] # parse headers
    df = pd.DataFrame.from_records(cur, columns=col) # fill the data into the dataframe
    return df

Then I pass the sql query to the function:

frame = db_select("select * from table")

How can i optimize code for speed up process?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Setting proper value for cur.arraysize might help for tuning fetch performance . You need to determine the most suitable value for it. The default value is 100. A code with a different array sizes might be run in order to determine that value such as

arr=[100,1000,10000,100000,1000000]
for size in arr:
        try:
            cur.prefetchrows = 0
            cur.arraysize = size
            start = datetime.now()
            cur.execute("SELECT * FROM mytable").fetchall()
            elapsed = datetime.now() - start
            print("Process duration for arraysize ", size," is ", elapsed, " seconds")
        except Exception as err:
            print("Memory Error ", err," for arraysize ", size) 

and then set such as cur.arraysize = 10000 before calling db_select from your original code


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...