
python - Optimizing pd.read_sql

import pandas as pd
import pyodbc

conn1 = pyodbc.connect('DSN=LUDP-Training Presto', uid='*****', pwd='****', autocommit=True)

sql_query = """
    SELECT zsourc_sy, zmsgeo, salesorg, crm_begdat, zmcn, zrtm, crm_obj_id, zcrmprod, prod_hier,
           hier_type, zsoldto, zendcst, crmtgqtycv, currency, zwukrs, netvalord, zgtnper, zsub_4_t
    FROM `prd_updated`.`bw_ms_zocsfs05l_udl`
    WHERE zdcgflag = 'DCG' AND crm_begdat >= '20200101' AND zmsgeo IN ('AP', 'LA', 'EMEA', 'NA')"""

I have to load the above query into a pandas DataFrame, but the pd.read_sql call has been running for more than a couple of hours because the table has over 10 million rows. Is there a way to speed this up?

contract_table = pd.read_sql(sql_query, conn1)
question from: https://stackoverflow.com/questions/65838513/optimizing-pd-read-sql


1 Answer


You can pass a chunksize parameter to the read_sql function (docs), which makes it return an iterator of DataFrames, each containing at most the specified number of rows.

df_iter = pd.read_sql(sql_query, conn1, chunksize=100)

for df in df_iter:                   # each df is a DataFrame with up to 100 rows
    for row in df.itertuples():      # iterate over the rows of the chunk
        ...                          # do work here

Generators are an efficient way to process data that is too large to fit in memory all at once.
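
If the end goal is still a single DataFrame like contract_table, one option is to collect the chunks and concatenate them at the end; note that the full result still has to fit in memory, so the main benefit is being able to filter or aggregate each chunk as it arrives. A minimal sketch, assuming the same sql_query and conn1 as above and an illustrative chunksize of 100000:

chunks = []
for chunk in pd.read_sql(sql_query, conn1, chunksize=100000):
    # optionally trim each chunk here (filter rows, drop columns, aggregate)
    # before keeping it, so less data is held in memory at once
    chunks.append(chunk)

contract_table = pd.concat(chunks, ignore_index=True)  # one DataFrame, as in the question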


