Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Why does a Where clause cause more data to be returned?

I have a script that uses shareplum to get items from a very large and growing SharePoint (SP) list. Because of its size, I ran into the dreaded 5,000-item limit set in SP. To get around that, I tried to page through the data by 'ID', using a Where clause in the query.

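import pandas as pd

# This runs inside a while loop; idx is updated to the latest max ID
# whenever the results aren't empty. sp_list, view and cols are defined elsewhere.
df = pd.DataFrame(columns=cols)
idx = 0
query = {'Where': [('Gt', 'ID', str(idx))], 'OrderBy': ['ID']}
data = sp_list.GetListItems(view, query=query, row_limit=4750)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, pd.DataFrame(data)])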

That seemed to work, but after I added the Where clause, the query started returning rows that are not visible in the SP web list. For example, the minimum ID on the web is, say, 500, while shareplum returns rows starting at 1. It also seems to pull in rows that are filtered out on the web: the results contain column values that never appear there. If the Where clause is removed, shareplum brings back exactly the list shown on the web.

What is it that I'm getting wrong here? I'm brand new to shareplum; I looked at the docs but they don't go into much detail and all the examples are rather trivial.

Question from: https://stackoverflow.com/questions/65851515/why-does-a-where-clause-cause-more-data-to-be-returned


1 Answer

After further investigation, it seems that when a query is passed to GetListItems, shareplum ignores the filters that define the view. This is easy to verify: remove the query parameter and the view's filters apply again.
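The check described above can be sketched as a small helper that fetches the view once without a query and once with it, then reports the IDs that only appear when the query is present. This is a minimal sketch, not part of shareplum: `extra_ids`, `FakeFilteredList` and the view name 'My View' are hypothetical, and the fake list simply models the behaviour reported here (a view filter of ID >= 500 that is dropped as soon as a query is passed) so the example runs without a SharePoint site.

```python
from typing import Any, Dict, List, Set


def extra_ids(sp_list: Any, view_name: str, query: Dict[str, Any]) -> Set[int]:
    """Return IDs that come back with the query but are absent from the plain view."""
    view_only = sp_list.GetListItems(view_name)
    with_query = sp_list.GetListItems(view_name, query=query)
    view_ids = {int(row['ID']) for row in view_only}
    return {int(row['ID']) for row in with_query} - view_ids


class FakeFilteredList:
    """Hypothetical stand-in for a shareplum list (demonstration only).

    It mimics the behaviour described above: the view's own filter
    (ID >= 500 here) applies only when no query is passed.
    """

    def __init__(self, rows: List[Dict[str, str]]) -> None:
        self._rows = rows

    def GetListItems(self, view_name=None, query=None, row_limit=0):
        if query is None:
            rows = [r for r in self._rows if int(r['ID']) >= 500]
        else:
            # Only a single ('Gt', field, value) condition is handled here.
            _, field, value = query['Where'][0]
            rows = [r for r in self._rows if int(r[field]) > int(value)]
        return rows[:row_limit] if row_limit else rows


fake = FakeFilteredList([{'ID': str(i)} for i in (1, 2, 500, 501)])
extras = sorted(extra_ids(fake, 'My View', {'Where': [('Gt', 'ID', '0')]}))
print(extras)
```

Against a real list, passing the actual shareplum list object instead of the fake should show whether the extra rows are exactly the ones the view normally hides.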

As a workaround, I now page through the 'All Items' view with a row_limit and a query, as below. This at least lets me get all the data and do any further filtering or grouping in Python.

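import pandas as pd

frames = []
idx = 0
while True:
    # Page 'All Items' by ID: fetch the next batch of rows with ID > idx.
    # OrderBy ID (as in the question) so the largest ID in a page is a safe cursor.
    query = {'Where': [('Gt', 'ID', str(idx))], 'OrderBy': ['ID']}
    data = sp_list.GetListItems('All Items', query=query, row_limit=4500)
    if not data:
        break  # an empty page means everything has been fetched
    data_df = pd.DataFrame(data)
    frames.append(data_df)
    idx = int(pd.to_numeric(data_df['ID']).max())
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame(columns=cols)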

Why shareplum behaves this way is still an open question.
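For reuse, the paging workaround can also be written as a generator that takes an optional predicate, so the filter the view would normally apply can be re-applied on the client. This is only a sketch under assumptions: `iter_rows_by_id`, `FakeList` and the 'Status' column are hypothetical names, the fake list exists purely so the example runs without SharePoint, and the 'ID' values are assumed to be convertible with `int()`.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Row = Dict[str, Any]


def iter_rows_by_id(sp_list: Any, view_name: str, page_size: int = 4500,
                    keep: Optional[Callable[[Row], bool]] = None) -> Iterator[Row]:
    """Yield rows page by page, using the largest ID seen so far as the cursor."""
    last_id = 0
    while True:
        query = {'Where': [('Gt', 'ID', str(last_id))], 'OrderBy': ['ID']}
        page = sp_list.GetListItems(view_name, query=query, row_limit=page_size)
        if not page:
            return  # an empty page means there is nothing left to fetch
        for row in page:
            # Re-apply the caller's filter on the client, if one was given.
            if keep is None or keep(row):
                yield row
        last_id = max(int(row['ID']) for row in page)


class FakeList:
    """Hypothetical in-memory stand-in for a shareplum list (demonstration only)."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def GetListItems(self, view_name=None, query=None, row_limit=0):
        rows = self._rows
        if query:
            # Only the single ('Gt', 'ID', value) condition used above is handled.
            _, field, value = query['Where'][0]
            rows = [r for r in rows if int(r[field]) > int(value)]
            rows = sorted(rows, key=lambda r: int(r['ID']))
        return rows[:row_limit] if row_limit else rows


fake = FakeList([{'ID': str(i), 'Status': 'Open' if i % 2 else 'Closed'}
                 for i in range(1, 13)])
all_rows = list(iter_rows_by_id(fake, 'All Items', page_size=5))
open_rows = list(iter_rows_by_id(fake, 'All Items', page_size=5,
                                 keep=lambda r: r['Status'] == 'Open'))
print(len(all_rows), len(open_rows))
```

The yielded rows are plain dictionaries, so `pd.DataFrame(list(iter_rows_by_id(sp_list, 'All Items')))` would give a DataFrame equivalent to the one built by the loop above.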

