#StackBounty: #sharepoint-online #csom Is there a way to get all paging info for a list up front so that very large lists can be fetche…

Bounty: 50

I am using SharePoint online and SharePoint 2013+ onprem.

I have written a CSOM rest program that fetches all the items in a list and fetches list items in 5000 item batches, the maximum allowed by the list item view threshold.

Here is an example of the XML payload of the CSOM request to get one of the pages.

<Request AddExpandoFieldTypeSuffix="true" SchemaVersion="14.0.0.0" LibraryVersion="16.0.0.0"
         ApplicationName=".NET Library" xmlns="http://schemas.microsoft.com/sharepoint/clientquery/2009">
    <Actions>
        <ObjectPath Id="10019" ObjectPathId="10018"/>
        <Query Id="10020" ObjectPathId="10018">
            <Query SelectAllProperties="true">
                <Properties/>
            </Query>
            <ChildItemQuery SelectAllProperties="true">
                <Properties>
                    <Property Name="Id" ScalarProperty="true"/>
                    <Property Name="DisplayName" ScalarProperty="true"/>
                    <Property Name="HasUniqueRoleAssignments" ScalarProperty="true"/>
                    <Property Name="File" SelectAll="true">
                        <Query SelectAllProperties="false">
                            <Properties/>
                        </Query>
                    </Property>
                    <Property Name="ContentType" SelectAll="false">
                        <Query SelectAllProperties="false">
                            <Properties>
                                <Property Name="Name" ScalarProperty="true"/>
                            </Properties>
                        </Query>
                    </Property>
                </Properties>
            </ChildItemQuery>
        </Query>
    </Actions>
    <ObjectPaths>
        <Method Id="10018" ParentId="9" Name="GetItems">
            <Parameters>
                <Parameter TypeId="{3d248d7b-fc86-40a3-aa97-02a75d69fb8a}">
                    <Property Name="AllowIncrementalResults" Type="Boolean">false</Property>
                    <Property Name="DatesInUtc" Type="Boolean">true</Property>
                    <Property Name="FolderServerRelativePath" Type="Null"/>
                    <Property Name="FolderServerRelativeUrl" Type="Null"/>
                    <Property Name="ListItemCollectionPosition" TypeId="{922354eb-c56a-4d88-ad59-67496854efe1}">
                        <Property Name="PagingInfo" Type="String">Paged=TRUE&amp;p_ID=10000</Property>
                    </Property>
                    <Property Name="ViewXml" Type="String">&lt;View Scope="RecursiveAll"&gt;&lt;RowLimit&gt;5000&lt;/RowLimit&gt;&lt;/View&gt;</Property>
                </Parameter>
            </Parameters>
        </Method>
        <Identity Id="9"
                  Name="740c6a0b-85e2-48a0-a494-e0f1759d4aa7:site:2386a403-8d76-4737-b774-dabad52201e3:web:7a2f544f-e3ed-444e-8de3-178c2c9b5848:list:63029d5e-0588-4406-89f4-9b5a69ae5d81"/>
    </ObjectPaths>
</Request>

Notice the Paged=TRUE&amp;p_ID=10000 is specified which means get me the next page of results starting at p_ID 10000.

So the simple algorithm to fetch all the items is this:

next_page_id = 0
while (next_page_id is not null)
   get next page of list items starting at next_page_id.
   if there is another page
      set next_page_id to that page id.
   else
      set next_page_id to null.

But the main flaw with this is that now a list with 1,000,000 items can only be fetched with a single thread. Which means that this can potentially fetch for hours and hours if a single list has millions of records in it.

To get past this, I tried to use the List.ItemCount to create a blocking collection of pages that we need to fetch. For example, if you have 150000 records in a list, create Paged=TRUE&amp;p_ID=0, Paged=TRUE&amp;p_ID=5000, and Paged=TRUE&amp;p_ID=10000 and a thread pool can grab pages to fetch until there is no more work to do.

But this isn’t correct. You will end up with incomplete lists, because the p_ID is not the “index of result.” It has some other meaning that I do fully understand.

Is there a way to do what I am trying to do?


Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.