LINQ to SQL Paging Gotcha

Framework Version: 3.5

I ran into an issue with how LINQ to SQL implements the Skip and Take operators which leads to records being repeated or missing when performing custom paging.

I am using the Northwind database. My UI will display the City and ContactName of all Customers five records at a time. The LINQ expression projects to an anonymous type with properties of City and ContactName. We see below the LINQ expression, the SQL generated and the data returned for the first and next 5 records.

First 5 Records (Page 1)   Next 5 Records (Page 2)
LINQ Expression
(from c in Customers
        select new {
            c.City,
            c.ContactName
        }).Skip(0).Take(5)
        LINQ Expression
(from c in Customers
        select new {
            c.City,
            c.ContactName
        }).Skip(5).Take(5)
Generated SQL
SELECT TOP (5) [t0].[City], [t0].[ContactName]
FROM [Customers] AS [t0]
  Generated SQL
DECLARE @p0 Int = 5
DECLARE @p1 Int = 5

SELECT [t1].[City], [t1].[ContactName]
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t0].[City], [t0].[ContactName]) AS [ROW_NUMBER], 
    [t0].[City], [t0].[ContactName]
    FROM [Customers] AS [t0]
    ) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]
Result set
City            ContactName
--------------- ------------------------------
Berlin          Maria Anders
México D.F.     Ana Trujillo
México D.F.     Antonio Moreno
London          Thomas Hardy
Luleå           Christina Berglund
  Result set
City            ContactName
--------------- ------------------------------
Barquisimeto    Carlos González
Bergamo         Giovanni Rovelli
Berlin          Maria Anders
Bern            Yang Wang
Boise           Jose Pavarotti

From the result set, we see that Maria Anders is repeated in Page 1 and Page 2!  The same behavior can also be observed with a LinqDataSource as defined below:

<asp:GridView ID="GridView1" runat="server" AllowPaging="True" AutoGenerateColumns="False"
    DataSourceID="LinqDataSource1" PageSize="5">
    <Columns>
        <asp:BoundField DataField="City" HeaderText="City" ReadOnly="True" SortExpression="City" />
        <asp:BoundField DataField="ContactName" HeaderText="ContactName" ReadOnly="True"
            SortExpression="ContactName" />
    </Columns>
</asp:GridView>
<asp:LinqDataSource ID="LinqDataSource1" runat="server" ContextTypeName="DataClassesDataContext"
     Select="new (City, ContactName)" TableName="Customers">
</asp:LinqDataSource>


The reason for this behavior is because the SQL generated for Page 1 does not have an order specified which results in the records being returned in an unknown order

SELECT TOP (5) [t0].[City], [t0].[ContactName]
FROM [Customers] AS [t0]

while for subsequent pages, the SQL generated has an order by clause (required for the ROW_NUMBER() function).

(ORDER BY [t0].[City], [t0].[ContactName]) 

While paging through the result set, this will either result in repeating records or missing records*. 

The best way to work around this issue in framework version 3.5 is to always specify an orderby in the expression before the Take and Skip operators. By doing this, you guarantee that the order used for the first page (with the TOP operator) will be the same as the order used for subsequent pages (using the ROW_NUMBER() OVER ORDER function)

(from c in Customers
 orderby c.ContactName
        select new {
            c.City,
            c.ContactName
        }).Skip(0).Take(5)
SELECT TOP (5) [t0].[City], [t0].[ContactName]
FROM [Customers] AS [t0]
ORDER BY [t0].[ContactName]

LINQ To Entities on the other hand forces you to specify an order by in a scenario like this. If you don’t you get the following error:

The method 'Skip' is only supported for sorted input in LINQ to Entities.
The method 'OrderBy' must be called before the method 'Skip'.

This issue seems to have been fixed in the .NET 4.0 framework where the SELECT TOP statement is no longer generated by LINQ TO SQL for the first page.

(from c in Customers
 orderby c.ContactName
        select new {
            c.City,
            c.ContactName
        }).Skip(0).Take(5)
DECLARE @p0 Int = 0
DECLARE @p1 Int = 5
 
SELECT [t1].[City], [t1].[ContactName]
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ContactName]) AS [ROW_NUMBER], [t0].[City], [t0].[ContactName]
    FROM [Customers] AS [t0]
    ) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]

In addition, if you don’t specify an orderby clause or project your result, LINQ To SQL will use all the properties specified in the entity class in the order by clause and select statement leading to ugly and inefficient code:

(from c in Customers
        select c).Skip(10).Take(5)
SELECT [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], 
[t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], 
[t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], 
[t0].[Phone], [t0].[Fax]) AS [ROW_NUMBER], [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], 
[t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
    FROM [Customers] AS [t0]
    ) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]



Special thanks to Joseph Albahari for creating LINQPad. 

*  In most cases we get lucky and get the records from SQL based on the PK for the first page (even though as mentioned before, order is not guaranteed) and since our LINQ Expression will most likely have the PK as the first item in the projection it results in the same order for both the first page and subsequent pages. Hence this obscure issue.

4 Comments

Comments have been disabled for this content.