OFFSET/FETCH with covering Nonclustered Index on partitioned table

  • Morning all,

    Having a look at OFFSET/FETCH and how it performs for pagination, and coming across some behaviour that I wouldn't expect. I have a partitioned table with a sequential ID (EventLogID) and a EventID column that's fairly randomly distributed. We have a nonclustered index on the EventID column, and this is where things are not running as I might expect: when I write an OFFSET/FETCH query ordered by EventID, it's performing an Index Scan and sort on the whole NCI, rather than the number of rows specified in the offset and fetch, even though that index is already sorted in that order.

    Code to set everything up, with number of rows reduced to a million for the moment (the real table has some 80 million rows in, but the behaviour appears to be the same regardless of the rowcount):-

    /* drop object if it already exists */

    IF EXISTS (SELECT 1

    FROM sys.objects AS ob

    INNER JOIN sys.schemas AS sc

    ON ob.schema_id = sc.schema_id

    WHERE ob.name = 'EventLog'

    AND sc.name = 'Demo')

    DROP TABLE Demo.EventLog;

    IF EXISTS (SELECT 1 FROM sys.partition_schemes AS PS

    WHERE name = 'psEventLogBands')

    DROP PARTITION SCHEME psEventLogBands;

    IF EXISTS (SELECT 1 FROM sys.partition_functions AS PF

    WHERE name = 'pfEventLogBands')

    DROP PARTITION FUNCTION pfEventLogBands;

    IF EXISTS (SELECT 1 FROM sys.schemas AS S

    WHERE name = 'Demo')

    SET NOEXEC ON;

    GO

    CREATE SCHEMA Demo AUTHORIZATION dbo;

    GO

    SET NOEXEC OFF;

    GO

    CREATE PARTITION FUNCTION [pfEventLogBands](int) AS RANGE LEFT FOR VALUES (0, 200000, 400000, 600000, 800000, 1000000);

    CREATE PARTITION SCHEME [psEventLogBands] AS PARTITION [pfEventLogBands] ALL TO (PRIMARY);

    CREATE TABLE [Demo].[EventLog](

    [EventLogID] [int] IDENTITY(1,1) NOT NULL,

    [EventID] [tinyint] NOT NULL

    CONSTRAINT [PK_dbo:EventLog:EventID] PRIMARY KEY CLUSTERED

    (

    [EventLogID] ASC

    )WITH (FILLFACTOR = 100) ON psEventLogBands(EventLogID)

    ) WITH (DATA_COMPRESSION = PAGE);

    DECLARE @Nums TABLE

    (Number tinyint NOT NULL)

    INSERT INTO @Nums

    SELECT TOP 50 ROW_NUMBER() OVER (ORDER BY (SELECT 1))

    FROM sys.columns

    INSERT INTO Demo.EventLog WITH (TABLOCK)

    (EventID)

    SELECT TOP 1000000 N.Number

    FROM @Nums AS N

    CROSS JOIN @Nums AS N2

    CROSS JOIN @Nums AS N3

    CROSS JOIN @Nums AS N4

    CREATE INDEX IDX_EventID

    ON demo.EventLog (EventID, EventLogID) WITH (DATA_COMPRESSION = PAGE);

    Using the clustered index, we see that offset fetch is only scanning 10 rows of the table:-

    SET STATISTICS IO ON

    SELECT EL.EventLogID

    ,EL.EventID

    FROM Demo.EventLog AS EL

    ORDER BY EventLogID OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;

    (10 row(s) affected)

    Table 'EventLog'. Scan count 2, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    The statistics IO results confirm this, and the query plan (attached) shows estimated and actual rows as 10.

    However, when I switch the sort to use the EventID column, I end up with a parallel scan on the IDX_EventID index returning all one million rows and then sorting them, before applying the top operator to reduce down to ten rows:-

    SELECT EL.EventLogID

    ,EL.EventID

    FROM Demo.EventLog AS EL

    ORDER BY EventID, EventLogID OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;

    (10 row(s) affected)

    Table 'EventLog'. Scan count 8, logical reads 972, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    As you can see from the statistics IO output, all 972 pages are being read and from the query plan (attached) that the actual and estimated rows are one million, coupled with the sort operator.

    This is clearly a problem arising with the introduction of partitioning into the equation; if the table is built without it, then everything works exactly as expected.

    Is there any way to get this to work on a partitioned table?

    Regards

    Matthew

    Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

  • That's because the non-clustered index was created as partitioned (that is the default behavior when you create an index on a partitioned table).

    With it partitioned on eventlogID, that means it will have to pull rows from every partition (each of which is its own B-Tree) and then sort all those rows to figure out which are the top 10 based on EventID.

    The implementation in this case is a bit disappointing, because from a high-level perspective it could pull just the top 10 rows by EventID from each partition, and sort that much smaller result set. Alas, it doesn't do that.

    You can get the behavior you were expecting by preventing the default partitioning behavior (just specify ON [PRIMARY] for the index creation, and then you'll get a non-partitioned index that will behave as expected).

    Of course, that comes at a cost, as now you have a non-aligned index, which will prevent partition switching.

    Cheers!

  • Jacob Wilkins (4/25/2016)

    The implementation in this case is a bit disappointing, because from a high-level perspective it could pull just the top 10 rows by EventID from each partition, and sort that much smaller result set. Alas, it doesn't do that.

    Shame, that's what I was hoping might be possible. Unfortunately in this case, the partition switch functionality is the driver for the table to be partitioned (it churns through several million rows a week as a horrible implementation of a log), so I wouldn't be able to remove the partition alignment from the index.

    Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

  • Ah, my condolences.

    There's a really old and still active connect item about this limitation.

    You can at least upvote it and hope. 🙂

    https://connect.microsoft.com/SQLServer/fedback/details/240968/partition-table-using-min-max-functions-and-top-n-index-selection-and-performance

    I thought I remembered there being some ugly ways to work around the issue, and the workarounds listed on that connect item confirm that.

    Not sure if they will be of any use to you, but worth a shot.

    Cheers!

  • Thanks for the connect link; upvoted it, for whatever good that will do!

    Fortunately in this case, I was only using the table in question as a demo for using the OFFSET FETCH and I don't actually need to have any code like that running against it!

    Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply