Paging and Versioning Using ROW_NUMBER()

  • Lawrence,

    Itzik Ben-Gan has an article in the latest edition (June 2010) of SQL Server Maganzine that is about indexing for Row_number/Over/Partition by

    So he thinks that “indexing guidelines are to have the ranking ordering columns in the index key list, either in specified order or exactly reversed, plus include the rest of the columns from the query in the INCLUDE clause”

    So you should probably create an index on hfv_hedge_rel_id, reporting_date

    HIH

    Henrik Staun Poulsen

    http://www.stovi.com

  • Henrik,

    Many thanks for your response.

    I've played around with indexing strategies with my "Latest Edition" example using the dbo.Books table, and nothing is really helping.

    The strange thing is that from an IO point of view, the ROW_NUMBER() solution does show a smaller scan and logical read count. However, the CPU time is very high.

    For example:

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH BookCTE (RowNumber, BookTitle, BookEdition, BookPublishDate, BookAuthor)

    AS

    (

    SELECT

    ROW_NUMBER()OVER (PARTITION BY BookTitle ORDER BY BookEdition DESC),

    BookTitle, BookEdition, BookPublishDate, BookAuthor

    FROM dbo.Books

    )

    SELECT BookTitle, BookEdition, BookPublishDate, BookAuthor

    FROM BookCTE

    WHERE RowNumber=1

    ORDER BY BookTitle

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 28 ms.

    (10 row(s) affected)

    Table 'Books'. Scan count 1, logical reads 10132, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 18297 ms, elapsed time = 18295 ms.

    select b.BookTitle, b.BookEdition, b.BookPublishDate, b.BookAuthor

    from books b

    inner join

    (select booktitle, max(bookedition) as bookedition from books group by booktitle)q

    on b.booktitle = q.booktitle and b.bookedition = q.bookedition;

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    (10 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Books'. Scan count 2, logical reads 20264, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 3609 ms, elapsed time = 3741 ms.

    I think I may need Itzik to take a look at this. 😉

    Will carry on with research, as time allows.

    Regards,

    Lawrence.

  • All,

    Further to my last post: it's worth pointing out that following Blah's example, I have increased the number of rows in the dbo.Books table considerably to test this.

    I used the following code snippet:

    --Add additional data (run multiple times to get 500000 rows)

    INSERT dbo.Books( BookTitle ,BookEdition ,BookPublishDate ,BookAuthor)

    SELECT BookTitle, BookEdition, BookPublishDate, BookAuthor

    FROM dbo.Books

    GO 16

    --Set BookEdition to BookId for better data profile

    UPDATE dbo.Books SET BookEdition=BookId

    NB: For the UPDATE to work, the BookEdition column datatype must be increased from the original SMALLINT datatype to BIGINT for example.

    Thanks,

    Lawrence

  • All,

    With Henrik's input, I've been able to deduce that the following Index improves the "Latest Edition" query considerably:

    CREATE UNIQUE INDEX ix_Books_Test1 ON dbo.Books(BookTitle, BookEdition DESC) INCLUDE (BookPublishDate, BookAuthor)

    This is an ideal index created to serve this query only, and may not help others.

    Regards,

    Lawrence.

  • Lawrence: I wanted to say that I appreciate both the original article and all your follow up work. I had read elsewhere about performance problems with the functions like row_number, but I didn't know what could be done to help alleviate the problem. Row_number is just too darn useful to want to give up. Nice that there is something that can be done to help with performance. Thanks.

  • JJ B,

    Many thanks for your post and kind comments.

    It's certainly been a valuable process for me to discover the performance limitations of ROW_NUMBER(), and be able to demonstrate that they can be overcome with appropriate indexing strategies.

    However, it's clear that any ROW_NUMBER() based solution across large amounts of data does have some gotchas. Should the "ideal" index(es) not be available, or can't be created due to environmental constraints, other solutions are faster.

    As always, it's about understanding your data and the queries that are run across it.

    Kind regards,

    Lawrence

  • Mr. Lawrence Moore,

    Then according to this algorithm,

    Is will be possible to predict how many pages will be in the Query?

  • Jesus,

    I'm not completely sure what you are asking.

    Could you elaborate please?

    Many thanks,

    Lawrence

  • Good one.. 🙂

  • some solutions become simple and convenient since RANK function(NOW_NUMBER,RANK and DENSE_RANK).

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply