Paging and Versioning Using ROW_NUMBER()

Posted Tuesday, June 15, 2010 9:10 AM


Grasshopper

Regarding CTEs: they're a very powerful tool for segmenting a query, but you do have to be careful, as the engine will run the CTE's query each time it is referenced. I would have thought, though I haven't explicitly observed it, that the engine would somehow optimize this behaviour; since all the reads happen within the same transaction space, referencing the same CTE multiple times ought never to produce a different result.

I've written some monstrous inline table-valued functions that string 5 to 8 CTEs together, referencing the earlier CTEs in later segments to feed prequalified data to work queries, paging CTEs, and so on, only to find the query plan and overhead became monstrous. The showplan doesn't seem to indicate any savings for multiple references to the same CTE; a question for Microsoft, I suppose.

I eventually stepped back from the giant CTE approach and started using table variables instead; it's a different type of IO problem, but it seemed to be a more efficient solution than the monster CTE query.
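
As a rough sketch of what I mean (the table and column names here are invented purely for illustration): materialise the intermediate result once into a table variable, then have the later steps read the variable instead of repeating the CTE:

DECLARE @PreQualified TABLE (OrderID INT PRIMARY KEY, CustomerID INT, OrderTotal MONEY);

-- the "prequalified data" step runs exactly once
INSERT INTO @PreQualified (OrderID, CustomerID, OrderTotal)
SELECT OrderID, CustomerID, OrderTotal
FROM dbo.Orders
WHERE OrderDate >= '20100101';

-- later work/paging queries read the table variable, not the base table again
SELECT CustomerID, SUM(OrderTotal) AS CustomerTotal
FROM @PreQualified
GROUP BY CustomerID;

SELECT TOP (10) OrderID, OrderTotal
FROM @PreQualified
ORDER BY OrderTotal DESC;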

When it comes to CTEs and finding the best balance between convenience and performance, you really have to try variations and compare the results in the execution plan; as with so many SQL Server optimization topics, 'it depends'.




Now, on to row numbering and paging; I wanted to chip in my two cents' worth.

One trick I've used to get a very tight grip on paging is to introduce a second ROW_NUMBER() running in the opposite direction to the first, then sum the two to get the total row count before page selection. It does introduce additional overhead, and that overhead can be significant, but if the benefit outweighs the cost it can be quite useful. The version below uses zero-based row and page indexes: the first page is 0 and the first row is 0.

Note: removing the RN row number significantly reduces the overhead while still letting you use the page-index functionality; you lose the row count and page count, but you can still pull back a specific page in the sequence, accomplishing something like MySQL's LIMIT clause.

DECLARE @PageLimit INT, @PageIndex INT;
SELECT @PageLimit=20, @PageIndex=0;


SELECT
     [RowIndex]  = [IX]
    ,[RowCount]  = [IX] + [RN]
    ,[PageCount] = CEILING(1.0 * ([IX] + [RN]) / @PageLimit)
    ,[PageIndex] = FLOOR(1.0 * [IX] / @PageLimit)
    ...
FROM (
    SELECT
         [IX] = ROW_NUMBER() OVER (ORDER BY [foo] ASC) - 1
        ,[RN] = ROW_NUMBER() OVER (ORDER BY [foo] DESC)
        ...
) PageBase
WHERE FLOOR(1.0 * [IX] / @PageLimit) = @PageIndex;
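
To make the pattern concrete, here is a runnable version with the elided pieces filled in, using the dbo.Books table from the article purely as an example source (I'm assuming its BookId identity column as the ordering key; any unique key works, and uniqueness is what makes [IX] + [RN] equal the total row count):

DECLARE @PageLimit INT, @PageIndex INT;
SELECT @PageLimit = 20, @PageIndex = 0;          -- zero-based page index

SELECT
     [RowIndex]  = [IX]
    ,[RowCount]  = [IX] + [RN]                                -- total rows in the result set
    ,[PageCount] = CEILING(1.0 * ([IX] + [RN]) / @PageLimit)  -- total number of pages
    ,[PageIndex] = FLOOR(1.0 * [IX] / @PageLimit)
    ,BookTitle, BookEdition
FROM (
    SELECT
         [IX] = ROW_NUMBER() OVER (ORDER BY BookId ASC) - 1   -- zero-based forward index
        ,[RN] = ROW_NUMBER() OVER (ORDER BY BookId DESC)      -- one-based reverse index
        ,BookTitle, BookEdition
    FROM dbo.Books
) AS PageBase
WHERE FLOOR(1.0 * [IX] / @PageLimit) = @PageIndex;

Dropping [RN], [RowCount], and [PageCount] gives the lighter, LIMIT-style version mentioned above.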

Post #937597
Posted Tuesday, June 15, 2010 10:03 AM
SSC Veteran

CTEs are generally poor performers; we have had to rewrite several for performance reasons. Your example breaks down when the table has a lot of rows and takes over an order of magnitude longer to execute. You should not be doing any performance evaluation on such a small table. The CTE is the worst performing of the three methods I know of, and you do not mention the third, which is joining a subquery (a derived table) back to the table:

select b.BookTitle, b.BookEdition, b.BookPublishDate, b.BookAuthor
from books b
inner join
(select booktitle, max(bookedition) as bookedition from books group by booktitle)q
on b.booktitle = q.booktitle and b.bookedition = q.bookedition;
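
For reference, the correlated-subquery variant in the timings below is the same idea written as a filter rather than a join; something along these lines (a sketch, the exact query I measured isn't reproduced here):

select b.BookTitle, b.BookEdition, b.BookPublishDate, b.BookAuthor
from books b
where b.BookEdition = (select max(b2.BookEdition)
                       from books b2
                       where b2.BookTitle = b.BookTitle);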

I reinserted your data back into the table 17 times and updated the edition with the identity column:


Derived table:
Table 'Books'. Scan count 17, logical reads 15461
390 Milliseconds

Correlated subquery:
Table 'Books'. Scan count 17, logical reads 15461
400 Milliseconds

CTE:
Table 'Books'. Scan count 17, logical reads 21203
8103 Milliseconds!!!!!!!!!!

Post #937655
Posted Tuesday, June 15, 2010 10:37 AM
SSC Rookie

You can avoid the CTE by wrapping the query in a derived table, which lets you reference the ROW_NUMBER() result in the WHERE clause:

select *
from (
    select ROW_NUMBER() OVER (ORDER BY column_order_by) as RowNbr, col_1, col_2
    from inner_table with (nolock)
) as outer_table
where RowNbr < 5   -- use the row number in the where clause
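
The same wrapper extends to paging if you filter on a row-number range; the page-size/page-index variables and the column names below are just placeholders:

DECLARE @PageSize INT, @PageIndex INT;
SELECT @PageSize = 20, @PageIndex = 0;   -- zero-based page index

select col_1, col_2
from (
    select ROW_NUMBER() OVER (ORDER BY column_order_by) as RowNbr, col_1, col_2
    from inner_table
) as paged
where RowNbr between @PageIndex * @PageSize + 1
                 and (@PageIndex + 1) * @PageSize;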
Post #937682
Posted Tuesday, June 15, 2010 10:41 AM
Valued Member

Thanks for your post, Blah Baby.

It seems that the ROW_NUMBER() function is the cause of the slow running, rather than the CTE itself.

(Worrying, given that the subject of my article was the former, not the latter. :-( )

For example, the following is fast:

;WITH BookCTE (BookTitle, BookEdition)
AS
(
select booktitle, max(bookedition) as bookedition from books group by booktitle
)
SELECT b.BookTitle, b.BookEdition, b.BookPublishDate, b.BookAuthor
FROM dbo.Books b INNER JOIN BookCTE c ON b.BookTitle=c.BookTitle AND b.BookEdition=c.BookEdition


Why ROW_NUMBER() is significantly slower in this case as data volumes increase is counter-intuitive to me, and needs additional research.

Thanks again.
Lawrence.
Post #937685
Posted Tuesday, June 15, 2010 3:18 PM
SSC Rookie

Lawrence,

Thank you for your answer.
I will have a look at the links you gave me.

Kind regards,

Martin
Post #937853
Posted Wednesday, June 16, 2010 4:59 AM


Ten Centuries

Lawrence,

Itzik Ben-Gan has an article in the latest edition (June 2010) of SQL Server Magazine about indexing for ROW_NUMBER() OVER (PARTITION BY ...).

He suggests that “indexing guidelines are to have the ranking ordering columns in the index key list, either in specified order or exactly reversed, plus include the rest of the columns from the query in the INCLUDE clause”.

So you should probably create an index on (hfv_hedge_rel_id, reporting_date).
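
Something along these lines (the table name below is a stand-in, since it isn't shown in this part of the thread):

-- table name is a placeholder; per the guideline above, add INCLUDE (<the other columns the query returns>)
CREATE INDEX ix_hfv_hedge_rel_reporting_date
    ON dbo.YourHedgeTable (hfv_hedge_rel_id, reporting_date);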

HIH
Henrik Staun Poulsen
www.stovi.com



Post #938049
Posted Wednesday, June 16, 2010 6:24 AM
Valued Member

Henrik,

Many thanks for your response.

I've played around with indexing strategies with my "Latest Edition" example using the dbo.Books table, and nothing is really helping.

The strange thing is that from an IO point of view, the ROW_NUMBER() solution does show a smaller scan and logical read count. However, the CPU time is very high.

For example:

SET STATISTICS IO ON
SET STATISTICS TIME ON

;WITH BookCTE (RowNumber, BookTitle, BookEdition, BookPublishDate, BookAuthor)
AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY BookTitle ORDER BY BookEdition DESC),
BookTitle, BookEdition, BookPublishDate, BookAuthor
FROM dbo.Books
)
SELECT BookTitle, BookEdition, BookPublishDate, BookAuthor
FROM BookCTE
WHERE RowNumber=1
ORDER BY BookTitle

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 28 ms.

(10 row(s) affected)
Table 'Books'. Scan count 1, logical reads 10132, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 18297 ms, elapsed time = 18295 ms.


select b.BookTitle, b.BookEdition, b.BookPublishDate, b.BookAuthor
from books b
inner join
(select booktitle, max(bookedition) as bookedition from books group by booktitle)q
on b.booktitle = q.booktitle and b.bookedition = q.bookedition;

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.

(10 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Books'. Scan count 2, logical reads 20264, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 3609 ms, elapsed time = 3741 ms.

I think I may need Itzik to take a look at this.

Will carry on with research, as time allows.

Regards,
Lawrence.
Post #938098
Posted Wednesday, June 16, 2010 6:27 AM
Valued Member

All,

Further to my last post: it's worth pointing out that following Blah's example, I have increased the number of rows in the dbo.Books table considerably to test this.

I used the following code snippet:

--Add additional data (run multiple times to get 500000 rows)
INSERT dbo.Books (BookTitle, BookEdition, BookPublishDate, BookAuthor)
SELECT BookTitle, BookEdition, BookPublishDate, BookAuthor
FROM dbo.Books
GO 16

--Set BookEdition to BookId for better data profile
UPDATE dbo.Books SET BookEdition=BookId


NB: For the UPDATE to work, the BookEdition column's datatype must be widened from the original SMALLINT to something larger, BIGINT for example.
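
A sketch of the required change (adjust nullability to match how the column was originally declared):

-- widen BookEdition so it can hold the BookId values; add NOT NULL if the column was originally NOT NULL
ALTER TABLE dbo.Books ALTER COLUMN BookEdition BIGINT;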

Thanks,
Lawrence
Post #938103
Posted Wednesday, June 16, 2010 6:47 AM
Valued Member

All,

With Henrik's input, I've been able to deduce that the following index improves the "Latest Edition" query considerably:

CREATE UNIQUE INDEX ix_Books_Test1 ON dbo.Books(BookTitle, BookEdition DESC) INCLUDE (BookPublishDate, BookAuthor)

This is an ideal index created to serve this query only, and may not help others.

Regards,
Lawrence.
Post #938123
Posted Wednesday, June 16, 2010 8:45 AM
SSC Veteran

Lawrence: I wanted to say that I appreciate both the original article and all your follow-up work. I had read elsewhere about performance problems with functions like ROW_NUMBER(), but I didn't know what could be done to help alleviate the problem. ROW_NUMBER() is just too darn useful to give up, so it's nice that there is something that can be done to help with performance. Thanks.
Post #938272