Returning row count as output in CTE

  • I'm doing something similar to the below example:

    DECLARE @RowStart INT

    DECLARE @RowEnd INT

    SET @RowStart = 1

    SET @RowEnd = 100

    ;WITH CTE AS

    (

    SELECT a,

    b,

    c,

    ROW_NUMBER() OVER(ORDER BY c) AS RowNumber

    FROM table1

    )

    SELECT a,

    b,

    c

    FROM CTE

    WHERE RowNumber BETWEEN @RowStart AND @RowEnd

    I had this working fine as a stored procedure until I was told I needed to output the total row count, or the number of records that would have been returned with no paging. I searched and searched and could not find a solution that worked. I'm not able to set the the total row count and return the data from the select statement when using a CTE. My solution to this was to dump the CTE idea and instead use a temp table to get a row count and then return the data with paging, similar to what's below.

    SELECT a,

    b,

    c,

    ROW_NUMBER() OVER(ORDER BY c) AS RowNumber

    INTO #table1

    SET @TotalRows = COUNT(*)

    FROM #table1

    SELECT a,

    b,

    c

    FROM #table1

    WHERE RowNumber BETWEEN @RowStart AND @RowEnd

    It works the way I want it to, it just doesn't perform that great. I would like to get it working with a CTE and then compare execution plans and IO and go with the one that performs better. If I could get it working as a CTE or some other alternative solution that I'm missing that would be great.

    Thanks for any and all help.

  • i just sat in on a SQLSaturday #79 in Davie(Fort Lauderdale) presentation on how to improve performance in exactly this situation;

    the Presenter was Dimitri, and he had a great example how he added indexes and CTE's untill the join of 20 primariy keys from the row number CTE was joined ont eh bigger(millionrowtable?) in order to get the best performance.

    i looked on the sql saturday 79 page, but didn't find links to his presentation, nore in the schedule...i think his presentation was a late add on.

    http://www.sqlsaturday.com/79/eventhome.aspx

    here's an example using sys.objects, which i hope is understandable. the idea is cross join the total, and use a CTE to limit the rows to just stuff that exists in an index, before finally joining to get the rest of the data, so you get , say just 100 rows, isntead of a zillion rows that get limited to 100

    --declare our variables

    DECLARE @RowStart INT

    DECLARE @RowEnd INT

    SET @RowStart = 1

    SET @RowEnd = 100

    --get the total count

    WITH TotalCount AS

    (

    SELECT COUNT(*) AS TotalCount FROM sys.objects objz

    ),

    --for performance reasons, get the PK and the x number of rows we want to get.

    --the ORDER BY is important: if we are Ordering by a differnet column, make sure there is an

    --index that has the PK and the Ordering column

    AllRows AS

    (

    SELECT

    objz.OBJECT_ID,

    ROW_NUMBER() OVER (ORDER BY objz.name ) AS RW

    FROM sys.objects objz

    )

    --i think Dimitri had an additional CTE that just selected the rows in the row number between start and end...here i just grab them all

    SELECT

    AllRows.*,

    objz.*,

    TotalCount.*

    FROM AllRows

    INNER JOIN sys.objects objz

    CROSS JOIN TotalCount

    ON Allrows.object_id = objz.object_id

    WHERE AllRows.RW BETWEEN @RowStart AND @RowEnd

    the key is looking at the execution plan, so you can see the final join is just 100 rows, isntead of a lot more

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Interesting solution. I'll have to try and wrap my head around that one. Not seeing how I would return the total count as an output parameter though? Wish I could see that presentation too, maybe they'll put it up later.

  • Is there any problem with just using @@ROWCOUNT ?

    DECLARE @RowStart INT

    DECLARE @RowEnd INT

    SET @RowStart = 1

    SET @RowEnd = 100

    ;WITH CTE AS

    (

    SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowNumber

    FROM sys.objects

    )

    SELECT * FROM CTE

    WHERE RowNumber BETWEEN @RowStart AND @RowEnd

    SELECT @@ROWCOUNT

    --edit--

    Obviously, use Lowell's solution for the rest of the query - if it adds any performance gains. I haven't seen it before which is why I haven't used it in the example above - want to get my head around it first!.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Cadavre (8/16/2011)


    Is there any problem with just using @@ROWCOUNT ?

    DECLARE @RowStart INT

    DECLARE @RowEnd INT

    SET @RowStart = 1

    SET @RowEnd = 100

    ;WITH CTE AS

    (

    SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowNumber

    FROM sys.objects

    )

    SELECT * FROM CTE

    WHERE RowNumber BETWEEN @RowStart AND @RowEnd

    SELECT @@ROWCOUNT

    --edit--

    Obviously, use Lowell's solution for the rest of the query - if it adds any performance gains. I haven't seen it before which is why I haven't used it in the example above - want to get my head around it first!.

    Just a small problem with using @@rowcount here...

    It will not return what was required (total number of records before pagiong) but the number of records in the page.

    In your example, if database have more than 50 user defined objects, instead of SELECT @@ROWCOUNT you could just do SELECT 100 :-D.

    I would bodify offered solution to count total number of rows out of CTE, just to safe some code lines πŸ˜€

    --declare our variables

    DECLARE @RowStart INT

    DECLARE @RowEnd INT

    SET @RowStart = 1

    SET @RowEnd = 100

    --get the records

    ;WITH Recs AS

    (

    SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN -- or whatever requried

    FROM sys.objects

    )

    --Get the total OUT of full resultset

    ,Cnt AS ( SELECT COUNT(*) TtlCnt FROM Recs )

    SELECT R.*, C.TtlCnt

    FROM Recs R CROSS JOIN Cnt C

    WHERE R.RN BETWEEN @RowStart AND @RowEnd

    Also, I'm not sure if selecting just ID in CTE and joining it back would anyhow increase performance... Actually it's opposit! You will end up with more reads, as CTE is just a stylish wrapper for subquery. So, if you select ID's then join back to sys.objects SQL Server will perform much more reads then in my example. (~600 vs ~200).

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene has it right, it returns the number of records that were returned after paging. I need to know how many would have been returned had there been no paging.

    I know you were just joking but I hardcoded those values (1 and 100), the calling application will send me a page number and a page size and then I have to calculate how many results to return. It's for a web search app so the user knows they are seeing results 1-100 of 1000, or 201-300 of 1000, or 1-5 of 10 etc. It really could be anything...just a background on the query in question.

  • Eugene, so the count is returned as a column in the resultset? I already tried something similar and it wasn't an accepted solution...:(

    I never understood why not, but I'm not a web developer, so they may have their reasons.

  • Eugene Elutin (8/16/2011)


    Just a small problem with using @@rowcount here...

    It will not return what was required (total number of records before pagiong) but the number of records in the page.

    Ah, misread the OP.

    I guess to fit the multiple result-sets you'd need a temp table then.

    bwoulfe (8/16/2011)


    Eugene, so the count is returned as a column in the resultset? I already tried something similar and it wasn't an accepted solution...:(

    I never understood why not, but I'm not a web developer, so they may have their reasons.

    I had a .Net developer say the same to me awhile ago (although the requirements were slightly different - instead he wanted the whole result-set returned but wanted to know the size of the result-set). After we discussed it for awhile, we added a WCF component as a middle-man that obtained the entire result-set including a separate column that contained the number of records returned. The WCF component then passed this data to the front-end application before parcelling out the result-set in the way it was required.

    Eugene Elutin (8/16/2011)


    Also, I'm not sure if selecting just ID in CTE and joining it back would anyhow increase performance... Actually it's opposit! You will end up with more reads, as CTE is just a stylish wrapper for subquery. So, if you select ID's then join back to sys.objects SQL Server will perform much more reads then in my example. (~600 vs ~200).

    I thought that would be the case, but haven't had time to test Lowell's code. Hoping that the presentation gets posted online somewhere, would be interested to see how it worked!


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • How you want your count to be returned then?

    Returning the count in the column would be the best option, client app can read this value from first row and then ignore this column completely.

    You can return it as output parameter of stored proc or in a separate resultset. But then I whould advice to use temp table (example attached).

    What you cannot do is: to return resultset of 100 rows but tell the client app that its recordscount is different πŸ˜€

    --declare our variables

    DECLARE @RowStart INT

    DECLARE @RowEnd INT

    SET @RowStart = 1

    SET @RowEnd = 100

    --get the records

    SELECT [object_id], ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN -- or whatever requried

    INTO #Recs

    FROM sys.objects

    SELECT R.*

    FROM Recs R

    WHERE R.RN BETWEEN @RowStart AND @RowEnd

    -- Option 1. return total count as stored proc return parameter

    -- (you can do it via output parameter as well)

    DECLARE @r INT

    SELECT @r = COUNT(*) FROM #Recs

    RETURN @r

    -- Option 2. return total count in a separte resultset

    SELECT COUNT(*) AS TotalRecCount FROM #Recs

    And Yes, using temp table in this case is fine from performance perspective...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • They want it returned with an OUTPUT parameter from the stored procedure. It's setup with temp table right now and the performance is just ok. I may just need to stick with a temp table but rewrite it. The stored procedure is over 1000 lines of code so I've been dreading doing that.

    Thanks for the responses everyone, I think it's back to the drawing board though. I'm going to try the count returned in a column solution with them again, see if that might work this time....:-P

  • bwoulfe (8/16/2011)


    ...It's setup with temp table right now and the performance is just ok. I may just need to stick with a temp table but rewrite it...

    Using temp tables is fine and may be even more beneficial then CTE sometimes.

    Just few tips for rewritting:

    1. Use SELECT ... INTO to create and populate temp table

    2. Consider creating appropriate indexes on temp tables if they used multiple times in joins.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (8/16/2011)


    bwoulfe (8/16/2011)


    ...It's setup with temp table right now and the performance is just ok. I may just need to stick with a temp table but rewrite it...

    Using temp tables is fine and may be even more beneficial then CTE sometimes.

    Just few tips for rewritting:

    1. Use SELECT ... INTO to create and populate temp table

    2. Consider creating appropriate indexes on temp tables if they used multiple times in joins.

    I'm using INSERT INTO, just so I can see my column names and reference them later if a problem occurs (this is still in dev). Any reason SELECT...INTO is preferred?

    I do all the inserts, then add my indexes, then return the results. There are about 4 or 5 inserts into the temp table though, maybe reordering those steps might help?

  • SELECT Col1 AS Column1

    ,Col2 AS Column2

    .....

    INTO #t

    You can see that you can name your columns quite easely. You can also define required datatype by casting the value to it (eg. SELECT CAST(SomeInt AS VARCHAR(11)) AS ...)

    Benefits in performance...

    You can read quite a lot about it πŸ˜‰

    Just few links you may find interesting:

    http://stackoverflow.com/questions/1302670/sql-server-select-into-and-blocking-with-temp-tables

    http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

    Basically it's one of the "bulk load" methods within SQL Server so gives you the fasterst possible insert speed in T-SQL.

    If data you inserting is naturally distinct you can insert all of your data at once using UNION ALL statement, something like:

    SELECT q1.col1, q1.Col2, ...

    INTO #t

    FROM q1

    UNION ALL

    SELECT q2.col1, q2.Col2, ...

    FROM q2

    UNION ALL

    SELECT q3.col1, q3.Col2, ...

    FROM q3

    ....

    If needs to be deduplicated any way you may test if using UNION alone will be good enough.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Lowell (8/15/2011)


    i just sat in on a SQLSaturday #79 in Davie(Fort Lauderdale) presentation on how to improve performance in exactly this situation;

    the Presenter was Dimitri, and he had a great example how he added indexes and CTE's untill the join of 20 primariy keys from the row number CTE was joined ont eh bigger(millionrowtable?) in order to get the best performance.

    Was it the key seek method, explained elegantly by Paul White here[/url]?

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/16/2011)


    Was it the key seek method, explained elegantly by Paul White here[/url]?

    that post from Paul exactly encapsulates that portion of his presentation Chris, yes.

    Β» Dmitri Korotkevitch's web site, now that i grabbed my notes, is http://aboutsqlserver.com but he has not posted his presentation materials from it yet.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply