Slow performant select query, yet apparently simple.

  • I'm trying to perform SQL query (used for paging) on a table but it's resulting amazingly slow.

    Why this query takes 1 minutes to complete ?

    SELECT TOP(20) * FROM (

    SELECT *, Row_Number () OVER (ORDER BY ID ASC ) AS IXX

    FROM ContactForms

    ) AS Tbl

    WHERE Tbl.IXX > 30000

    If I run the same query with a where condition slightly higher, say 31000, then it takes almost no time, I think it has something to do with caching, but after a few minutes the query time boosts again.

    ID is a primary key (clustered index). The table indexes were just rebuilt. The table is about 1.2 gigs with 65.000 records ~, the max size of a row is around 120kbs (the sum of 3 nvarchar-max fields) ~

    I also added a non clustered, non unique key on a DateTime field, but the results didn't change performance wise.

    After index rebuilding, the dbcc showcontig ('dbo.ContactForms) returns the following results that seem ok:

    - Pages Scanned 51485 (isn't it a lot ? is it normal ?)

    - Extents Scanned 6436

    - Extent Switches 6435

    - Avg. Pages per Extent 8.0

    - Scan Density [Best Count:Actual Count] 100.00% [6436:6436]

    - Logical Scan Fragmentation 0.01%

    - Extent Scan Fragmentation 5.56%

    - Avg. Bytes Free per Page 2339.0

    - Avg. Page Density (full) 71.10%

    I wonder how can I speed this up. If this is a regular behavior or the table is borked somehow.

    Any suggestion extremely appreciated.

    Sincerely, Agustin Garzon.

  • Capture the actual query plans for both the bad one (30000) and the good one (31000), save them as *.sqlplans, zip them up and post them here as an attachment.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • The problem is that you're trying to return too much data at the very beginning. The only thing you should initially be concerned with for paging is getting the correct ID's first. Try this and let us know how it works for you...

    WITH

    cteNumber AS

    (

    SELECT ID, ROW_NUMBER() OVER (ORDER BY ID ASC) AS IXX

    FROM dbo.ContactForms

    )

    SELECT TOP 20 data.*

    FROM dbo.ContactForms data

    INNER JOIN cteNumber num

    ON data.ID = num.ID

    WHERE num.IXX > 30000

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Those TOP expressions really require an ORDER BY to make sense.

    The width of your table is a problem, consider moving the LOB data off row.

    As Jeff says, the thing to focus on is retrieving the keys quickly. This typically involves seeking on a non-clustered index for the keys ONLY, and then fetching the row data associated with the 20 rows you want at the end.

    If pages are accessed sequentially, consider maintaining an 'anchor' record - get the application to pass in the key for the last row it received. You can build a greater-than predicate based on the anchor which will speed the key seeking even further.

    When I do paging, I typically optimize the key search from both ends. If the page requested is in the second half of the table, I seek keys from the end, in reverse order.

    Apply some thought to what you are really trying to achieve, and you will make progress.

    Paul

  • Barry:

    I'm posting a different plan, the one for the query suggested by Jeff.

    What I meant originally is that it takes a lot to retrieve at the position 30.000, but due to some cache, fetching anywhere near 31.000 is faster, but after a while this vanishes.

    Jeff:

    I understand what you say, by creating a subquery with the index and the row number is crazy fast, then I filter the subquery and with the join I get the subset full of contents.

    Sounds efficient. Nevertheless, it seems something else is going on, incredibly the sample query you posted took the same to complete when looking at a high position (near 60.000)

    Paul:

    It's making index scans, not seeking, I'm posting the exec plan for Jeff's query so you can see it.

    Is there anything that can be done to improve this scenario ?

    Is there a chance that the table is borked somehow, I mean, it has over 50.000 pages. I tried recreating such a table locally with lob data, I created a 12gigs table and end up with only 4500 pages. Perhaps it hasn't anything to do, or perhaps it rings a bell.

    Ultimately I could write a script to "clone" the table and data so I can drop the current one, would appreciate finding a solution rather than implementing this brute approach.

    Thanks all for your contributions, but please don't stop now 🙂

  • agustingarzon (2/26/2010)


    ID is a primary key (clustered index). The table indexes were just rebuilt. The table is about 1.2 gigs with 65.000 records ~, the max size of a row is around 120kbs (the sum of 3 nvarchar-max fields)

    Agustin,

    The quoted section above shows one of the major problems here, as I tried to say in my previous post. Remember that a clustered index defines the logical order of the data. What many people miss is that means that the clustered index is the widest possible index - it contains the row data after all.

    Try creating a unique nonclustered index on the ID column. Yes, I know that sounds like you are duplicating the index, but consider what will actually happen. Assuming that ID is an INTEGER column, it will use four bytes per row. 65,000 records will therefore consume 260,000 bytes - less than 254KB! With such a dense index, the scan of the ID column will be pretty near instant.

    Compare that to scanning the clustered index. Now, the clustered index isn't actually 120KB wide - the LOB columns are stored off-row in separate structures - but my guess is that the in-row data is a lot wider than four bytes 😀

    There are more things to be done to optimize paging, but this new index should be your first step:

    CREATE UNIQUE NONCLUSTERED INDEX <name>

    ON <table> (ID ASC)

    WITH (SORT_IN_TEMPDB = ON, MAXDOP = 1, FILLFACTOR = 100, ONLINE = OFF);

    You could also help us a great deal by posting the scripted CREATE TABLE and CREATE INDEX statements for the table in question, together with a small sample of data.

    Paul

  • agustingarzon (2/27/2010)


    Barry:

    I'm posting a different plan, the one for the query suggested by Jeff.

    What I meant originally is that it takes a lot to retrieve at the position 30.000, but due to some cache, fetching anywhere near 31.000 is faster, but after a while this vanishes.

    OK two things, first these are Estimated Query Plans, and not Actual Query Plans, we need the Actuals.

    Secondly, I am a little confused, is there a good form of this query or only a bad form?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (2/27/2010)


    Secondly, I am a little confused, is there a good form of this query or only a bad form?

    That was based on Jeff's code I think. The low estimated row counts relate to the effect of the TOP operator.

  • Paul White (2/27/2010)


    RBarryYoung (2/27/2010)


    Secondly, I am a little confused, is there a good form of this query or only a bad form?

    That was based on Jeff's code I think. The low estimated row counts relate to the effect of the TOP operator.

    Its still an Estimated plan. I plainly indicated that an Actual should be provided. That's even more important now that it appears that there is no "good" plan to compare it against.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (2/27/2010)


    Its still an Estimated plan. I plainly indicated that an Actual should be provided. That's even more important now that it appears that there is no "good" plan to compare it against.

    Right, I see 😎

    Yes, that would be nice.

  • exactly...:smooooth:

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Barry, I'm posting the actual exec plan (named Old_Exec) which is the plan before I implemented latest Paul's suggestion (a non clust ix).

    I implemented Paul's suggestion. I added the non clustered IX on the identity column, along with the primary key index.

    The query takes the same to execute though, slooow. (I keep using Jeff's query)

    WITH

    cteNumber AS

    (

    SELECT ID, ROW_NUMBER() OVER (ORDER BY ID ASC) AS IXX

    FROM dbo.ContactForms

    )

    SELECT TOP 20 data.*

    FROM dbo.ContactForms data

    INNER JOIN cteNumber num

    ON data.ID = num.ID

    WHERE num.IXX > 30000

    ;

    The execution plan changed. I can see it's now performing an index scan (on the non clustered index) which as Paul said should be lots faster than scanning through a clustered index with lots of data.

    But then it performs a clustered index scan which puts us back in zero square, why ? Because the clustered index scan is scanning through as much as 60.000 rows. (See New_Exec)

    In my local environment I removed the primary key making it a heap table (for testing purposes). By seeing the exec plan it first performs the non-clust index scan and finally an index seek whitin a small subset of 20 rows (instead of scanning thru as much as 60.000 rows). (See What_About_Exec)

    Would you advise against it provided all the circumstances I described ? So far seems the fastest (haven't tested in prod server)

    ---

    The LOB data is not binary data, just long strings, I'm attaching the table create script. There are 3 large fields (nvarchar-max) that can add up to 100kb in some rows. So it wouldn't be stored outside the row, perhaps that's why it was performing so poorly.

    ---

    Thanks for your replies 🙂

  • Barry, I'm posting the actual exec plan (named Old_Exec) which is the plan before I implemented latest Paul's suggestion (a non clust ix).

    I implemented Paul's suggestion. I added the non clustered IX on the identity column, along with the primary key index.

    The query takes the same to execute though, slooow. (I keep using Jeff's query)

    WITH

    cteNumber AS

    (

    SELECT ID, ROW_NUMBER() OVER (ORDER BY ID ASC) AS IXX

    FROM dbo.ContactForms

    )

    SELECT TOP 20 data.*

    FROM dbo.ContactForms data

    INNER JOIN cteNumber num

    ON data.ID = num.ID

    WHERE num.IXX > 30000

    ;

    The execution plan changed. I can see it's now performing an index scan (on the non clustered index) which as Paul said should be lots faster than scanning through a clustered index with lots of data.

    But then it performs a clustered index scan which puts us back in zero square, why ? Because the clustered index scan is scanning through as much as 60.000 rows. (See New_Exec)

    In my local environment I removed the primary key making it a heap table (for testing purposes). By seeing the exec plan it first performs the non-clust index scan and finally an index seek whitin a small subset of 20 rows (instead of scanning thru as much as 60.000 rows). (See What_About_Exec)

    Would you advise against it provided all the circumstances I described ? So far seems the fastest (haven't tested in prod server)

    ---

    The LOB data is not binary data, just long strings, I'm attaching the table create script. There are 3 large fields (nvarchar-max) that can add up to 100kb in some rows. So it wouldn't be stored outside the row, perhaps that's why it was performing so poorly.

    ---

    Thanks for your replies 🙂

  • agustingarzon (3/1/2010)


    The LOB data is not binary data, just long strings, I'm attaching the table create script. There are 3 large fields (nvarchar-max) that can add up to 100kb in some rows. So it wouldn't be stored outside the row, perhaps that's why it was performing so poorly.

    All LOB data over a certain size (and all MAX data types by default) is stored off-row, regardless of whether it is binary or character-based.

    None of the various posts allow me to read the attachments - I get a server error message. Anyone else having more luck?

    Paul

  • I double posted by mistake, sorry.

    Here are all the plans in the attached zip.

Viewing 15 posts - 1 through 15 (of 46 total)

You must be logged in to reply to this topic. Login to reply