Forcing the Order

  • Obie kabie.

    Granted, indexing is incomplete here, but the optimizer isn't whining about anything missing, and the plans are all full plans, no timeouts or trivials. I'll address the index after this first scenario.

    I swapped around one or two things, forced an order on a query, and got 10x gains on it. What I'm trying to nail down is why that's blowing the optimizer away; it doesn't make sense offhand.

    First we'll need some tables. Some of you may have noticed these tables are becoming a recurring theme for me of late:

    CREATE TABLE Tally (N INT NOT NULL) ON [PRIMARY]

    -- Slightly modified from the Tally article
    INSERT INTO Tally (N)
    SELECT TOP 1000000 -- Yes, that's 1 million.
            ROW_NUMBER() OVER (ORDER BY RAND()) AS N -- Well aware RAND() is called once; that's on purpose.
    FROM    Master.dbo.SysColumns sc1,
            Master.dbo.SysColumns sc2,
            Master.dbo.SysColumns sc3

    INSERT INTO Tally (N) VALUES (0)

    ALTER TABLE Tally
        ADD CONSTRAINT PK_Tally_N
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
        ON NCDataStore

    -- Subcategories is assumed to exist already; it isn't created in this excerpt.
    CREATE TABLE Items25000
        (ItemID        INT IDENTITY(1,1) NOT NULL,
         ItemDesc      VARCHAR(100) NOT NULL,
         SubcategoryID INT CONSTRAINT FK_Items25000_SubcategoryID FOREIGN KEY REFERENCES Subcategories( SubcategoryID),
         CONSTRAINT PK_Items25000 PRIMARY KEY NONCLUSTERED ( ItemDesc) ON NCDataStore
        )
        ON [PRIMARY]

    CREATE UNIQUE CLUSTERED INDEX idx_Items25000 ON Items25000 ( ItemID) ON [PRIMARY]

    INSERT INTO Items25000 (ItemDesc, SubcategoryID)
    SELECT  NEWID() AS ItemDesc, N%520 + 1
    FROM    Tally
    WHERE   N BETWEEN 1 AND 25000

    UPDATE Items25000 SET ItemDesc = 'TestItem' WHERE ItemID = 15000

    CREATE TABLE PriceHistoryTable_25000_NoBuf
        (AuditID    INT IDENTITY(1,1) NOT NULL,
         ItemID     INT NOT NULL CONSTRAINT FK_PriceHistoryTable_25000_NoBuf_ItemID FOREIGN KEY REFERENCES Items25000(ItemID),
         ItemPrice  DECIMAL( 19, 4) NOT NULL,
         UpdateDate DATETIME NOT NULL
        )
        ON [PRIMARY]

    DECLARE @YearsInDate   INT,
            @RecordDensity INT
    SET @YearsInDate = 20
    SET @RecordDensity = 25

    INSERT INTO PriceHistoryTable_25000_NoBuf
            (ItemID, ItemPrice, UpdateDate)
    SELECT
            ( t.N%@RecordDensity) + 1 AS ItemID, -- +1 to avoid ID: 0
            ( t.N + 10) * 2. / 3. AS ItemPrice, -- Some price, we don't really care.
            DATEADD( yyyy, 90, DATEADD( ms, ABS( CHECKSUM(NEWID()))%1000, DATEADD( s, ABS( CHECKSUM(NEWID()))%86400, ABS( CHECKSUM(NEWID())% (365 * @YearsInDate)) )) ) AS UpdateDate
    FROM
            Tally AS t
    WHERE
            t.N BETWEEN 1 AND ( 25 * @RecordDensity)

    CREATE CLUSTERED INDEX idx_PriceHistoryTable_25000_NoBuf ON PriceHistoryTable_25000_NoBuf ( UpdateDate, ItemID) ON [PRIMARY]
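    If you want to double-check what the build produced before running the tests, a quick count does the trick (illustrative only, not part of the original script):

    -- Optional sanity check (not part of the build script): confirm what landed in each table.
    SELECT COUNT(*) AS TallyRows        FROM Tally                          -- expect 1,000,001 (1 million plus the 0 row)
    SELECT COUNT(*) AS ItemRows         FROM Items25000                     -- expect 25,000
    SELECT COUNT(*) AS PriceHistoryRows FROM PriceHistoryTable_25000_NoBuf  -- whatever the INSERT's WHERE clause produced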

    So, now for our little testing scenario. The query plans make sense, and I realize the data volume is throwing things off here; however, I don't understand why the query optimizer is making the decisions it is in regard to the second query.

    IF OBJECT_ID('tempdb..#t0') IS NOT NULL
        DROP TABLE #t0
    IF OBJECT_ID('tempdb..#t1') IS NOT NULL
        DROP TABLE #t1

    PRINT '----------------------------------------------------------------'
    PRINT '25000 MAX FORCE ORDER'
    SET STATISTICS IO, TIME ON

    SELECT
        it.*, '|' AS dv, pht.ItemPrice, pht.UpdateDate
    INTO #t0
    FROM
        (SELECT ItemID, MAX( UpdateDate) AS MaxDate
         FROM PriceHistoryTable_25000_NoBuf
         GROUP BY ItemID
        ) AS drv
    JOIN
        Items25000 AS it
        ON it.ItemID = drv.ItemID
    JOIN
        PriceHistoryTable_25000_NoBuf AS pht
        ON pht.ItemID = drv.ItemID
        AND pht.UpdateDate = drv.MaxDate
    OPTION (MAXDOP 1,
        FORCE ORDER)

    SET STATISTICS IO, TIME OFF

    PRINT '----------------------------------------------------------------'
    PRINT '25000 MAX'
    SET STATISTICS IO, TIME ON

    SELECT
        it.*, '|' AS dv, pht.ItemPrice, pht.UpdateDate
    INTO #t1
    FROM
        Items25000 AS it
    JOIN
        PriceHistoryTable_25000_NoBuf AS pht
    JOIN
        (SELECT ItemID, MAX( UpdateDate) AS MaxDate
         FROM PriceHistoryTable_25000_NoBuf
         GROUP BY ItemID
        ) AS drv
        ON pht.ItemID = drv.ItemID
        AND pht.UpdateDate = drv.MaxDate
        ON it.ItemID = pht.ItemID
    OPTION (MAXDOP 1)

    SET STATISTICS IO, TIME OFF

    Now, this wouldn't be as big a deal to me if the timings weren't incredibly different.

    Now, a little modification:

    CREATE NONCLUSTERED INDEX idx_ReverseDate_PriceHistoryTable_25000_NoBuf

    ON PriceHistoryTable_25000_NoBuf (ItemID, UpdateDate) ON NCDataStore

    This speeds up both queries, with the forced order still in front by a significant margin until the datasets are insanely small (I have a series of tables like this for 25, 250, 2500, and 25000 items), but it never falls out of the lead... yet it has the exact same query plan... and I'm running the slower one *second*. The margin of difference isn't a mere 10 ms, either: QryForced averages about 350 ms, QryNotForced about 420 ms on my system. At worst it's about 10% faster, on average about 15%. That's significant for the same query plan with the additional index in place. Without the index, FORCE ORDER just blows the other away.

    What am I missing here? I know why the index makes things faster in general; what I'm trying to puzzle out is the decisions the optimizer is making in the first case, and why the same plan consistently runs at different speeds in the second.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • I haven't looked at the plans yet, but something is different about them. The two show a different size. I am in the middle of reinstalling SQL Sentry Plan Explorer to view the plans.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • SQLRNNR (5/23/2011)


    I haven't looked at the plans yet, but something is different about them. The two show a different size. I am in the middle of reinstalling SQL Sentry Plan Explorer to view the plans.

    Heh, should have attached the other two, after the indexing. The two you see attached are from the non-indexed version, just the forced order.

    Attached find the two after the index is in play. Sorry for the confusion.

    This is common for the difference in times between the two after indexing:

    ----------------------------------------------------------------
    25000 MAX FORCE ORDER
    Table 'PriceHistoryTable_25000_NoBuf'. Scan count 2, logical reads 3803, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Items25000'. Scan count 1, logical reads 179, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 328 ms, elapsed time = 336 ms.
    ----------------------------------------------------------------
    25000 MAX
    Table 'PriceHistoryTable_25000_NoBuf'. Scan count 2, logical reads 3803, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Items25000'. Scan count 1, logical reads 179, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 343 ms, elapsed time = 430 ms.
    ----------------------------------------------------------------


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • I don't know the exact reason, but my guess is that it has to do with the clustered index being in the reverse order of the columns as listed in your SELECT. It can still find the data, but it has to work harder. When you add the nonclustered index in the order you're actually looking for the data, it's able to take advantage of that, which helps. Then I think your FORCE ORDER is somehow allowing the same operations (per the query plan/optimizer) to align better with the clustered index.

    ---------------------------------------------------------
    How best to post your question
    How to post performance problems
    Tally Table: What it is and how it replaces a loop

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Interesting results. Prior to the NC, scanning the PriceHistoryTable twice (force order) seems to be significantly faster than scanning once when using STATISTICS TIME, but if I run them in a single batch, the cost estimates show the option without force order at 42% of the batch vs. 58% for the force order plan.

    Once I add the NC I get a basic dead heat from STATISTICS TIME.

    I'm running 2008 (10.50.1600.1)

  • I am finding this to be somewhat intriguing. I am running these queries and not seeing the same results. Consistently, the force order is taking longer to run on my system. It also shows that the query plan is more expensive than the non "force order" query.

    SQL Server 10.50.1600 64 Bit Win 7 8GB RAM.

    The STATISTICS IO (without the extra NC) shows that the forced order also reverses the order of access on the tables and is performing a lot more work against PriceHistory than Items25000, though the smallish numbers could be negligible.

    For fun, I added a third query to the mix to see how it would perform. It uses the exact same plan, outputs the same STATISTICS IO as the non-forced-order query, and executes consistently in half the time. The downside is that the plan cost is marginally better than the forced-order plan and more costly than the non-forced-order plan.

    PRINT '----------------------------------------------------------------'
    PRINT '25000 MAX'
    SET STATISTICS IO, TIME ON

    SELECT
        it.*, '|' AS dv, pht.ItemPrice, pht.UpdateDate
    INTO #t3
    FROM
        Items25000 AS it
    JOIN
        PriceHistoryTable_25000_NoBuf AS pht
    JOIN
        (SELECT ItemID, MAX( UpdateDate) AS MaxDate
         FROM PriceHistoryTable_25000_NoBuf
         GROUP BY ItemID
        ) AS drv
        ON pht.ItemID = drv.ItemID
        AND pht.UpdateDate = drv.MaxDate
        ON it.ItemID = pht.ItemID
    ORDER BY it.ItemID
    OPTION (MAXDOP 1)

    SET STATISTICS IO, TIME OFF

    Simple enough change.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Hi Craig,

    The following index appears optimal to me for this query:

    CREATE UNIQUE NONCLUSTERED INDEX [UQ dbo.PriceHistoryTable_25000_NoBuf -ItemID, -UpdateDate (ItemPrice)]

    ON PriceHistoryTable_25000_NoBuf (ItemID DESC, UpdateDate DESC)

    INCLUDE (ItemPrice);

    It allows the Segment Top transformation without a sort, which is the direction the optimizer was kinda heading in before you added the extra index. For more on Segment Top, see http://sqlblog.com/blogs/paul_white/archive/2010/07/28/the-segment-top-query-optimisation.aspx
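    As an aside, the same 'latest price per item' result can also be written with CROSS APPLY against that index - just a sketch of mine, it returns a single row per item rather than any ties on the max date, and it produces a different plan shape (a seek per item) than the Segment Top rewrite:

    -- Illustrative sketch only: one seek per item against the (ItemID DESC, UpdateDate DESC) index,
    -- returning the single most recent price row per item (ties on UpdateDate are not all returned).
    SELECT  it.*, '|' AS dv, pht.ItemPrice, pht.UpdateDate
    FROM    Items25000 AS it
    CROSS APPLY
            (SELECT TOP (1) ph.ItemPrice, ph.UpdateDate
             FROM   PriceHistoryTable_25000_NoBuf AS ph
             WHERE  ph.ItemID = it.ItemID
             ORDER BY ph.UpdateDate DESC
            ) AS pht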

    I'll respond to your more general questions in a separate post.

    Paul

  • Hi Craig,

    Looking at the first two query plans you posted (without your index), the optimizer does pick the better plan - where better = lowest estimated cost. The estimated cost for the 'normal' plan is around 10, whereas the estimated cost for the FORCE ORDER plan is around 14 cost units.

    So, why does the FORCE ORDER plan perform better? Look at the Sort in the normal plan - it estimates 25097.2 rows, but 625,000 rows actually flow through it. This is important because query execution sets the memory grant for the query on the basis of the estimated number of rows.

    Once the memory grant is set (before execution starts) it cannot be increased, regardless of how much free memory your server has at that point in time. If the actual size of the set of rows to be sorted exceeds the memory grant, the whole row set is physically written to tempdb (not memory!), sorted, and read back into the query. When this happens, you will see a Sort Warnings event in Profiler. The 'normal' query plan is spilling its sort to tempdb, and that's why it is slower.

    In the FORCE ORDER plan, the estimated number of rows flowing into the sort is 24987.6, and the actual number of rows encountered is 25,000. Although the estimate is very slightly too low, chances are that the memory grant assigned based on the estimate turns out to be enough to fully sort the set in memory. You would not see a Sort Warnings Profiler event for this query.

    So, in summary, the cause is cardinality estimation leading to a sort that spills to tempdb disk. The optimizer made the correct choice of plan, but it was based on imperfect information.
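    If you want to see the grant itself, it is visible from a second session while the query runs (a rough illustration only; it requires VIEW SERVER STATE, and the Sort Warnings event itself shows up in Profiler):

    -- Illustrative only: watch the fixed memory grant from another session while the query executes.
    -- A grant sized from a low row estimate is what forces the sort to spill to tempdb.
    SELECT  session_id,
            requested_memory_kb,
            granted_memory_kb,
            used_memory_kb,
            max_used_memory_kb,
            query_cost
    FROM    sys.dm_exec_query_memory_grants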

    Paul

  • SQLRNNR (5/23/2011)


    For fun, I added a third query to the mix to see how it would perform. It uses the same exact plan, outputs the same statistics IO as the non forced order query and executes consistently in half the time...simple enough change.

    Hi Jason,

    Beware query rewrites like this. ORDER BY with a SELECT INTO does not guarantee anything (even as far as identity column assignment is concerned). In this particular case, on your particular build of the engine, you found that adding the ORDER BY produced a plan which is more efficient, in practice, on your machine. On my machine, your query produces a plan with two sorts, and performs slower than the alternatives.

    Generally, I advise against this sort of 'creative' textual rewrite because it depends on implementation side-effects. This is a bad idea because it results in an unstable plan - it is much better to create the correct index for the query, or take steps to provide better information to the optimizer. When that fails, we can use hints or plan guides. Adding an ORDER BY to SELECT INTO is not a sound alternative to those good practices.
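    For completeness, the plan guide route looks roughly like this - a sketch only, since the @stmt text has to match the statement the application submits character for character (the guide name here is made up):

    -- Sketch: attach a hint through a plan guide instead of editing the query text.
    EXEC sp_create_plan_guide
        @name            = N'PG_25000Max_Maxdop1',  -- hypothetical name
        @stmt            = N'SELECT it.*, ''|'' AS dv, pht.ItemPrice, pht.UpdateDate INTO #t1 FROM Items25000 AS it JOIN PriceHistoryTable_25000_NoBuf AS pht JOIN (SELECT ItemID, MAX( UpdateDate) AS MaxDate FROM PriceHistoryTable_25000_NoBuf GROUP BY ItemID) AS drv ON pht.ItemID = drv.ItemID AND pht.UpdateDate = drv.MaxDate ON it.ItemID = pht.ItemID',
        @type            = N'SQL',
        @module_or_batch = NULL,
        @params          = NULL,
        @hints           = N'OPTION (MAXDOP 1)';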

    Paul

  • Craig Farrell (5/23/2011)


    Attached find the two after the index is in play. Sorry for the confusion. This is common for the difference in times between the two after indexing...

    CPU time = 328 ms, elapsed time = 336 ms.

    CPU time = 343 ms, elapsed time = 430 ms.

    Hi Craig,

    I have been unable to reproduce the effect you see, partly because I don't naturally get the merge join plans you see with the sample data and indexes given. Perhaps you have a different default fill factor from the rest of us, perhaps it is a consequence of the different file groups in your script, ...or something else.

    One thing you might try is to place the two queries in separate batches (separated by 'GO'). There could be all sorts of subtle effects going on here, but without access to your machine or a fully reproducible test, I really can't say for sure.

  • The last point I want to make is that FORCE ORDER is an immensely powerful hint with many counter-intuitive side-effects, so it should be used exceedingly sparingly, and only by those with a full appreciation of the issues. The FORCE ORDER hint disables several key optimizer rules which can easily prevent intermediate query designs being considered that could be further transformed to a much more optimal plan.

  • ...Brain exploding.....can't ..think...anymore...!!!!!

    Gah!

    *Achoo!!!

    There, all better.

    Thanks Paul!

    ---------------------------------------------------------
    How best to post your question
    How to post performance problems
    Tally Table: What it is and how it replaces a loop

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • jcrawf02 (5/24/2011)


    ...Brain exploding.....can't ..think...anymore...!!!!!

    Gah!

    *Achoo!!!

    There, all better.

    Thanks Paul!

    I think when Paul starts teaching, most of us get that way. Maybe we need to start bugging him for more articles??? 😉

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Sweet! Thanks for getting involved Paul. Alright, let's take it from the top, boys! *drumroll starts*

    SQLkiwi (5/23/2011)


    Hi Craig,

    It allows the Segment Top transformation without a sort, which is the direction the optimizer was kinda heading in before you added the extra index. For more on Segment Top, see http://sqlblog.com/blogs/paul_white/archive/2010/07/28/the-segment-top-query-optimisation.aspx

    That makes sense, and I saw that, kinda. I've been goofing off with different indexes against this structure, banging my head on the optimizer, trying to puzzle out something that feels like deep-water fishing with my rowboat skills. I have another index that behaves where UpdateDate DESC gets involved, but that's mostly driving a second component using CROSS APPLY. I've read your Segment Top article before, but I'm going to have to re-read it with this in mind, now that I've got a perty little example sitting in front of me.

    So, why does the FORCE ORDER plan perform better? Look at the Sort in the normal plan - it estimates 25097.2 rows, but 625,000 rows actually flow through it. This is important because query execution sets the memory grant for the query on the basis of the estimated number of rows.

    *facepalm* I knew I was missing something in there. Dang PEBKAC errors. I must have breezed over that sort 5 times, along with the hash in front of it. I'll have to find out if my statistics went stupid somewhere and try getting that cleaned up.

    ... on second review, that hash appears to feed the sort the proper estimate; the hash itself is just inaccurate. It's picking up the estimate via Items and ignoring the volume inbound from PriceHistory. A statistics update with FULLSCAN (my usual go-to when I'm having silliness between estimation and reality) doesn't affect this significantly. Anyone have some ideas on an approach to get that value more accurate? A secondary question, really. I realize better indexing will change the plan utterly, avoiding the issue.

    Looking at the first two query plans you posted (without your index), the optimizer does pick the better plan - where better = lowest estimated cost. The estimated cost for the 'normal' plan is around 10, whereas the estimated cost for the FORCE ORDER plan is around 14 cost units

    Hm, I'd forgotten that little tidbit. Now that you mention it, it's odd the cost is ~70% of the other.

    Once the memory grant is set (before execution starts) it cannot be increased, regardless of how much free memory your server has at that point in time. If the actual size of the set of rows to be sorted exceeds the memory grant, the whole row set is physically written to tempdb (not memory!), sorted, and read back into the query. When this happens, you will see a Sort Warnings event in Profiler. The 'normal' query plan is spilling its sort to tempdb, and that's why it is slower.

    In the FORCE ORDER plan, the estimated number of rows flowing into the sort is 24987.6, and the actual number of rows encountered is 25,000. Although the estimate is very slightly too low, chances are that the memory grant assigned based on the estimate turns out to be enough to fully sort the set in memory. You would not see a Sort Warnings Profiler event for this query.

    Thank you, that makes perfect sense.

    I have been unable to reproduce the effect you see, partly because I don't naturally get the merge join plans you see with the sample data and indexes given. Perhaps you have a different default fill factor from the rest of us, perhaps it is a consequence of the different file groups in your script, ...or something else.

    One thing you might try is to place the two queries in separate batches (separated by 'GO'). There could be all sorts of subtle effects going on here, but without access to your machine or a fully reproducible test, I really can't say for sure.

    Nope, just a good old fashioned out of the box install/build, but a good idea offhand:

    -- !!!!!!!!!!!!!
    -- All code to be left 2k5 compliant!
    -- !!!!!!!!!!!!!

    USE MASTER
    GO

    IF DB_ID(N'DataVolumeOptimizations') IS NOT NULL
        DROP DATABASE DataVolumeOptimizations;
    GO

    -- The current setup here sets up everything on a single drive.
    -- As a best practice, you're best off physically separating the ldf and mdf files, and
    -- putting the ndf files on yet another physical. I'm working with what I have.
    CREATE DATABASE DataVolumeOptimizations
        ON PRIMARY (NAME = PrimaryDataStore,
            FILENAME = 'C:\ssc\DataVolumeOptimizations_Primary.mdf',
            SIZE = 1024 MB,
            MAXSIZE = 10 GB,
            FILEGROWTH = 1024 MB
            ),
        FILEGROUP NCDataStore
            (NAME = NCDataStore,
            FILENAME = 'C:\ssc\DataVolumeOptimizations_NC.ndf',
            SIZE = 1024 MB,
            MAXSIZE = 10 GB,
            FILEGROWTH = 1024 MB
            ),
        FILEGROUP LOBDataStore
            (NAME = LOBDataStore,
            FILENAME = 'C:\ssc\DataVolumeOptimizations_LOB.ndf',
            SIZE = 1024 MB,
            MAXSIZE = 10 GB,
            FILEGROWTH = 1024 MB
            )
        LOG ON (NAME = DataVolumeOptimizations_Log,
            FILENAME = 'C:\ssc\DataVolumeOptimizations_log.ldf',
            -- Reference http://sqlskills.com/BLOGS/KIMBERLY/post/Transaction-Log-VLFs-too-many-or-too-few.aspx
            -- if this doesn't make sense. Just under a gig, roughly.
            -- This is a SIMPLE recovery DB and I don't really want to fight the log.
            SIZE = 950 MB,
            MAXSIZE = 10 GB,
            FILEGROWTH = 950 MB
            )
    GO

    ALTER DATABASE [DataVolumeOptimizations] SET RECOVERY SIMPLE WITH NO_WAIT
    GO

    I was afraid something like that was going to happen for folks. These plans were built on a desktop engine for SQL 2k8 (not R2). Don't know if that helps any. I have the full DB build script including all the filegroups if you like, with additional tables of a similar bent that continued the pattern.

    You're right, it's all about the GOs though. Mighty odd. After including the GO the difference disappeared, and after removing it the difference isn't as noticeable, though I've rebooted and the like since then, so that may have had an effect as well.
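    For reference, the batch-separated harness is shaped like this (just a sketch; the query bodies are the same ones posted above):

    -- Sketch of the batch-per-query layout (query text unchanged from the earlier scripts).
    SET STATISTICS IO, TIME ON
    -- 25000 MAX FORCE ORDER query here (SELECT ... INTO #t0 ... OPTION (MAXDOP 1, FORCE ORDER))
    SET STATISTICS IO, TIME OFF
    GO
    SET STATISTICS IO, TIME ON
    -- 25000 MAX query here (SELECT ... INTO #t1 ... OPTION (MAXDOP 1))
    SET STATISTICS IO, TIME OFF
    GO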

    The last point I want to make is that FORCE ORDER is an immensely powerful hint with many counter-intuitive side-effects, so it should be used exceedingly sparingly, and only by those with a full appreciation of the issues. The FORCE ORDER hint disables several key optimizer rules which can easily prevent intermediate query designs being considered that could be further transformed to a much more optimal plan.

    I agree, and it's not my normal method of action, but understanding the moving parts here is the entire point of my exercise... though I'm not sure if that was aimed at me or just to make sure some poor random soul who's buzzing the thread gets his due warning. 🙂

    Thanks for your time Paul, appreciated as always.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Apologies to those who were trying to help me with this. There's an error in the build script that doesn't actually show up in my issue: the price history table builds out to only 625 records off that script. It's supposed to be 625,000.

    The modification is below:

    INSERT INTO PriceHistoryTable_25000_NoBuf
            (ItemID, ItemPrice, UpdateDate)
    SELECT
            ( t.N%@RecordDensity) + 1 AS ItemID, -- +1 to avoid ID: 0
            ( t.N + 10) * 2. / 3. AS ItemPrice, -- Some price, we don't really care.
            DATEADD( yyyy, 90, DATEADD( ms, ABS( CHECKSUM(NEWID()))%1000, DATEADD( s, ABS( CHECKSUM(NEWID()))%86400, ABS( CHECKSUM(NEWID())% (365 * @YearsInDate)) )) ) AS UpdateDate
    FROM
            Tally AS t
    WHERE
            t.N BETWEEN 1 AND ( 25000 * @RecordDensity)


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA
