Unique Index Causing Slow Read Performance

  • ben.mcintyre (9/8/2009)


    I badly need my views here. Due to the size of the data chunks I'm dealing with, I have to wrap them in views to stay sane. E.g. produce details are needed frequently, so there's a view wrapping the produce table and the associated brands, grades, classes, product types, varieties, pack types, pack weights, colours, etc. (about 10 tables). If I had to deal with these individually in every query, my head would explode, and I would have to upgrade to that 26" monitor.

    Unfortunately, layers of views often result in performance problems: partly from the complexity of the resulting query, partly from the chance that a lower-level view will have to be materialised in its entirety, and partly from high compile times as SQL tries to trim down the unrolled query. (See the sketch after this post.)

    (Hey, please, I know this site is pretty much dedicated to the hunting down and elimination of dynamic SQL. I know what RBAR is and who Jeff Moden is. I respect and agree with all of that. The dynamic SQL is actually quite necessary 🙂)

    No it's not. There's nothing wrong with dynamic SQL used correctly. It's cursors and while loops that we're allergic to here.

    You're using global cursors?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
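    A minimal sketch of the kind of view stack being described (editor's illustration; object names hypothetical, the real case has about 10 lookup tables):

    CREATE TABLE dbo.Produce (ProduceID INT PRIMARY KEY, BrandID INT NOT NULL, GradeID INT NOT NULL)
    CREATE TABLE dbo.Brand (BrandID INT PRIMARY KEY, BrandName VARCHAR(50) NOT NULL)
    CREATE TABLE dbo.Grade (GradeID INT PRIMARY KEY, GradeName VARCHAR(50) NOT NULL)
    GO
    CREATE VIEW dbo.vProduceBase AS
        SELECT p.ProduceID, p.BrandID, p.GradeID
        FROM dbo.Produce p
    GO
    CREATE VIEW dbo.vProduceDetail AS
        SELECT v.ProduceID, b.BrandName, g.GradeName
        FROM dbo.vProduceBase v
        JOIN dbo.Brand b ON b.BrandID = v.BrandID
        JOIN dbo.Grade g ON g.GradeID = v.GradeID
    GO
    -- Every query against vProduceDetail is expanded ("unrolled") into the full
    -- underlying join tree before optimisation; each extra layer of nesting
    -- enlarges the search space the optimizer must trim, which is where the
    -- compile-time cost and the risk of a bad plan come from.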
  • GilaMonster (9/8/2009)


    ben.mcintyre (9/8/2009)


    I badly need my views here. Due to the size of the data chunks I'm dealing with, I have to wrap them in views to stay sane. E.g. produce details are needed frequently, so there's a view wrapping the produce table and the associated brands, grades, classes, product types, varieties, pack types, pack weights, colours, etc. (about 10 tables). If I had to deal with these individually in every query, my head would explode, and I would have to upgrade to that 26" monitor.

    Unfortunately, layers of views often result in performance problems: partly from the complexity of the resulting query, partly from the chance that a lower-level view will have to be materialised in its entirety, and partly from high compile times as SQL tries to trim down the unrolled query.

    Yah. Bummer.

    GilaMonster (9/8/2009)


    (Hey, please, I know this site is pretty much dedicated to the hunting down and elimination of dynamic SQL. I know what RBAR is and who Jeff Moden is. I respect and agree with all of that. The dynamic SQL is actually quite necessary 🙂)

    No it's not. There's nothing wrong with dynamic SQL used correctly. It's cursors and while loops that we're allergic to here.

    You're using global cursors?

    (Giggle) Sorry, I meant cursors and while loops. Unfortunately, these tend to occur with Dynamic SQL for me, hence the association.

    Global cursors. Disgusting, I know. In this case I need to declare the cursor using dynamic SQL, hence the global cursor: a local one evaporates when the EXEC batch ends (see the repro sketch at the end of this post).

    I believe there is no other way (please correct me if there is!).

    Don't worry, there aren't many cursors and they are local cursors where there's no dynamic SQL.

    I would argue that the cursors are actually necessary where used as well, but that's another rather long piece of string.

    I do love this site, but I have to 'come out' as being an application developer as well as a SQL aficionado. It's hard sometimes. Sometimes I feel like an Israeli citizen from a Palestinian family. It's fine as long as I don't mention procedural code or OR/M (in the SQL forums), or query tuning and performance (in the coding forums).
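    A minimal repro of the scope behaviour described above (editor's sketch):

    -- A LOCAL cursor declared inside EXEC dies when the dynamic batch ends:
    EXEC ('DECLARE c_test CURSOR LOCAL FOR SELECT name FROM sysobjects')
    -- OPEN c_test   -- would fail here: the cursor no longer exists

    -- A GLOBAL cursor survives the batch, so the calling code can use it:
    EXEC ('DECLARE c_test CURSOR GLOBAL FOR SELECT name FROM sysobjects')
    OPEN c_test
    FETCH NEXT FROM c_test
    CLOSE c_test
    DEALLOCATE c_test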

  • True, but I have changed only one thing: a unique index to a nonunique index. Now call me old fashioned, but I believe that there should be either no difference in how these are implemented for a read-only scenario, or the unique index should be more efficient, hence this shouldn't cause a significant change in the plan.

    This is why I have the audacity to call it a bug.

    The way I see it, the plan is big, but that's not necessarily the same as complex. Big things that can be broken down into small, simple chunks are not necessarily that complex, and the indexing and constraints are the fuel that drives the problem solving at a micro level. This should scale.

    I'm kinda wanting a little more consistency and control than I'm getting here.

    1) Nothing to do with "old fashioned" - just not understanding the optimization process in sufficient detail. Not sure why you don't think it could be a VERY important piece of information to the optimizer that a value can exist only zero or one times in a column (or set of columns) - instead of possibly umpteen kajillion times (see the sketch after this post). 🙂

    2) As the plan gets big, the number of permutations (and the CPU effort and duration it takes to calculate them all) grows... hmm, is it geometrically or exponentially? Doesn't really matter which - both are bad. More important, though, is that every piece of statistical information used to do the math in the cost-based optimizer becomes less precise and meaningful, and any skews are magnified with each layer of complexity/size/join.

    I would love it if I could have more control

    To coin a phrase: "You can't HANDLE more control!!!" 😀 Seriously, you are complaining vociferously about how complex your queries and views are, and about the amount of effort and maintenance hassle it is to tune/refactor them. Do you honestly think having to LEARN and truly UNDERSTAND the rocket science that is the optimization engine, and how to tweak it to the Nth degree, will be LESS effort?? No way, José! There are actually many things one can do to 'override' the optimizer, and there are certainly times to do so. But thankfully 98.3% of the developers (and code) out there don't NEED to do so. And when one does need it, that is what a hired gun is for. Get someone who DOES know the engine backwards and forwards and whose line of work is improving SQL Server query performance.

    One other thing - when one DOES override the optimizer to make a particular query fly for a given input or set of inputs, there is quite often some other set of inputs that are HORRIBLY inefficient as a result of the 'tuning'. The best I have ever done on a single sproc for tuning (just a hair under 6 ORDERS OF MAGNITUDE performance improvement) was primarily the result of removing forced index hints.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
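    To put point 1 in concrete terms (editor's sketch, hypothetical table):

    CREATE TABLE dbo.ProduceDemo (ProduceID INT NOT NULL, Descr VARCHAR(50) NOT NULL)
    CREATE UNIQUE INDEX ux_ProduceDemo ON dbo.ProduceDemo (ProduceID)

    -- With the UNIQUE index the optimizer KNOWS this returns at most one row,
    -- no statistics needed; with a plain nonunique index it can only estimate
    -- the row count, and that estimate feeds every join above it in the plan.
    SELECT Descr FROM dbo.ProduceDemo WHERE ProduceID = 42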

  • GilaMonster (9/8/2009)


    (Hey, please, I know this site is pretty much dedicated to the hunting down and elimination of dynamic SQL. I know what RBAR is and who Jeff Moden is. I respect and agree with all of that. The dynamic SQL is actually quite necessary 🙂)

    No it's not. There's nothing wrong with dynamic SQL used correctly. It's cursors and while loops that we're allergic to here.

    You took the words right out of my mouth. Thanks, Gail.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru (9/8/2009)


    2) As the plan gets big, the number of permutations (and the CPU effort and duration it takes to calculate them all) grows... hmm, is it geometrically or exponentially?

    I suspect it's close to n! (though I haven't bothered to check the theory to confirm). It's pretty bloody fast-growing whichever way (a quick illustration follows below). There's a reason why plan generation/plan modification is a major field of research both in databases and artificial intelligence. 😀

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
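    For a feel of the numbers (editor's sketch, SQL 2005+): counting only left-deep join orders for n tables already gives n! orderings, before bushy plans are even considered.

    WITH f AS
    (
        SELECT 1 AS n, CAST(1 AS BIGINT) AS join_orders
        UNION ALL
        SELECT n + 1, join_orders * (n + 1)
        FROM f
        WHERE n < 12
    )
    SELECT n AS tables_joined, join_orders
    FROM f
    -- 10 tables -> 3,628,800 left-deep orders; 12 -> 479,001,600.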
  • ben.mcintyre (9/4/2009)


    I realise the temp tables are a viable workaround (and one I have used in places), but this is for reporting, and it means that rather than simply constructing a query for a report, I have to deal with instantiating and dropping possibly several temp tables with dynamic names (since I have multiple sessions and each name needs the session ID appended).

    It raises the complexity of things greatly and given I have about 200 reports in the database, you can understand what this means in terms of time.

    I've not read all the posts in this thread, but I want you to be aware that none of that is necessary. You may want to spend a little time reading Books Online about temp tables: how they're used, their scope, and how they're named.

    1. You don't need to drop temp tables... they're dropped automatically when the session ends, which is usually quite quickly.

    2. Temp Tables are uniquely named behind the scenes so that two or more sessions using a temp table with the same name have no conflicts.

    3. Properly used, temp tables can actually greatly simplify reporting requirements and increase performance because of their "Divide'n'Conquer" nature. To wit, they can help code beat the tar out of views, especially nested views (see the sketch at the end of this post).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
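    A sketch of points 1 and 2 (editor's illustration):

    -- Two sessions can each CREATE TABLE #Report with no conflict; behind the
    -- scenes each copy gets a padded, uniquified name in tempdb.
    CREATE TABLE #Report (id INT NOT NULL, amount MONEY NOT NULL)

    SELECT name
    FROM tempdb..sysobjects
    WHERE name LIKE '#Report%'   -- shows the suffixed internal name

    -- No DROP needed: #Report disappears when the creating session (or the
    -- stored procedure that created it) ends.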

  • ben.mcintyre (9/8/2009)


    I believe there is no other way (please correct me if there is!).

    You've just not provided enough information to say one way or another but, in 99.9% of the cursors I've come across, they haven't been necessary for anything except cross-database control loops. Even such non-RBAR control loops can be done without a loop at all, but many times it's not worth it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff

    Example of dynamic SQL forcing a global cursor (this one is from a metafunction library, but it demonstrates the idea):

    -- Kill any previous instance of the global cursor (library helper proc).
    exec sp_lib_KillCursor 'fetch_cursor'

    -- @TableName is a parameter of the enclosing procedure; @s holds the
    -- dynamic SQL (declared here for completeness).
    DECLARE @s VARCHAR(4000)

    -- Without LOCAL, the cursor is GLOBAL under the default CURSOR_DEFAULT
    -- database setting, so it survives the EXEC batch below.
    SELECT @s = 'DECLARE fetch_cursor CURSOR FOR ' +
        'SELECT c.name, c.xtype, c.status, c.IsComputed
         FROM syscolumns c, sysobjects b, systypes t
         WHERE t.xtype = c.xtype
           AND t.xusertype = c.xusertype
           AND c.id = b.id
           AND b.name = ''' + @TableName + ''''

    EXEC (@s)

    OPEN fetch_cursor

    For session-based sprocs, the global cursor name has the session ID appended so as to not cause clashes. (An alternative that avoids the global cursor entirely is sketched at the end of this post.)

    And confirming: you cannot reference temp tables in views?

    Hence if you have a query referencing 15 variously nested views and some of the bottom ones are causing the problems, you have to discard all (or most) of the views and go to a mega-query comprising the temp tables and the remaining raw tables to get your result?

    This is my problem: if forced to choose between views and performance (here, specifically temp tables), I'll take the views (not really my choice; it's a matter of cost-benefit). My original post is asking whether I can possibly have both.

    As it happens, in this case it appears I can have both by upgrading to SQL 2005.

    Further ramifications (if any) of that upgrade will no doubt become apparent.
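    For what it's worth, there does appear to be another way (editor's sketch, offered tentatively): sp_executesql can hand a cursor back through a CURSOR OUTPUT parameter, so the cursor stays in a local variable and never needs a global name.

    DECLARE @cur CURSOR, @TableName sysname
    SET @TableName = 'Produce'   -- hypothetical table name

    EXEC sp_executesql
        N'SET @c = CURSOR FOR
              SELECT c.name
              FROM syscolumns c
              JOIN sysobjects b ON b.id = c.id
              WHERE b.name = @tbl;
          OPEN @c;',
        N'@c CURSOR OUTPUT, @tbl sysname',
        @c = @cur OUTPUT, @tbl = @TableName

    FETCH NEXT FROM @cur
    -- @cur is an ordinary cursor variable: no global name, no session-ID
    -- suffix, and it is deallocated automatically when it goes out of scope.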

    Ben, six years later I seem to have the same issue you had with a UNIQUE constraint. My data consists of two tables, a view, and a temp table: quite a simple data set. I've posted this to SQL Server Central. You can get to the topic via:

    http://www.sqlservercentral.com/Forums/FindPost1723690.aspx

    I'm running this on 2008 R2.

    I found your thread once I had boiled my issue down to the presence or absence of the UNIQUE constraint, and thought you might still be interested.

    Interesting. It seems like the query plan alternatives are perhaps a bit fractal, and sometimes you just end up with a really bad one.

    I had an interesting experience as a student (in the early 90s), writing a calibration algorithm using Newton's method that diverged and oscillated wildly when starting out in some regions - I had to use a fairly good initial guess to stop that occurring.

    My supervisor wouldn't believe me, and it wasn't until a few years later that I read James Gleick's 'Chaos' that I realised it was a classic example of chaotic instability. Made me realise it can crop up under the most innocuous conditions.

    Even six years on, I still think that while the automatons do a great job most of the time, and even after all the 'it's context dependent' counter-arguments, it would be great for Transact-SQL to offer specific, atomic, predictable control over elements of query plan generation.

    Sure, used badly, it might cause great woe, but I'm pretty sure there are contexts that actually won't vary much over the database's life where it would work to great advantage.

  • A shotgun in the dark here, but one thing I found in similar circumstances (i.e. adding a Unique index trashes performance):

    Do you have any joins between columns that have different data types but the same values (e.g. a VARCHAR field containing only numbers joining to an INT field; while it works, it causes suboptimal plans to be generated)? This can throw the optimizer for a loop (pun intended). See the sketch after this post.

    I encountered this just a couple of months ago and saw a similar issue: creating what should have been an excellent index actually made everything work worse until I tracked down a data type mismatch problem. Are all your join columns identically defined? (I see the table has both INT and NUMERIC(19,0) types.) If any of the fields being joined have even slightly different data type declarations, SQL may decide the joins require type conversions and decide the "cost" of those joins is higher than a different plan.
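    A sketch of the mismatch being described (editor's illustration, hypothetical tables):

    CREATE TABLE dbo.OrdersDemo (OrderID INT PRIMARY KEY, CustCode VARCHAR(10) NOT NULL)
    CREATE TABLE dbo.CustDemo (CustID INT PRIMARY KEY, CustName VARCHAR(50) NOT NULL)

    -- CustCode happens to contain only digits, so the join "works", but the
    -- VARCHAR side must be implicitly converted on every row (CONVERT_IMPLICIT
    -- in the plan), which can stop the optimizer from seeking an otherwise
    -- perfect index on CustCode and skews its cost estimates.
    SELECT o.OrderID, c.CustName
    FROM dbo.OrdersDemo o
    JOIN dbo.CustDemo c ON o.CustCode = c.CustID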

    Another shot in the dark is that it might now be doing many, many index seeks rather than a single scan. Depending on the size of the table, that can easily cause the logical reads to double, triple, or even quadruple, not to mention the other effects it might have on the execution plan. All of that takes a good bit of extra time. (One way to check is sketched at the end of this post.)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
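    One way to see this (editor's sketch; table names hypothetical and trivially small, so real numbers will differ):

    -- Compare the Messages-tab output before and after the index change.
    CREATE TABLE dbo.OrderLineDemo (OrderLineID INT PRIMARY KEY, ProduceID INT NOT NULL)
    CREATE TABLE dbo.ProduceScanDemo (ProduceID INT NOT NULL, Descr VARCHAR(50) NOT NULL)
    CREATE UNIQUE INDEX ux_ProduceScanDemo ON dbo.ProduceScanDemo (ProduceID)

    SET STATISTICS IO ON
    SELECT ol.OrderLineID, p.Descr
    FROM dbo.OrderLineDemo ol
    JOIN dbo.ProduceScanDemo p ON p.ProduceID = ol.ProduceID
    SET STATISTICS IO OFF
    -- STATISTICS IO reports "Scan count" and "logical reads" per table; a plan
    -- that seeks the inner table once per outer row shows far more logical
    -- reads there than a plan that scans the table once.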

  • Kim: just for posterity, no, all the joins in the database have identical data types on each side.

    The two different ID types reflect only that some tables might conceivably be required to generate more than int32.MaxValue records (noting that removal and regeneration of records, for example when generating costing figures, is quite frequent).

  • ben.mcintyre (10/7/2015)


    Kim: just for posterity, no, all the joins in the database have identical data types on each side.

    The two different ID types reflect only that some tables might conceivably be required to generate more than int32.MaxValue records (noting that removal and regeneration of records, for example when generating costing figures, is quite frequent).

    Is there a reason for using "Numeric(19,0)" instead of BIGINT?

    Both can store 19-digit integer values (NUMERIC(19,0) actually allows a slightly wider range than BIGINT's maximum of 9,223,372,036,854,775,807), but BIGINT is an 8-byte true integer value, while NUMERIC(19,0) is a 9-byte scaled value, which has to carry the additional precision information. (See the sketch after this post.)

    I am just wondering if SQL is doing some scaling checks internally as part of the join and that is creating the additional time.

    I am sure we have people on the forum that have deep internal SQL Server knowledge who can let me know how far off base (pun) I am. :hehe:

    Perhaps the Unique constraint causes the optimizer to decide this index is really good, but then when it comes time to do the joins there is additional overhead?
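    A quick way to see the size difference described above (editor's sketch):

    DECLARE @b BIGINT, @n NUMERIC(19,0)
    SELECT @b = 9223372036854775807,   -- BIGINT's maximum value
           @n = 9223372036854775807

    SELECT DATALENGTH(@b) AS bigint_bytes,   -- 8
           DATALENGTH(@n) AS numeric_bytes   -- 9 (extra byte for the scaled format)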

  • I think this was because of an MS Access component. MS Access didn't recognise BIGINT at the time (2000).

    The NUMERICs are everywhere; there have been no problems with them to date that I'm aware of.

    FYI, this old thread was revived because rchantler recently posted a similar but MUCH simpler case - it is a much better test case for this problem:

    http://www.sqlservercentral.com/Forums/FindPost1723690.aspx
