The Most Common Query Blunders...

  • The most common/easiest to fix mistakes:

    Someone beat me to it but NOLOCK table hints are the #1 mistake I have seen. I would go so far as to say it's become an epidemic in the industry. Real DBAs know better but a majority of "DBA"s think that throwing NOLOCK hints everywhere is a best practice.

    Not enough I/O: tempdb, mdfs, ndfs, ldfs, OS, SQL Install spread across too few disks.

    databases set to full recovery in places where transaction log backups are not happening. Common in many dev environments.

    Nullable columns that always contain data (or where a blank or -1 would do).

    No index on Foreign keys.

    RBAR and hidden RBAR (not always as simple to fix/replace as what I've already mentioned but it's falls in the category of "most common query blunders". )

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (12/28/2015)


    The most common/easiest to fix mistakes:

    Someone beat me to it but NOLOCK table hints are the #1 mistake I have seen. I would go so far as to say it's become an epidemic in the industry. Real DBAs know better but a majority of "DBA"s think that throwing NOLOCK hints everywhere is a best practice.

    Not enough I/O: tempdb, mdfs, ndfs, ldfs, OS, SQL Install spread across too few disks.

    databases set to full recovery in places where transaction log backups are not happening. Common in many dev environments.

    Nullable columns that always contain data (or where a blank or -1 would do).

    No index on Foreign keys.

    RBAR and hidden RBAR (not always as simple to fix/replace as what I've already mentioned but it's falls in the category of "most common query blunders". )

    Not to mention nullable columns with default value constraint, does my head in and so does a non-unique index on unique column / column combination, why on earth? Another problem that often comes with the missing foreign key problem are disabled foreign keys, how about just rolling out the red carpet for the congestion, blocking and deadlocking instead.

    😎

  • Eirikur Eiriksson (12/28/2015)


    Alan.B (12/28/2015)


    The most common/easiest to fix mistakes:

    Someone beat me to it but NOLOCK table hints are the #1 mistake I have seen. I would go so far as to say it's become an epidemic in the industry. Real DBAs know better but a majority of "DBA"s think that throwing NOLOCK hints everywhere is a best practice.

    Not enough I/O: tempdb, mdfs, ndfs, ldfs, OS, SQL Install spread across too few disks.

    databases set to full recovery in places where transaction log backups are not happening. Common in many dev environments.

    Nullable columns that always contain data (or where a blank or -1 would do).

    No index on Foreign keys.

    RBAR and hidden RBAR (not always as simple to fix/replace as what I've already mentioned but it's falls in the category of "most common query blunders". )

    Not to mention nullable columns with default value constraint, does my head in and so does a non-unique index on unique column / column combination, why on earth? Another problem that often comes with the missing foreign key problem are disabled foreign keys, how about just rolling out the red carpet for the congestion, blocking and deadlocking instead.

    😎

    Yep, yep. Or that non-clustered index without any include columns, especially when you would only need one or two to avoid a key lookup.

    I'm cleaning up some questionable code at the moment and another biggie is unnecessary ORDER BY clauses (regardless of what indexes are available). Or GROUP BY SomeFunction(blah blah)... or GROUP BY case statements... I'm looking at some of that right now.:hehe:

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (12/28/2015)


    Someone beat me to it but NOLOCK table hints are the #1 mistake I have seen.

    Heh... apparently, thousands of people agree with you. Instead, they use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED to be sure they don't miss anything. ;-):-P:-D At least it's easy to get rid of when they do that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/28/2015)


    Alan.B (12/28/2015)


    Someone beat me to it but NOLOCK table hints are the #1 mistake I have seen.

    Heh... apparently, thousands of people agree with you. Instead, they use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED to be sure they don't miss anything. ;-):-P:-D At least it's easy to get rid of when they do that.

    That would be my favorite. I have seen experienced developers fall to this one.

    Just to mention things others have probably have already:

    * functions in a where clause

    * functions in a join condition

    * conversions in a where clause

    * not using temp tables when the base table is a heap (especially a large one)

    * left joining on a table only to filter in the where clause based on a column in the right table (makes it a inner join)

    * delete from a large table (do in chunks or consider truncate)

    * triggers everywhere for every individual column

    * top n rows without an order by (results not guaranteed to be consistent)

    * overuse of dynamic sql

    * code to insert into a logging table blocks reached, to help capture where an error occurs in a SP (horrendous on performance and there are better ways)

    I am sure there is more out there. As Jeff mentioned I like the idea of comments. I think it is better friend than being obsessive compulsive with indents (which do help, but pale in comparison to what comments can tell you).

    ----------------------------------------------------

  • Jeff Moden (12/25/2015)


    I believe my favorite non-SARGable problems are due to the inappropriate use of "OR" and things like the following...

    WHERE ISNULL(SomeNumericColumn,0) > 0

    ...

    WHERE ISNULL(SomeCharacterColumn,' ') > ' '

    ... which, of course, is due to a simple lack of knowledge concerning NULLs and the fact that front-end code typically doesn't follow the same rules for NULLs as the soon-to-be-not-optional default rules that SQL Server follows.

    Ive done that second one above in yesterdays. ISNULL(someCharacterColumn,'')>''. Heh, the results were correct. I guess you mean that

    WHERE someCharacterColumn >''

    is more sargable.

    ----------------------------------------------------

  • MMartin1 (12/30/2015)


    Jeff Moden (12/25/2015)


    I believe my favorite non-SARGable problems are due to the inappropriate use of "OR" and things like the following...

    WHERE ISNULL(SomeNumericColumn,0) > 0

    ...

    WHERE ISNULL(SomeCharacterColumn,' ') > ' '

    ... which, of course, is due to a simple lack of knowledge concerning NULLs and the fact that front-end code typically doesn't follow the same rules for NULLs as the soon-to-be-not-optional default rules that SQL Server follows.

    Ive done that second one above in yesterdays. ISNULL(someCharacterColumn,'')>''. Heh, the results were correct. I guess you mean that

    WHERE someCharacterColumn >''

    is more sargable.

    Yes... that and the fact that NULLs aren't comparable to anything unless you've changed a setting that shouldn't have been changed (and has also been deprecated, BTW). Both find everything that is NOT NULL and NOT BLANK but, you're correct, the SomeCharacterColumn > '' is SARGable and the one with the ISNULL function is not (there is a "halo" exception... if the column has been defined as NOT NULL, the ISNULL is ignored). The other two WHERE clauses are not.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/30/2015)


    (there is a "halo" exception... if the column has been defined as NOT NULL, the ISNULL is ignored). The other two WHERE clauses are not.

    Interesting... I didn't know that exception existed... Yet another difference between ISNULL and COALESCE.

  • Somewhat related to SQL Server, when writing remote queries or ETL packages against Oracle, try to remember that an empty string in PL/SQL is equivalent to NULL. At least understand that much before diving into the pool or you'll end up banging your head.

    For example in Oracle:

    ('' IS NULL) True

    ('' = '') False

    ('A' > '') False

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Heh... without going into too many details, I've found that one of the most common query blunders is the nearly religious belief in rote, so called "best practices" and some pretty ridiculous recommendations that are frequently recommended as if they were. Some examples...

    "Never use Cursors. Use Temp Tables and While Loops, instead."

    "Never use While Loops. Use recursive CTE's, instead." (Writing another article on that little fallacy as we speak).

    "Never use Cursors or While Loops."

    "To be SET BASED, always write your code as a single query."

    "Never use Temp Tables. Use CTEs, Derived Tables, and Views, instead."

    "Never use xp_CmdShell. It's a security risk." (one of my favorites)

    "Always write portable/ANSI/ISO code." (Close cousin of "Never use proprietary features because they destroy portability." (my second favorite)

    "Avoid stored procedures. They're not portable."

    "Never put business logic in a table or anywhere else in SQL. It doesn't belong there."

    "Never us Foreign Keys. They slow down development and it's a form of business logic that doesn't belong in a database."

    "Always put the PK/Clustered Index on an IDENTITY column."

    "Never put the PK/Clustered Index on an IDENTITY column."

    "Never use IDENTITY columns."

    "Never use GUIDs."

    "It's ok to use sequential GUIDs for a Clustered Index."

    "It's ok to write code using <insert some slow method here> because it will only be used for a small number of rows."

    "It must be true because it's written in this book by ..."

    "Save time. Use better hardware."

    "Upgrading your hardware will make your code run faster."

    "Spindles don't matter on a SAN".

    "Disk level fragmentation doesn't matter on a SAN".

    "Put your database on SSDs. All of your queries will run much faster."

    "Never do any type of formatting in T-SQL. "

    "Never do ETL in T-SQL. It wasn't built for that. Use SSIS (or some other tool) instead."

    "Use PowerShell to manage and execute all of your backups on all of your servers."

    "If you can't figure something out, it can't be done using SET BASED code."

    "Never do direct date/time math."

    "Never use partitioned views on the Enterprise Edition."

    "All triggers are evil. Avoid them at all costs."

    "You should always design the clustered index based on how most queries will use it."

    "Partitioning improves query performance."

    "You shouldn't use UPDATE for joined updates. Beginners might do it wrong."

    "You shouldn't use "0" as a datetime. Beginners might not know what it means."

    The list goes on but here's my all time favorite...

    "Just because you can do something in T-SQL, doesn't mean you should."

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/1/2016)


    Heh... without going into too many details, I've found that one of the most common query blunders is the nearly religious belief in rote, so called "best practices" and some pretty ridiculous recommendations that are frequently recommended as if they were. Some examples...

    "Never use Cursors. Use Temp Tables and While Loops, instead."

    "Never use While Loops. Use recursive CTE's, instead." (Writing another article on that little fallacy as we speak).

    "Never use Cursors or While Loops."

    "To be SET BASED, always write your code as a single query."

    "Never use Temp Tables. Use CTEs, Derived Tables, and Views, instead."

    "Never use xp_CmdShell. It's a security risk." (one of my favorites)

    "Always write portable/ANSI/ISO code." (Close cousin of "Never use proprietary features because they destroy portability." (my second favorite)

    "Avoid stored procedures. They're not portable."

    "Never put business logic in a table or anywhere else in SQL. It doesn't belong there."

    "Never us Foreign Keys. They slow down development and it's a form of business logic that doesn't belong in a database."

    "Always put the PK/Clustered Index on an IDENTITY column."

    "Never put the PK/Clustered Index on an IDENTITY column."

    "Never use IDENTITY columns."

    "Never use GUIDs."

    "It's ok to use sequential GUIDs for a Clustered Index."

    "It's ok to write code using <insert some slow method here> because it will only be used for a small number of rows."

    "It must be true because it's written in this book by ..."

    "Save time. Use better hardware."

    "Upgrading your hardware will make your code run faster."

    "Spindles don't matter on a SAN".

    "Disk level fragmentation doesn't matter on a SAN".

    "Put your database on SSDs. All of your queries will run much faster."

    "Never do any type of formatting in T-SQL. "

    "Never do ETL in T-SQL. It wasn't built for that. Use SSIS (or some other tool) instead."

    "Use PowerShell to manage and execute all of your backups on all of your servers."

    "If you can't figure something out, it can't be done using SET BASED code."

    "Never do direct date/time math."

    "Never use partitioned views on the Enterprise Edition."

    "All triggers are evil. Avoid them at all costs."

    "You should always design the clustered index based on how most queries will use it."

    "Partitioning improves query performance."

    "You shouldn't use UPDATE for joined updates. Beginners might do it wrong."

    "You shouldn't use "0" as a datetime. Beginners might not know what it means."

    The list goes on but here's my all time favorite...

    "Just because you can do something in T-SQL, doesn't mean you should."

    So many of these contain "always", "never" and "can't". My ears perk up any time I see those and there's usually an exception. "It depends" comes up so much because it's so very true.

  • Jeff Moden (1/1/2016)


    Heh... without going into too many details, I've found that one of the most common query blunders is the nearly religious belief in rote, so called "best practices" and some pretty ridiculous recommendations that are frequently recommended as if they were. Some examples...

    A personal favorite "It says so in BOL"

    😎

  • Eirikur Eiriksson (1/2/2016)


    Jeff Moden (1/1/2016)


    Heh... without going into too many details, I've found that one of the most common query blunders is the nearly religious belief in rote, so called "best practices" and some pretty ridiculous recommendations that are frequently recommended as if they were. Some examples...

    A personal favorite "It says so in BOL"

    😎

    Especially when it doesn't really say so. 😉

  • Hi Jason,

    One of the things I see regularly (and used to do myself until I realised how expensive it was) is the use of non-deterministic functions in a query. The most obvious of these is GetDate(). It might be quick to run, but if it only takes 1 ms but you are retrieving 50K rows, you have added 50 seconds to your query. Most of the time people only want the date part anyway, so unless you are running something regularly at 23:58 why query it for every row returned. Store it once in a variable, at the top of the procedure and use that instead. As there are normally a few functions wrapping the GetDate() call, you usually increase your time saving.

    The other big problem I see is people selecting more columns than they need, usually with a SELECT *, and then throwing away most of the data later. Small data sets are quicker. This is usually caused by the developer not really understanding the data or the question, and then not going back and tidying up when they have a satisfactory answer.

  • Alex Gay (1/4/2016)


    The most obvious of these is GetDate(). It might be quick to run, but if it only takes 1 ms but you are retrieving 50K rows, you have added 50 seconds to your query. Most of the time people only want the date part anyway, so unless you are running something regularly at 23:58 why query it for every row returned. Store it once in a variable, at the top of the procedure and use that instead.

    Most system functions, even getdate() are evaluated once and the value used in all rows. The exception to that is NEWID(), which is evaluated on all rows.

    You can see this by using GETDATE() or its datetime2 version SYSDATETIME() in a large resultset. The times on each row will be identical. If it was evaluated on every row, rows processed later would have later timestamps to rows processed earlier

    To test: (I used AdventureWorks because it's handy. SalesOrderDetail has 1.9 million rows in my copy of AW)

    DECLARE @t TABLE (SomeID INT, SomeDate DATETIME2)

    INSERT INTO @t (SomeID, SomeDate)

    SELECT sod.SalesOrderID, SYSDATETIME()

    FROM Sales.SalesOrderDetail AS sod;

    GO 10

    DECLARE @t TABLE (SomeID INT, SomeDate DATETIME2)

    DECLARE @SomeDate DATETIME2 = SYSDATETIME();

    INSERT INTO @t (SomeID, SomeDate)

    SELECT sod.SalesOrderID, @SomeDate

    FROM Sales.SalesOrderDetail AS sod;

    GO 10

    The inserts into the table variable are to remove any effect from transmitting and displaying the data (which SSMS is really slow at)

    CPU and duration as caught by an Extended Events session

    When averaged...

    Function in the query

    CPU: 519.5ms

    Duration 523.6 ms

    Function assigned to variable, variable used in query

    CPU: 525.6

    Duration: 523.9

    User-defined scalar functions are another matter. They really are evaluated on every row and they can have a huge effect. On 600 000 rows I saw a difference between 300ms and 3 seconds.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 46 through 60 (of 73 total)

You must be logged in to reply to this topic. Login to reply