Time Bomb Coding

  • Perhaps someone could confirm this for me but on the subject of right-sizing datatypes I gather that shrinking fixed length datatypes after creation makes a metadata change but not a physical storage change.

    For example if you have a DECIMAL(10,2) field it will take 9 bytes of storage and can store values in the 10s of millions.

    If you shrink this down to a DECIMAL(9,2) field then it still takes 9 bytes of storage even though it would take 5 bytes had the table been created with this specification originally. The field will only allow data that complies with the DECIMAL(9,2) specification.

    I also seem to remember that BIT fields weren't part of the ANSI SQL Standard though this might have changed by now.

  • Hey this is NOT a good practice "DBCC FREEPROCCACHE" frees ALL the cache inforamtion for the SERVER. Do this on a OLTP Database and LTUR (Leave town and Update resume)

    I wish people would read and understand these "titbits" offered in forums and view them as road to further reading and research, NOT the entire answer.

  • Excellent article. Phil's article on Simple-Talk was good too.

    If I am sure the table will always be small, sometimes I will allow these tables structures just to avoid battles with the developers. When they think I am off my rocker, I will send them to these articles.

  • @Lamprey13:

    The problem existed in SQL 2005 as well. The query plan generator treats the literal 1 as an integer, and implicitly converts the in-row data to integer. This is evident by inspecting the output of the show plan.

    This bug resulted in the widespread misbelief that SQL doesn't use indexes on bits.

    I think you are the first person I've bumped into that gets this little "bit" of MSSQL trivia.

  • nicholasw (2/10/2010)


    Instead of:

    IF EXISTS(SELECT 1 FROM ......WHERE....)

    Would the following be quicker still?

    IF EXISTS(SELECT TOP 1 1 FROM ......WHERE....)

    Why not just test it to be sure or not? 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ryan C. Price (2/10/2010)


    nicholasw (2/10/2010)


    Instead of:

    IF EXISTS(SELECT 1 FROM ......WHERE....)

    Would the following be quicker still?

    IF EXISTS(SELECT TOP 1 1 FROM ......WHERE....)

    No, and it doesn't really make sense - TOP implies 'importance' i.e. TOP 1 means you're looking for the '1' most 'important' row. When using 'EXISTS', there is no concept of importance - there is data or there isn't.

    Also, doesn't TOP get lonely without an 'ORDER BY' clause ?

    No... it's the other way around and only in sub-queries/CTE's or views.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Like Remi said, "Where's the 10 Star button?" Outstanding article, David. Definitely a classic and a keeper.

    One of my personal favorite time bombs is when people write, ummmm.... "performance challenged" code because there's some sort of guarantee that there will never be more than some small number of rows in the table, some ridiculous schedule must be met, and misinformed folks believe that it's easier to write RBAR (or some other "shortcut") instead of solid, set based, scalable code. Of course, the code works great on the small number of rows in the table for the current project and everyone gets a pat on the back.

    Then, a new project starts and a requirement comes up where a developer (or someone else) says, "Hey, we did that on the last project" and quickly rounds up a copy of the code. Of course, the code is usually devoid of any reasonable documentation never mind a warning about how it was designed to handle only small sets of rows. They add the code to the new project and test. Because it's a new project and the test data is usually very small, it works a treat. It makes it through QA, they put the "Approved" stamp on it, it get's shipped to the customer, and the customer tries to use the code against a couple of million rows of data. The customer spends the next week trying to figure what went wrong, sends it back to the folks that sold it to them, they spend a week trying to figure out what's wrong, finally find the problem, spend another week rewritting the code, a week in QA, and finally getting it back out to one very ticked off customer.

    The fix for that financially and reputationally very expensive time bomb is "Never justify bad or performanced challenged code with a small rowcount... someone WILL use it for something bigger".

    For all those that claim that schedule justifies such a thing... take a look at the scenario above. It may have helped meet schedule on the first project but does the customer remember that you met schedule on the first project or do they remember that you sold them code that blew up in their faces? What do you suppose that customer's CEO is going to say to his/her CEO buddies in the elevator or on the golf course? Bad news travels fast and, many times, it's the only news that travels. If you don't think so, look at what Toyota is going through right now.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, there is a short and illuminating article on http://www.infoq.com http://www.infoq.com/vendorcontent/show.action?vcr=847 regarding technical debt in agile projects that echoes what you say.

    Basically a project team is given a fixed immutable deadline to deliver a project and the only way to deliver on time is to compromise on quality.

    Another team comes along to do the next project and because of the technical debt accrued in itteration one they work slower. Inevitably the comment is made that team two is less productive than team one and pressure is applied. Again the only way of getting the project "on track" is to take short-cuts and bodge the quality.....etc

    When done properly an agile project is a joy to work on. It is so easy to talk the talk with with agile but not walk the walk!

  • Wonderful article covering so many BOMBs ....

  • Jeff Moden (2/10/2010)


    It makes it through QA, they put the "Approved" stamp on it, it get's shipped to the customer, and the customer tries to use the code against a couple of million rows of data. The customer spends the next week trying to figure what went wrong, sends it back to the folks that sold it to them, they spend a week trying to figure out what's wrong, finally find the problem, spend another week rewritting the code, a week in QA, and finally getting it back out to one very ticked off customer.

    all too often that last bit where they re-write the code and send it back doesn't happen. You get told to wait for the next upgrade when it will be magically fixed or (I kid you not) - delete some of your data!

    Some BIG companies do this, I put it down to not having SQL coders in their employ.

    ---------------------------------------------------------------------

  • Malcolm Daughtree (2/10/2010)


    Hey this is NOT a good practice "DBCC FREEPROCCACHE" frees ALL the cache inforamtion for the SERVER. Do this on a OLTP Database and LTUR (Leave town and Update resume)

    I wish people would read and understand these "titbits" offered in forums and view them as road to further reading and research, NOT the entire answer.

    That is true. I certainly hope that anyone reading these descriptions of testing statements for performance understands that these should be done in development and staging environments and not in production environments. DBCC FREEPROCCACHE will empty all execution plans and DBCC DROPCLEANBUFFERS will empty the buffer of all table results already loaded into memory. Analyzing queries for performance metrics in any production server is not a good idea, OLTP especially. Normally I would assume that this is understood.

  • Great article! I wish I'd read this a long time ago!

    This is definitely being added to my briefcase!

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • David.Poole (2/10/2010)


    Perhaps someone could confirm this for me but on the subject of right-sizing datatypes I gather that shrinking fixed length datatypes after creation makes a metadata change but not a physical storage change.

    For example if you have a DECIMAL(10,2) field it will take 9 bytes of storage and can store values in the 10s of millions.

    If you shrink this down to a DECIMAL(9,2) field then it still takes 9 bytes of storage even though it would take 5 bytes had the table been created with this specification originally. The field will only allow data that complies with the DECIMAL(9,2) specification.

    I also seem to remember that BIT fields weren't part of the ANSI SQL Standard though this might have changed by now.

    Test it and see what happens. I haven't tried doing this with SQL 2005 or SQL 2008, but with SQL 2000 I would expect the following behavior:

    1) If any data exists that requires DECIMAL(10,2) you will get a truncation error and the operation will fail.

    2) If all data passes the first test, then the additional bytes of storage will be deallocated and the operation will take quite some time if there are many rows.

    3) De-allocated space will not shrink the mdf/ndf files, although it should increase the amount of free space in those files. Even more free space may show up after the next index rebuild.

    4) Shrinking and re-orgarnizing the the data files may reveal more free space. This is a very costly and time-consuming operation. On large tables it will take hours and make the database unavailable. It is usually better to leave it alone and allow SQL Server to write new data into the recently de-allocated space.

    Please anyone with direct experience correct me if any of these assumptions is incorrect. I'd also like to add that column schema changes should always be tested for performance in a staging environment first, should always be done during maintenance windows, and should always be performed using T-SQL statements. In SQL 2000, if you change a column definition using Enterprise Manager, SQL Server will build a new table, load all of the existing table's data into it, drop the original table, and rename the new table. I don't think I have to describe how costly an operation that is. The T-SQL statement will merely change the schema and allocate/de-allocate space to match the new definition. I have found that space de-allocations are more time-consuming. In both methods, the operation will fail in the case of data truncations and incompatible data types.

  • Jeff Gray (2/10/2010)


    I think you are the first person I've bumped into that gets this little "bit" of MSSQL trivia.

    LOL! Thanks Jeff, that little quip made my day. 🙂

  • george sibbald (2/11/2010)


    Jeff Moden (2/10/2010)


    It makes it through QA, they put the "Approved" stamp on it, it get's shipped to the customer, and the customer tries to use the code against a couple of million rows of data. The customer spends the next week trying to figure what went wrong, sends it back to the folks that sold it to them, they spend a week trying to figure out what's wrong, finally find the problem, spend another week rewritting the code, a week in QA, and finally getting it back out to one very ticked off customer.

    all too often that last bit where they re-write the code and send it back doesn't happen. You get told to wait for the next upgrade when it will be magically fixed or (I kid you not) - delete some of your data!

    Some BIG companies do this, I put it down to not having SQL coders in their employ.

    How true, how true!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 46 through 60 (of 73 total)

You must be logged in to reply to this topic. Login to reply