Never, Ever Use Clustered Indexes

  • patrickmcginnis59 (5/31/2012)


    Grant Fritchey (5/31/2012)


    patrickmcginnis59 (5/31/2012)


    Grant Fritchey (5/31/2012)

    I'm with you. Fast enough is fast enough.

    But, I remember when a friend of mine (wish I could say I did this) spotted a problematic query in one of the overnight load processes. He did some tweaks and something that took 6 hours was cut down to 3 minutes. It's the kind of win you dream about finding. And the solution was to take advantage of a particular set of functionality offered by SQL Server while the original load process had been written by an Oracle DBA (a very capable and good Oracle DBA, don't misunderstand my message) who used an approach common to both platforms.

    Tweaking to get a 5% improvement in speed.. Yeah, that's probably crazy. But that doesn't mean that any or all tweaks should be ignored.

    So what takes 6 hours in Oracle that takes 3 minutes in SQL server?

    Lordy no. Oh dear. I sure hope that was a joke.

    Its not a joke. You're posting that he cut a query from 6 hours to 3 minutes based on a Microsoft feature. Now I can accept if theres an equivalent Oracle feature that would have also cut it down to 3 minutes, and can readily accept that there are material differences in implementations, or that possibly an Oracle implementer simply lacked training to implement on SQL server. What I'm curious on is the difference between 6 hours and 3 minutes and what stopped the Oracle guy from finding out himself how to cut the time?

    For example, did the Oracle guy believe the process would also take 6 hours on an Oracle server?

    I'm sorry, again. I tried hard to be careful about the language.

    I'm not implying there was anything wrong with Oracle or the Oracle DBA. In fact, the Oracle DBA was a very capable person. Oracle is a very capable system.

    The problem was on SQL Server. SQL Server was running extremely slowly. SQL Server was processing in 6 hours. The process was written by someone who was an Oracle expert, but not a SQL Server expert. They made choices that worked in SQL Server, but they didn't work well in SQL Server. The fix was to change the query to use a function (and I don't recall exactly what it was, this was two years ago and I wasn't the hero of the story, so I just don't remember, sorry again) that worked better within SQL Server because it used functionality unique to SQL Server. It was not meant to imply a shortcoming in the Oracle person or Oracle.

    None of this was ever meant to question Oracle, Oracle DBAs, or the capabilities of either the Oracle software or the people who manage Oracle. But, just as Oracle has functions that make Oracle run faster, and those functions don't exist in SQL Server, there are similar things in SQL Server. That's all I was trying to point out. Instead, I've somehow stepped into "Oracle Sucks" and that was so very, very much not my intention. I'm sorry. I apologize.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Paul Hunter (5/31/2012)


    I hear the argument "we don't want to be locked in" from the app dev folks - the same ones that aren't afraid to be locked in to their programming language and/or OS. It's a "logic free" argument.

    +1


    Peter MaloofServing Data

  • Dean Cochrane (5/31/2012)


    ...and many people who should know better treat the relational database as nothing more than a file system with built in backup capability.

    This is, to my experience, more often the case than not.

    I've seen some amazing things in the wilds of vendor and even internal applications.

  • Grant Fritchey (5/31/2012)


    Alex-668179 (5/31/2012)


    "every table within SQL Server should have a clustered index"

    Fine, but should it then also always be the (preferably meaningless?) primary key? Or an index that is more likely to be used to retrieve ranges of records?

    Ah, but there's the rub. Honest people disagree. But there's little disagreement that there are radical differences in the methods of storage between a clustered index and a heap. In the overriding majority of the time, use a clustered index.

    Also, I wouldn't focus on that "range of records" argument either. I'm not sure where that came from originally, but it's long been considered bunk. I usually focus on making my clustered index the most frequent access path to the data in a given table. That's frequently the primary key, but not always.

    It's simple...the clustered index should be designed specifically to achieve - as close as possible - an append pattern for all inserts. (think 'identity' column - or 'create_datetime') it doesn't need to relate to any other criteria PK or otherwise - but you'll usually have something. This way you are not trying to squeeze rows into the 'middle' of the table causing page splits etc. (all sorts of mischief to maintain the 'logical' order of rows)

    Where you may divert from this 'child' tables - where the 'child' rows are grouped by the 'parent' (it's an append pattern for the children - but you need to squeeze in the parent/child sets. (think Invoice/InvoiceRow - basically an append, but will need to squeeze in an InvoiceRow if you add one later) - this benefits retrieval as all the invoiceRows - for a given invoice - will be together (and in fewer pages to read)

  • While I can somewhat sympathize with the portability concern, I feel that it's a Cost/Benefit type of situation, and in many if not most cases, it will probably cost more performance wise to implement a solution from a generic perspective than it would cost to take full advanatage of the tools available on the implementation platform and then peform an unexpected migration to another platform years later. Besides, the cost of an actual migration is usually not cheap anyway so what's a few more hours/dollars when it comes to doing it right.

    The one thing that I despise about programming web based UI's is the number of different platform nuances that have to be accounted for.

    I'm certainly not going to use a screwdriver to drive a nail or a hammer to sink a screw, so why should I shortcut my code simply because it "might" be requried to run on another platform some day?

  • I've maintained for years that databases are platform-portable in the same way that continental plates are "portable". They do move, and it's usually listed as a disaster by everyone involved (I went through the Northridge earthquake in Los Angeles; wasn't fun).

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (6/4/2012)


    I've maintained for years that databases are platform-portable in the same way that continental plates are "portable". They do move, and it's usually listed as a disaster by everyone involved (I went through the Northridge earthquake in Los Angeles; wasn't fun).

    I've concluded that most folks at a SQL Server specific forum are going to have a natural bias against portability in general.

  • rob.lobbe-964963 (5/31/2012)


    Grant Fritchey (5/31/2012)


    Alex-668179 (5/31/2012)


    "every table within SQL Server should have a clustered index"

    Fine, but should it then also always be the (preferably meaningless?) primary key? Or an index that is more likely to be used to retrieve ranges of records?

    Ah, but there's the rub. Honest people disagree. But there's little disagreement that there are radical differences in the methods of storage between a clustered index and a heap. In the overriding majority of the time, use a clustered index.

    Also, I wouldn't focus on that "range of records" argument either. I'm not sure where that came from originally, but it's long been considered bunk. I usually focus on making my clustered index the most frequent access path to the data in a given table. That's frequently the primary key, but not always.

    It's simple...the clustered index should be designed specifically to achieve - as close as possible - an append pattern for all inserts. (think 'identity' column - or 'create_datetime') it doesn't need to relate to any other criteria PK or otherwise - but you'll usually have something. This way you are not trying to squeeze rows into the 'middle' of the table causing page splits etc. (all sorts of mischief to maintain the 'logical' order of rows)

    Where you may divert from this 'child' tables - where the 'child' rows are grouped by the 'parent' (it's an append pattern for the children - but you need to squeeze in the parent/child sets. (think Invoice/InvoiceRow - basically an append, but will need to squeeze in an InvoiceRow if you add one later) - this benefits retrieval as all the invoiceRows - for a given invoice - will be together (and in fewer pages to read)

    That's true if your most expensive operations are inserts. Either most frequent or high expense per insert.

    If selects have you highest cost, then you want your clustered index to support them more effectively.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • patrickmcginnis59 (6/4/2012)


    GSquared (6/4/2012)


    I've maintained for years that databases are platform-portable in the same way that continental plates are "portable". They do move, and it's usually listed as a disaster by everyone involved (I went through the Northridge earthquake in Los Angeles; wasn't fun).

    I've concluded that most folks at a SQL Server specific forum are going to have a natural bias against portability in general.

    Thank you for that. It's highly insulting.

    You don't know me. You know nothing about me. But you feel free to make that kind of statement about me?

    Feel free to rationalize your own biases by accusing others of ones they don't actually have. If that's what it takes to shore up your ego, that's what it takes.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • patrickmcginnis59 (6/4/2012)


    GSquared (6/4/2012)


    I've maintained for years that databases are platform-portable in the same way that continental plates are "portable". They do move, and it's usually listed as a disaster by everyone involved (I went through the Northridge earthquake in Los Angeles; wasn't fun).

    I've concluded that most folks at a SQL Server specific forum are going to have a natural bias against portability in general.

    Since I've worked with Oracle as well, I have to say that it's just a natural bias against not using all of the tools available in an RDBMS for the sake of something that rarely happens. For simple "CRUD" code... sure... be portable. For serious batch code, don't even think about portability because no matter what you do, it's not going to be 100% portable anyway.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/19/2012)


    patrickmcginnis59 (6/4/2012)


    GSquared (6/4/2012)


    I've maintained for years that databases are platform-portable in the same way that continental plates are "portable". They do move, and it's usually listed as a disaster by everyone involved (I went through the Northridge earthquake in Los Angeles; wasn't fun).

    I've concluded that most folks at a SQL Server specific forum are going to have a natural bias against portability in general.

    Since I've worked with Oracle as well, I have to say that it's just a natural bias against not using all of the tools available in an RDBMS for the sake of something that rarely happens. For simple "CRUD" code... sure... be portable. For serious batch code, don't even think about portability because no matter what you do, it's not going to be 100% portable anyway.

    I don't really have strong feelings either way, but I try to stay informed on whether SQL statements I use are specific to MSSQL and I also hope to be doing some Postgresql stuff in the future so I try to do some comparisons here and there to see what I'm in for. I also see that some vendors do want some independence from specific databases and I can understand why.

  • Just received an interesting comment on this over on my blog that included this:

    “By now, you might be able to see where I’m going in this section. I have made references above to other

    databases and how features are implemented differently in each. With the exception of some read-only

    applications, it is my contention that building a wholly database-independent application that is highly

    scalable is extremely hard — it is, in fact, quite impossible unless you know exactly how each database

    works in great detail. And, if you knew how each database worked in great detail, you’d understand that

    database independence is not something you really want to achieve (a very circular argument!).”

    Tom Kyte -Expert Oracle Database Architecture Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions

    So, evidently, it's not just SQL Server people who have a "natural bias" against portability.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (6/21/2012)


    Just received an interesting comment on this over on my blog that included this:

    “By now, you might be able to see where I’m going in this section. I have made references above to other

    databases and how features are implemented differently in each. With the exception of some read-only

    applications, it is my contention that building a wholly database-independent application that is highly

    scalable is extremely hard — it is, in fact, quite impossible unless you know exactly how each database

    works in great detail. And, if you knew how each database worked in great detail, you’d understand that

    database independence is not something you really want to achieve (a very circular argument!).”

    Tom Kyte -Expert Oracle Database Architecture Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions

    So, evidently, it's not just SQL Server people who have a "natural bias" against portability.

    Can't disagree with that. Folks who have high expertise in particular database systems are bound to have that bias. Application designers on the other hand probably have a natural bias toward their application platforms and for whatever reason, a subset of them want to run their apps against different databases. For example, some parts like form interfaces are fairly generic across databases, whereas other parts of the application are probably more specific toward which database they hook up to. The more common case I see is that the vendors like portability, but their customers rarely move between databases. Where I work we're partial to one database and tend to shop for products that work on it.

    Also I see you put "natural bias" in quotes. Maybe the word "bias" itself has a bad connotation? How much do you like working with Oracle for instance, and if you don't, what would you like your preferences called? At least one other poster sounded like he felt I was trash talking directly at him. And I do realise that people here may not react positivelly about contrary opinions, and I'm willing to accept that I don't have to voice contrary viewpoints here, it wouldn't be a huge loss to me, as these discussions haven't ended up being that productive here anyways.

  • patrickmcginnis59 (6/21/2012)


    Grant Fritchey (6/21/2012)


    Just received an interesting comment on this over on my blog that included this:

    “By now, you might be able to see where I’m going in this section. I have made references above to other

    databases and how features are implemented differently in each. With the exception of some read-only

    applications, it is my contention that building a wholly database-independent application that is highly

    scalable is extremely hard — it is, in fact, quite impossible unless you know exactly how each database

    works in great detail. And, if you knew how each database worked in great detail, you’d understand that

    database independence is not something you really want to achieve (a very circular argument!).”

    Tom Kyte -Expert Oracle Database Architecture Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions

    So, evidently, it's not just SQL Server people who have a "natural bias" against portability.

    Can't disagree with that. Folks who have high expertise in particular database systems are bound to have that bias. Application designers on the other hand probably have a natural bias toward their application platforms and for whatever reason, a subset of them want to run their apps against different databases. For example, some parts like form interfaces are fairly generic across databases, whereas other parts of the application are probably more specific toward which database they hook up to. The more common case I see is that the vendors like portability, but their customers rarely move between databases. Where I work we're partial to one database and tend to shop for products that work on it.

    Also I see you put "natural bias" in quotes. Maybe the word "bias" itself has a bad connotation? How much do you like working with Oracle for instance, and if you don't, what would you like your preferences called? At least one other poster sounded like he felt I was trash talking directly at him. And I do realise that people here may not react positivelly about contrary opinions, and I'm willing to accept that I don't have to voice contrary viewpoints here, it wouldn't be a huge loss to me, as these discussions haven't ended up being that productive here anyways.

    I have nothing against Oracle at all, as I've said, over & over again in this thread as well as elsewhere. Oracle is great. Just so happens, I don't work in it & it doesn't pay my bills. But, I'd be perfectly happy to work with it, DB2, MySQL, Hadoop, MongoDB or whatever, if that's where I needed to go to keep a roof over the family's heads. My issue isn't platform, it's design. And most people around here don't react badly to contrary opinions. Heck, that's what makes things fun. It's the generally ad hominem & dismissive manner in which they're delivered that's causing hackles to go up. Disagree without being disagreeable.

    I'm back to my point though. There is a very real, and evidently apparent in more than just the SQL sphere, understanding that you can design for performance on a platform, or you can design for portability between platforms, but you can't really design for both (without forking your code, readily acknowledged there, but that reinforces the point, not negates it).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (6/21/2012)


    There is a very real, and evidently apparent in more than just the SQL sphere, understanding that you can design for performance on a platform, or you can design for portability between platforms, but you can't really design for both (without forking your code, readily acknowledged there, but that reinforces the point, not negates it).

    And if you can provide an example of a large DB designed for portability that performs well on multiple platforms I would love to see it. Because if there's a way to design for portability and have it perform well there's no reason not to do that.

Viewing 15 posts - 46 through 60 (of 64 total)

You must be logged in to reply to this topic. Login to reply