Cluster design comment wanted.

  • Hello forum members,

    A Happy New Year to all of you.

    I have lost a bit of sleep over the following:

    First: we do not actually have a problem; the problem is in my head only. But I would like to 'discuss' this and receive your comments.

    A design pattern we use is:

    Tables

    Client clustered on Clientid (identity).

    ClientOrder clustered on Clientid, Orderid (identity).

    ClientLine clustered on Clientid, LineId.

    (With constraints from the order to the client and the line to the order.)
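
    In T-SQL the pattern looks roughly like this (a minimal sketch; the column types, constraint names and foreign-key layout are illustrative, not our actual definitions):

    CREATE TABLE dbo.Client (
        Clientid int IDENTITY(1,1) NOT NULL,
        CONSTRAINT PK_Client PRIMARY KEY CLUSTERED (Clientid)
    );

    CREATE TABLE dbo.ClientOrder (
        Clientid int NOT NULL,
        Orderid  int IDENTITY(1,1) NOT NULL,
        CONSTRAINT PK_ClientOrder PRIMARY KEY CLUSTERED (Clientid, Orderid),
        CONSTRAINT FK_ClientOrder_Client FOREIGN KEY (Clientid)
            REFERENCES dbo.Client (Clientid)
    );

    -- ClientLine carries Clientid so the clustered key can lead with it;
    -- the foreign key to ClientOrder is on the composite (Clientid, Orderid).
    CREATE TABLE dbo.ClientLine (
        Clientid int NOT NULL,
        Orderid  int NOT NULL,
        LineId   int IDENTITY(1,1) NOT NULL,
        CONSTRAINT PK_ClientLine PRIMARY KEY CLUSTERED (Clientid, LineId),
        CONSTRAINT FK_ClientLine_ClientOrder FOREIGN KEY (Clientid, Orderid)
            REFERENCES dbo.ClientOrder (Clientid, Orderid)
    );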

    This works very well in our environment: most actions are done per client, reading data for a client is fast, and writing data for a client is at least fast enough. We have been using this pattern for a long time.

    So what is my problem? When inserts are done for a client, the orders and lines are created with ever-increasing identity values. So when a page fills up for a client, the page splits, with the 'low' identities on the left page and the 'high' identities on the right page. The left page stays as it is, because no data is added to that page (if this is the only client on that page). The right page gets extra data until it splits again. So if there are clients with multiple pages of data, all the pages for such a client except the first and the last are only filled to 50 percent.

    This is bothering me a little bit.

    In reality this effect does occur; I have not yet tested how much. And a rebuild would solve the problem.
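
    Something like this sketch would measure how much (table names as in the sketch above; sys.dm_db_index_physical_stats only reports page density in the SAMPLED or DETAILED modes):

    -- Average page fullness and fragmentation of the clustered index.
    SELECT ips.index_level,
           ips.page_count,
           ips.avg_page_space_used_in_percent,  -- ~50 would confirm the suspicion
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(
             DB_ID(),
             OBJECT_ID(N'dbo.ClientOrder'),
             1,         -- index_id 1 = the clustered index
             NULL,      -- all partitions
             'SAMPLED'  -- use 'DETAILED' for exact figures
         ) AS ips;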

    So we do not have a real problem, but it still bothers me.

    (Is this a wrong 'design'? Haven't I thought this through enough? Shouldn't this bother me?)

    Happy New Year,

    Ben Brugman

  • The clustered indexes you chose for ClientOrder and ClientLine are not ever-increasing, so you will incur page splits as clients add orders to existing pages, which is not ideal in many cases, depending on the workload. Choosing a clustering key is fundamental if you hope to produce a well-performing database that is maintainable. Please read this article and all linked articles:

    http://www.sqlskills.com/blogs/kimberly/more-considerations-for-the-clustering-key-the-clustered-index-debate-continues

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (1/3/2013)


    The clustered indexes you chose for ClientOrder and ClientLine are not ever-increasing

    I am sorry I was not clear in my description. I meant that the indexes are ever-increasing for each individual client.

    so you will incur page splits as clients add orders to existing pages

    Yes, this will incur page splits. But this is part of the design: it keeps the data of a single client in 'one' location in the table, making reading and modifying the data of a single client very efficient. Joins for a single client across the tables are also very efficient, because the tables have a very similar organisation.

    So the page splits are not a problem.

    The database is maintainable and performs well. Each row is read far more often than it is written. Grouping the data by client makes for efficient reading when the client is part of the selection.
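
    A typical single-client read, as a sketch (column names illustrative): because both clustered indexes lead with Clientid, both sides of the join are single range seeks over adjacent pages.

    DECLARE @Clientid int = 42;  -- example value

    SELECT co.Orderid, cl.LineId
    FROM dbo.ClientOrder AS co
    JOIN dbo.ClientLine  AS cl
      ON  cl.Clientid = co.Clientid
      AND cl.Orderid  = co.Orderid
    WHERE co.Clientid = @Clientid;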

    , which is not ideal in many cases, depending on the workload. Choosing a clustering key is fundamental if you hope to produce a well-performing database that is maintainable. Please read this article and all linked articles:

    http://www.sqlskills.com/blogs/kimberly/more-considerations-for-the-clustering-key-the-clustered-index-debate-continues

    I am aware of these articles and of others that can be found on the web.

    The design goal was that the tables be optimised for reading and fast enough for writing (which they are).

    My problem is that for large clients, which have data over several pages, all pages except the first and the last will only be filled to 50 percent.

    Rebuilding or reorganising will take care of that, so it is not a huge problem. But my original assumption was that, because of the 'spread' of inserts, each page would be filled to 75 percent on average (the fill level a B-tree reaches with random inserts).
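
    For example (index and table names assumed; ALTER INDEX is SQL 2005 and later, on SQL 2000 it would be DBCC DBREINDEX or DBCC INDEXDEFRAG):

    -- Rebuild recreates the index and fills pages back up to the fill factor.
    ALTER INDEX PK_ClientOrder ON dbo.ClientOrder REBUILD;

    -- Reorganize is the lighter, online alternative; it also compacts pages.
    ALTER INDEX PK_ClientOrder ON dbo.ClientOrder REORGANIZE;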

    So the clustered key is an efficient key for the workload. But it is not the best key to get the pages filled.

    I did test different keys for the clustering; using the client as the first field in the clustered key was by far the best solution. (I even added the client_id to a number of tables because of the gain in reading speed; writing speed gained as well, because fewer indexes were needed.)

    Any thoughts?

    Any solutions to get a higher fill rate for larger clients?

    Or is this a non issue?

    Thanks for your time and attention,

    Ben Brugman

  • Please post the table definitions including all indexes and constraints.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (1/3/2013)


    Please post the table definitions including all indexes and constraints.

    I have been working on the table definitions.

    I do not wish to post the complete definition,

    1 because of the volume,

    2 because this is intellectual property of my organisation.

    I have now reduced the volume to 350 lines, but still have to do some work to alter some of the names of tables and fields.

    I do not mind putting this work into this and posting the altered script.

    (Limited number of tables, same construction but altered names for tables and fields).

    But will that help to answer the question?

    Ben

  • I am curious about the width of the tables (which you sort of answered), the fill factor of the clustered indexes, and the other indexes (definitions and fill factors and such), as well as the constraints, but we can have a go without them.

    ben.brugman (1/3/2013)


    opc.three (1/3/2013)


    The clustered indexes you chose for ClientOrder and ClientLine are not ever-increasing

    I am sorry I was not clear in my description. I meant that the indexes are ever-increasing for each individual client.

    I got that part, but I meant that overall they are not ever-increasing. I think you have a firm grip on that topic too.

    The database is maintainable and performs well. Each row is read far more often than it is written. Grouping the data by client makes for efficient reading when the client is part of the selection.

    Then you probably have no worries.

    My problem is that for large clients, which have data over several pages, all pages except the first and the last will only be filled to 50 percent.

    Rebuilding or reorganising will take care of that, so it is not a huge problem. But my original assumption was that, because of the 'spread' of inserts, each page would be filled to 75 percent on average (the fill level a B-tree reaches with random inserts).

    So the clustered key is an efficient key for the workload. But it is not the best key to get the pages filled.

    I wouldn't worry too much about it. While all a client's rows may be logically located together, after these page splits they'll be physically dispersed on disk and in less-than-full pages, so you're liable to take a hit on performance doing random I/Os when working with that client; but it sounds like your system is handling the results of this approach. What percent of fragmentation are you seeing in your clustered indexes after one day of normal business? Does the page density change at all? I think if your reads are fast, and you're primarily doing reads, and you keep up with your index maintenance, you'll be fine. What are you using to do index maintenance and how often do you run it? How often are the clustered indexes rebuilt?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I wouldn't worry too much about it. While all a client's rows may be logically located together, after these page splits they'll be physically dispersed on disk and in less-than-full pages, so you're liable to take a hit on performance doing random I/Os when working with that client; but it sounds like your system is handling the results of this approach. What percent of fragmentation are you seeing in your clustered indexes after one day of normal business? Does the page density change at all? I think if your reads are fast, and you're primarily doing reads, and you keep up with your index maintenance, you'll be fine. What are you using to do index maintenance and how often do you run it? How often are the clustered indexes rebuilt?

    I'll have to look into this. We have several customers for 'this' database; some will do proactive maintenance, others probably do not.

    It is very possible that there are databases still on SQL 2000 whose indexes (and clustered tables) have never been rebuilt and that have been running for over 10 years now.

    At the time of design (SQL Server 7) and with the cheaper editions of SQL Server, the possibilities for maintenance were limited. Our knowledge about maintenance was also limited at design time (10 years ago). We did have experience with, and trust in, B-trees, so our choices were based on that.

    But when I get the opportunity I will have a look at the databases running at customer sites, and probably do some DBCC checks and maybe some other checks. At the moment I do not have access to the databases at customer sites.

    (I could add some maintenance scripts to the next update script. Update scripts are run by the customer, so normally I do not get results back, except when something goes wrong, and that happens only very rarely. :-))

    Thanks for your tips and pointers,

    Ben Brugman

  • If you're looking at implementing index maintenance, have a look at this set of scripts:

    Ola Hallengren > SQL Server Index and Statistics Maintenance

    It is written in T-SQL, so you can push it out with your update scripts. The first time you run it, if it finds a lot of fragmentation it may bloat your transaction logs, so be aware of that possibility. Once you are on top of the index maintenance, chances are good you could run the process nightly. It only maintains indexes that need it, per thresholds you set, instead of all indexes like the built-in SQL Server Maintenance Plan task for index maintenance.
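
    A typical call looks something like this (the thresholds are example values, not recommendations; check his documentation for the full parameter list):

    -- Reorganize indexes between 5% and 30% fragmentation, rebuild above 30%.
    EXECUTE dbo.IndexOptimize
        @Databases = 'USER_DATABASES',
        @FragmentationLow = NULL,
        @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
        @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
        @FragmentationLevel1 = 5,
        @FragmentationLevel2 = 30;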

    It also sounds like you are not running integrity checks (DBCC CHECKDB) regularly. This is actually even more important than index maintenance. You should be running CHECKDB frequently, in alignment with your backup schedules, so that if database corruption is found you can get back to a point in time before the corruption was introduced and not lose data. Ola has a script for that as well: SQL Server Integrity Check
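
    His DatabaseIntegrityCheck procedure wraps the plain command, which at its simplest looks like this (the database name is a placeholder):

    -- NO_INFOMSGS suppresses informational output so only real errors show;
    -- ALL_ERRORMSGS reports every error found, not just the first per object.
    DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;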

    You might just want to check out his turnkey solution, MaintenanceSolution.sql, which sets up everything I mentioned, plus database backups.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Do you have only a few clients with huge numbers of orders each?

    Or do you have a large number of clients, some with large numbers of orders and some with small numbers of orders?

    In the first case, your current clustering is likely fine.

    In the second case, you likely want to go with the more "standard" clustering for order systems of:

    Client clustered on Clientid (identity).

    ClientOrder clustered on Orderid (identity).

    ClientLine clustered on the combination of ( Orderid, LineId ).

    That drastically reduces fragmentation but still keeps all info for the same order together.

    You can use a nonclustered index on ClientOrder.Clientid for the relatively rare cases when you need to work with all orders for a given client.
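
    Sketched out (column types assumed, as in the earlier sketch):

    -- Clustering on the ever-increasing Orderid alone means new rows always
    -- land on the last page, so no mid-table page splits.
    CREATE TABLE dbo.ClientOrder (
        Orderid  int IDENTITY(1,1) NOT NULL,
        Clientid int NOT NULL,
        CONSTRAINT PK_ClientOrder PRIMARY KEY CLUSTERED (Orderid)
    );

    -- Nonclustered index for the per-client access path.
    CREATE NONCLUSTERED INDEX IX_ClientOrder_Clientid
        ON dbo.ClientOrder (Clientid);

    CREATE TABLE dbo.ClientLine (
        Orderid int NOT NULL,
        LineId  int IDENTITY(1,1) NOT NULL,
        CONSTRAINT PK_ClientLine PRIMARY KEY CLUSTERED (Orderid, LineId),
        CONSTRAINT FK_ClientLine_ClientOrder FOREIGN KEY (Orderid)
            REFERENCES dbo.ClientOrder (Orderid)
    );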

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Sorry for the late response; I'll try to answer the questions now.

    ScottPletcher (1/4/2013)


    Do you have only a few clients with huge numbers of orders each?

    Or do you have a large number of clients, some with large numbers of orders and some with small numbers of orders?

    A large number of clients with very unbalanced numbers of orders.

    In the first case, your current clustering is likely fine.

    In the second case, you likely want to go with the more "standard" clustering for order systems of:

    Client clustered on Clientid (identity).

    ClientOrder clustered on Orderid (identity).

    ClientLine clustered on the combination of ( Orderid, LineId ).

    That drastically reduces fragmentation but still keeps all info for the same order together.

    You can use a nonclustered index on ClientOrder.Clientid for the relatively rare cases when you need to work with all orders for a given client.

    Working with all orders for a client is not rare; it is common, even usual.

    The number of times a row is read is far higher than the number of writes. This is especially so for the clients with a large number of orders; they 'visit' far more frequently.

    The system has been running very successfully for over 10 years, at larger and smaller sites. So in reality the performance is good.

    I only recently (just before Xmas) realised that there was a 'design flaw': for large clients, all in-between pages would be filled to 50 percent. Although not a problem, it was an oversight on my part, and that bothers me. This situation must occur for others as well; my hope was to learn from them.

    Thanks,

    Ben
