Behaviour in partitioned database.

  • Hello all,

    At the moment I am trying to come to grips with partitioning.

    I am trying a number of scenario's to gain knowledge about partitioning and how to use partitioning. I did read a number of articles on the web, but getting your feet wet with actual code does help as wel.

    Queries.

    1. Selection on the partkey, (clientid,) startdttm (Est. S c. 0.0071077)

    2. Selection on the startdttm (Estimated Subtree cost 0.0183308)

    3. Selection on the clientid, startdttm (Est. S. c. 0.10641)

    For 3 far more diskreads are done :alien:

    Index on startdttm is not partitioned. (Used in 1 and 2 and 3)

    Index on clientid is partitioned. (Used in 3)

    To me this lookes like that the optimizer does not or hardly looks at the fact that some indexes are partitioned and some are not.

    Does the optimiser take into account that indexes can be partitioned?

    Sorry I am not able to post a complete example with script to script the table and the contents.

    My conclusions:

    1. Using partitioning is a riscy bussiness.

    2. The optimiser does not handle partitioning very wel.

    I can make some guesses why this is happening, but can anyone explain this behaviour to me?

    ben brugman

    dbcc dropcleanbuffers

    dbcc freeproccache

    SET STATISTICS TIME ON

    SET STATISTICS IO ON

    select * from contact where

    PartKey = 15

    and ClientID = 1000000010984

    and startdttm = '2012-03-16 07:32:00.000'

    -- Table 'contact'. Scan count 1, logical reads 4, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- CPU time = 0 ms, elapsed time = 108 ms.

    dbcc dropcleanbuffers

    dbcc freeproccache

    select * from contact where

    startdttm = '2012-03-16 07:32:00.000'

    -- Table 'contact'. Scan count 1, logical reads 4, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- CPU time = 0 ms, elapsed time = 91 ms.

    dbcc dropcleanbuffers

    dbcc freeproccache

    select * from contact where

    ClientID = 1000000010984

    and startdttm = '2012-03-16 07:32:00.000'

    --Table 'contact'. Scan count 30, logical reads 62, physical reads 61, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- CPU time = 0 ms, elapsed time = 1202 ms.

    Index information

    objectnameindexname partitions

    contact NUNCI_contact_STRTDTTM 1

    contact NUNCI_ENCTR_CLIIDENDDT 30

    contact PK_contact 30

    Index information

    2[PK_contact]nonclustered, unique, primary key located on fclientPScheme[ID], [PartKey]

    6[NUNCI_contact_CLIIDENDDT]nonclustered located on fclientPScheme[clientID], [EndDTTM]

    10[NUNCI_contact_STRTDTTM]nonclustered located on NCIndexFG[StartDTTM]

  • https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You may or may not need partitioning.

    But you absolutely need to determine the proper clustered index for the table.

    It's the column(s) you (almost) always specify in a WHERE clause, particularly if they are in a range.

    In your case, it's likely "startdttm", although I can't say for sure without more details.

    That will give you vastly better performance, partitioned or not.

    Then, if you do partition, partition on the clustering key. Partitioning does allow you to specify different compression for each partition, if you want to (i.e., you can compress older, historical data w/o being forced to compress current data, even though both are in the same table). Sadly, I don't think we can yet specify a different FILLFACTOR for each partition, which could be tremendously helpful as well.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thank you Gail for the link.

    From the link:

    'summary would be: don't change or implement anything without research and without knowing what it's going to affect. '

    My quest at the moment is introducing Clustering for a number of tables. By now I have shown that clustering will give an significant improvement for allmost all actions. (Select/Insert/Update/Delete, complex statements, cache). And no significant degradation for any action.

    But I do not have the resources to benchmark all situations. (Partitioning and thousands of users is one of them and a enormously large number of potential statements).

    But the company is 'a bit' reluctant to the change. So I am trying to gain some knowledge about partitioning, do this by reading articles (yours) and by performing some simple 'tests', hence my question. With the understanding of internals I am in a stronger position.

    Thanks for the link,

    Ben Brugman

  • ScottPletcher (1/29/2013)


    You may or may not need partitioning.

    But you absolutely need to determine the proper clustered index for the table.

    I totally agree that a clustered index will help performance in our situation. But to convince the 'decision' makers, I have to know the effect in our sitiuation, hence my questions about partitioning.

    Thx,

    ben

    (The example comes from a table with 13 indexes. And startdttm is not the most used index. Using a clustered index on another field, might give similar results.).

  • ben.brugman (1/31/2013)


    ScottPletcher (1/29/2013)


    You may or may not need partitioning.

    But you absolutely need to determine the proper clustered index for the table.

    I totally agree that a clustered index will help performance in our situation. But to convince the 'decision' makers, I have to know the effect in our sitiuation, hence my questions about partitioning.

    Thx,

    ben

    (The example comes from a table with 13 indexes. And startdttm is not the most used index. Using a clustered index on another field, might give similar results.).

    startdttm was the only column used in all three of your sample queries. If you have other queries that are more representative of the normal work load than what you've provided us with as we try to assist, then of course go by the more represenative queries.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (1/31/2013)


    startdttm was the only column used in all three of your sample queries. If you have other queries that are more representative of the normal work load than what you've provided us with as we try to assist, then of course go by the more represenative queries.

    This was a query with a suprising effect. If you compare the 2 situation with the 3 situation you can see that the query is almost the same, but that for the 3 situation an extra field is used for selection.

    compare:

    2: Select * from A where B = 1

    3: Select * from A where B = 1 and C = 2

    What did suprise me that the second query was a ten fold more expensive. Both in estimated cost as in diskreads.

    If for the situation 2 the number of diskreads is 4, for situation 3 the diskreads can always be done in the same number or less reads than in situation 2.

    But the optimizer chooses to take another path this did supprise me.

    Hence the question.

    Ben

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply