Partition and Index Alignment

  • Morning All,

    Regarding a partitioned table -- does it ever make sense to have the index partitioned on something else, or is it considered a best practice to align it.

    I ask for the following reason:

    I have a table, it has a few million rows in it, nothing spectacular -- but there is a blob in there. The table itself is 500GB.

    EDIT: The table is 2.3TB not 500GB. I am not sure how I wrote that.

    It has a legacy app pointing at it which we cannot change. As usual.

    Currently the table is not partitioned but it is heavily indexed.

    At some point in the middle distance is the idea to archive some old data out but that isn't here yet.

    The table is most often queried on UserID and CategoryID (both int/bigint).

    The table contains two date fields CreateDate an UpdateDate -- as far as I can see UpdateDate is mostly null and I'll ignore it for now as its RARELY gets updated. In fact out of the few million rows only 2 have ever been updated 6 years ago.

    Part of me wants to partition on CreateDate because it makes Archiving data out much easier. I can just remove the partition pertaining to 2011.

    However, another part of me thinks partitioning on CategoryID (there are a finite number, 143 to be exact, these arn't likely to be added to anytime soon and if so, at most 2 rows). Would be better for performance. And performance is the main point here.

    Is it possible to partition an index so differently to the table? Is it wise?

    Does anyone see a problem with my thoughts?

    Cheers All,

    Alex

    Given that does it make sense to Partition on CreateDate

  • Will you ever need to copy a "part" of the table to a Dev or Test box?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Hmm, no. I don't see a partial/partition copy to a different server a requirement.

    I did consider being able to swap a partition out to an archive database but I understand (if correct?) I would forfeit that ability if the data and index are partitioned differently and that's a trade off I can live with.

    Cheers

    Alex

  • Apologies for playing "20 questions" with you but why is it that you want to partition the table? If it's for code performance, save yourself the trouble because partitioning by itself does not increase performance. If you add the right indexes to support the right kind of partitioning, you will see some but you would also see an even larger increase in performance if you correctly indexed a monolithic table and had code that could actually use the index.

    Also, archiving sections of a monolithic table are a bit more intensive that archiving sections of a partitioned table but you also have to consider the PITA that partitioning actually is and the fact that it will slow down most code even when so called "partition elimination" occurs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh. Very interesting. You are changing my mind.

    There are a couple of reasons for considering partitioning, Jeff - as follows:

    1. Performance whilst not the focus it's also not lightning quick either.

    2. Ease of moving one partition into an archive database monthly.

    May I ask, you say that there isn't much performance gain from partitioning data, whilst I don't doubt that you are right I thought that was partly the reason for such a thing to exist. Or am I drinking too much of the marketing Kool-Aid?

    If my Kool-Aid consumption is too high what is the ideal or intended scenario for partitioning?

    Cheers

    Alex

  • To add more detail:

    There are two main queries that will be run against -- lets call it the reporting query (its not an actual report per se, as its used by the front end to display users documents) -- these rows are identified by UserID and DocumentID, simplified this is:

    select Cols from table where UserID=@UserID and DocumentID=@DocumentID

    There is a date field on this table, which is ONLY used for maintenance purposes:

    delete table where CreateDate > @DateSpecified

    So on the one hand I thought partitioning on DocumentID would provide partition elimination and increase performance.

    However on the other hand, partitioning on a date key would let me swap partition in and out but that is a secondary requirement to performance. There is plenty of overnight to move data around in batches.

    Cheers

    Alex

  • I have edited the original post for completeness, but to point out in case anyone doesn't re-read the original thread that table in question is actually 2.3TB not 500GB like I initially stated.

  • Few questions.

    As you are considering partitioning I assume you are on Enterprise version.

    When you say a few million is that 10, 50, 100, 500, 1000?

    Is the table modified by that legacy application or by another process (assuming it is still being updated).

    Even though you said it is a legacy application would you still be able to do minor changes to it or is it completely blocked for any changes - reasoning for this would be to eventually store the blob on a different table.

    Is the access to this table primarily done by the application or by external processes. and if both which ones are slow in accessing/processing the data.

    Is the access to the table always a "select *" or are the specific fields required for each "process" fully specified - and if the later how often is the blob referenced?

    and do you know what is the size of the table excluding the blob?

  • Hiya!

    Yes, Enterprise Edition, currently 2008R2 but soon to be 2016/2014 Enterprise.

    Row Count this morning is 94Million Rows

    Modification, is done by the legacy app, mostly inserts but seldom updates where it replaces a document. Just to REALLY make sure there is not audit history.

    Changes are completely blocked, we don't own the Source or even have the source code. PARTS of the app use stored procedure so as long we implement the same result set structure we can get away with changes but, of course, this part of the app uses prepared statements.

    Both SELECT * and and explicit fields are selected depending on how you access the same screen in the application. For some reason.

    Excluding blobs is relatively; tiny the average width of a row sans blob is about 100 bytes = 7GB

    Cheers!

  • alex.sqldba (8/17/2016)


    Oh. Very interesting. You are changing my mind.

    There are a couple of reasons for considering partitioning, Jeff - as follows:

    1. Performance whilst not the focus it's also not lightning quick either.

    2. Ease of moving one partition into an archive database monthly.

    May I ask, you say that there isn't much performance gain from partitioning data, whilst I don't doubt that you are right I thought that was partly the reason for such a thing to exist. Or am I drinking too much of the marketing Kool-Aid?

    If my Kool-Aid consumption is too high what is the ideal or intended scenario for partitioning?

    Cheers

    Alex

    Again, "It Depends". You say the actual table is 2.3TB instead of 500GB. How much of that will be dropped during the initial set of archives (in other words, what will the final size of the table be once you get it the way you want) and what % of the data will need to be dropped every month?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply