Some questions about using partition table with multiple disk drives

  • let's assume that I have scenario as following:

    - Table tbl1 for customer 1, table tbl2 for customer 2.

    - tbl1 has one relative table named tbl1_detail. Same for tbl2.

    - I partition tbl1 and tbl1_detail into 4 months. Same for tbl2.

    - Clustered index and nonclustered index have been set.

    - I use 4 disk drives. Each drive stores 1 partition.

                      drive 1drive 2drive 3drive 4

    tbl1             m1     m2     m3    m4

    tbl1_detail  m1     m2     m3    m4

    tbl2             m1     m2     m3    m4

    tbl2_detail  m1     m2     m3    m4

    with 2 tables tbl1 and tbl1_detail, if I query them with INNER JOIN, it just query on only 1 driver1, right? but I want other drives need to be used, it meant that different month will be in different disk drive, so, I change a little bit as follow:

                      drive 1drive 2drive 3drive 4

    tbl1             m1     m2     m3    m4

    tbl1_detail  m2     m3     m4    m1

    tbl2   m2     m3     m4    m1

    tbl2_detail  m1     m2     m3    m4

    so, my questions here are:

    1. with tb1, when I query m1 (month 1) and m2 (month 2), how the SQL work with drive 1 and 2?

    2. with query tbl1 inner join tbl1_detail in month 1, will the SQL work in parallel by querying data in drive 1 and drive 4 in the same time? OR SQL will query data at drive 1 first, and query data at drive 4 then?

    3. with query tbl1 and tbl2 by UNION ALL for month 1, will the SQL work in parallel at drive 1 and drive 4? OR SQL will query data at drive 1 first, and query data at drive 4 then?

    4. What I want here is how to improve querying data, show result faster by using all available disk drives. Do you have any idea about this? do we have any algorythm to set up partition on different disk drive?

    Thank a lot.

  • No.

    Partitioning is not done for performance.

    It is done for manageability.

    if drive1 dies in your first example, you could still access m2, m3 and m4 while restoring m1 (partial db availability with piecemeal restore in enterprise ed)

    Sql server only issues I/O's when the data isn't already cached in the buffer pool. If you have enough ram, the disks won't matter for read performance.

    Sql is really, really good about caching data. And doing queries in parallel. And doing readahead reads. And doing lots of other stuff to the get the best performance.possible on hw from $1k laptops to $300k servers.

    I would not partition. Instead, I would take the individual drives and create 1 big RAID drive. The advantage is that every query that reads from disk has the chance leverage multiple drives. More physical spindles = better.

  • SpringTownDBA (11/21/2012)


    No.

    Partitioning is not done for performance.

    It is done for manageability.

    if drive1 dies in your first example, you could still access m2, m3 and m4 while restoring m1 (partial db availability with piecemeal restore in enterprise ed)

    Sql server only issues I/O's when the data isn't already cached in the buffer pool. If you have enough ram, the disks won't matter for read performance.

    Sql is really, really good about caching data. And doing queries in parallel. And doing readahead reads. And doing lots of other stuff to the get the best performance.possible on hw from $1k laptops to $300k servers.

    I would not partition. Instead, I would take the individual drives and create 1 big RAID drive. The advantage is that every query that reads from disk has the chance leverage multiple drives. More physical spindles = better.

    yes, thank you for your reply.

    but, could you please answer the second question? Actually, I don't understand what will SQL do? thanks

  • Almost completely agree with the previous poster, Partitioning the database into seperate file groups is really only done on Very Large DB's, as it can offer some performance gains.

    I'm working on a project at the moment to look at the impact of partioning a database across disks, on a data wrehouse that is 2 TB in size, with fact tables of around 300gb, and thats mainly to try and reduce the disk IO contention when doing an ETL and DW processing.

    However, its also happening in conjunction with a full Code and indexing strategy review, with DB Paritioning being used to try and get every last drop of performance out of the server that is possible.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • lovinginmyheart (11/20/2012)


    2. with query tbl1 inner join tbl1_detail in month 1, will the SQL work in parallel by querying data in drive 1 and drive 4 in the same time? OR SQL will query data at drive 1 first, and query data at drive 4 then?

    Assuming that none of the data is in cache, SQL will issue async IO requests for all the data and then let the OS sort out the details. Exactly the same as if they were on the same drive.

    Partitioning can give you a performance benefit, but it's not automatic, partition and get a huge gain. It requires careful planning, possibly code changes, lots of testing. Partitioning across multiple disks gives an advantage when the partitions will be accessed together, the data is not in cache and there's too much IO for a single array/LUN to handle.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply