What's the best practices for storing Very Large Tables?

  • Hi,

    We are in the middle of re-designing few tables (namely transaction tables) that would store very large data and would be hosted on cloud (Azure). The old design of this product breaks transaction tables into monthly tables. i.e. say ORDERS Table would be physically broke into twelve monthly tables over a year like ORDERS0115 (mmyy), ORDERS0215 and so on.

    We are in the opinion that keeping the entire transactions in one Table is better. Would like to know what's the best practices for transaction tables like the one mentioned above? Is it better to use one table with partitions. I read somewhere that partitions can slow down SELECT queries if not designed and thought properly.

    Since this would be hosted on cloud (Azure), do you think some additional things are to be taken care? How a site like Amazon keeps their transactions tables?

    Please advice.

    Thanks in advance

    -Rahul

  • You can't restore a table. You can restore a filegroup, so if you put the table into it's own filegroup, that can work. But otherwise, you have to restore the entire database to a new copy and then move the table that you want into your old database.

    As for partitioning, yes, it's made for data management, not query tuning. Queries can work pretty fast if you're always able to filter by the partition id, in this case, the month of the year. But, as soon as you can't do that, you're scanning all the partition, so yeah, performance can be very poor.

    The real question is, how many transactions a minute are you talking about? If it's extremely high, you might want to look at using an in-memory table for the data collection and then loading the data to normal tables for reporting purposes.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Rahuul (1/24/2015)


    Hi,

    We are in the middle of re-designing few tables (namely transaction tables) that would store very large data and would be hosted on cloud (Azure). The old design of this product breaks transaction tables into monthly tables. i.e. say ORDERS Table would be physically broke into twelve monthly tables over a year like ORDERS0115 (mmyy), ORDERS0215 and so on.

    We are in the opinion that keeping the entire transactions in one Table is better. Would like to know what's the best practices for transaction tables like the one mentioned above? Is it better to use one table with partitions. I read somewhere that partitions can slow down SELECT queries if not designed and thought properly.

    Since this would be hosted on cloud (Azure), do you think some additional things are to be taken care? How a site like Amazon keeps their transactions tables?

    Please advice.

    Thanks in advance

    -Rahul

    If you have the Standard Edition of SQL Server, then it may be that all those individual tables are actually part of a partitioning method known as Partitioned Views and it works quite well. If they're not actually a part of a Partitioned View, it would take very little effort to make it so because having the individual tables is the first and most difficult step to converting monolithic tables to Partitioned Views. You're also correct that partitioning is NOT a performance Panacea and that, depending on the criteria used or not used for queries, can actually slow down queries a bit although it won't cripple them. The reasons for partitioning are hardly ever done for the performance of queries. Rather, they are to greatly reduce maintenance times for indexes, greatly reduce the size of backups when old data isn't updated and, as Grant stated, make if possible to do online piecemeal restores when properly planned out.

    You also need to come up with an archive plan to remove data that isn't used online anymore but that you still have to keep somewhere (usually on tape or a detachable drive) where it could be restored if a legal requirement for the data ever reared it's ugly head.

    You also state that these transaction tables that you're talking about would "store very large data". What is your definition of "very large data"? We need to know that so that we can better advise you on some of the questions that you currently have and will have in the near future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Grant and Jeff.

    This is an ERP system. In its current avatar, the ERP is installed in local units with their individual databases. We are planning to take the entire ERP to a central database on cloud (Azure). There are about 10k units currently. We expect it to double in 3 years down the line after our ERP becomes cloud based. On an average we receive 5GB (inserts) daily from all these units combined together. This would gradually become around 10-12gb per day in 3 years.

    Therefore, my question was, is it okay to keep such large data into One table with partioned views OR should we divide this into multiple tables. As you could understand, no customers would like to take a performance hit on INSERTS or SELECTS.

    Thanks,

    Rahul

  • I've never worked with Azure but, according to the following article...

    https://msdn.microsoft.com/en-us/library/azure/hh508997.aspx

    ... it looks like you'll need to recombine the individual tables to take advantage of partitioning and you should absolutely take advantage.

    I don't know what the max spec are for Azure but assume from the information in the article linked above that it can easily handle what you're getting ready to throw at it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff.

    The article was informative.

    Per your advice, we'll go ahead and combine these tables with Partition Views.

    Thanks a lot,

    Rahul

  • Rahuul (1/26/2015)


    Thanks Jeff.

    The article was informative.

    Per your advice, we'll go ahead and combine these tables with Partition Views.

    Thanks a lot,

    Rahul

    Not quite what I recommended for Azure, if that's where you going. From the article I provided the link to, just combine the tables into one big table and then do the partitioning on Azure. I don't believe its called "Partition Views" there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff. I got it.

    On another note for the sake curiosity,

    What would be your recommendation had it been a Standard OR Enterprise version of SQL Server (not Azure)?

    Single Large table with multiple filegroups for better maintenance and a good archival process?

    Thanks,

    Rahul

  • Rahuul (1/26/2015)


    Thanks Jeff. I got it.

    On another note for the sake curiosity,

    What would be your recommendation had it been a Standard OR Enterprise version of SQL Server (not Azure)?

    Single Large table with multiple filegroups for better maintenance and a good archival process?

    Thanks,

    Rahul

    Correct. Partitioned View with one month per filegroup/file for either version or Partitioned Table with one month per partition/filegroup/file. Both have advantages and disadvantages. If you can make older partitions "READ -ONLY) without sealing up a bunch of free space, then even better. I've got a 340GB telephone log at work that I have to keep forever. It's got a little over 5 years of data in it. We're backing up to iSCSI NAS (to get it away from the databases... not good to store backups in the same place as the databases) and so it's a bit slow. It also seem stupid to me to backup 340 GB of data that will never change every night. It was taking 6-10 hours to back it all up. Now, I have a permanent set of the previous months already backed up and I only have to backup the current month, which only takes minutes. It also made index maintenance a whole lot shorter because I only have to defrag the current month NCI's. The CI is ever increasing and it never gets fragged unless there was a proper (and that's a very rare occurrence now).

    The partitioning would make the archival process a milli-second breeze but I'm not allowed to archive the data. I have to keep all of it online all the time.

    As a bit of a sidebar, partitioning will allow you to do online piecemeal restores so that's a bonus if any corruption occurs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply