Dealing with large tables (Financial transactions)

  • Hi,

    We have in our production database a large table that is storing financial transactions, currently the table has about 90mlns of rows with 20 GB data space and 26 GB index space. Operations are getting slower, also index maintenance is getting more painful. What are the best practises with dealing with constantly growing big tables like that? I was thinking to create an archive table, schedule a task that would move old records (over a month old) into archive and create an indexed view on the archive with aggregated transactions (for some statistics that we have running). Not sure how much sense does that solution have.

    Let me know your thoughts or provide with some helpful articles, that would be greatly appreciated!

  • That can work.

    Alternately you could aggregate the stuff you remove from the current table and just store the aggregated results if you don't need the details. Other option would be partition the table, that would let you do index maintenance on the partition level (and the older partitions wouldn't be changing, so you'd just need to do it on the current partition), but it won't do much, if anything, for performance.

    If an upgrade to SQL 2016 is on the cards, look at Stretched databases.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I need the historical data for the upcoming project of implementing "cash book", also considered partitioning but i found your article, where you clearly explained that this is not a performance tuning technique and is used mostly for fast loading and unloading. I think I'll try with the archive and see how it performs on my dev environment.

  • 1) IIRC, partitioning is an Enterprise Edition only feature. That could preclude it's use right off the bat.

    2) Partitioning CAN make SOME or even MOST queries run very fast AND allow for high concurrency. There's a lot of caveats, provisos, limitations, gotchas, etc. there though.

    3) Having the same data on the same database with it's hardware, virtualization (possibly), settings, etc. will do almost nothing for performance other than your index/stats mx work.

    4) I would REALLY take a look at wait stats and IO stalls and hardware/VM/etc sizing/config and see if there aren't improvements available there. I have NEVER been to a client where even half of that stuff was optimal, and usually it is a very low percentage.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (3/14/2016)


    1) IIRC, partitioning is an Enterprise Edition only feature. That could preclude it's use right off the bat.

    2) Partitioning CAN make SOME or even MOST queries run very fast AND allow for high concurrency.

    It's not often that we disagree but I have to disagree with these two.

    Partitioned views are available in the Standard Edition and I actually prefer them over Partitioned Tables for many reasons.

    As for performance, I've never seen queries in partitioned tables beat a properly indexed monolithic table using a proper query. A lot of people think they get better performance with partitioning simply because their poorly written code no longer does a full table scan thanks to some lucky miracle known as partition elimination. Writing a proper query against a properly indexed monolithic table to do the same thing typically blows the doors off the partitioned version.

    That, not withstanding and all of that based only on personal experience, I also realize that my personal experience may be limited. Since you're in a consulting position and have seen a whole lot more than I have, can you give a slightly detailed example of where partitioning improved performance of queries when proper monolithic methods didn't fix the performance problems?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru (3/14/2016)


    1) IIRC, partitioning is an Enterprise Edition only feature. That could preclude it's use right off the bat.

    2) Partitioning CAN make SOME or even MOST queries run very fast AND allow for high concurrency. There's a lot of caveats, provisos, limitations, gotchas, etc. there though.

    3) Having the same data on the same database with it's hardware, virtualization (possibly), settings, etc. will do almost nothing for performance other than your index/stats mx work.

    4) I would REALLY take a look at wait stats and IO stalls and hardware/VM/etc sizing/config and see if there aren't improvements available there. I have NEVER been to a client where even half of that stuff was optimal, and usually it is a very low percentage.

    Hi Kevin,

    1) We are using Enterprise Edition so that's not a problem

    2) could you provide some links where i could read about that

    3) There is a lot of aggregation made on this table, that's why i was thinking about indexed view but i didn't want to create one on the "production" version of this table so i thought maybe an archive.

    4) I spent quite a lot of time making sure that the configuration is as optimal as it can be based for the hardware that we have, so there is not much more it can be done here, we are aiming for a better server later this year.

  • On medium powered hardware, SQL Server is perfectly capable of running non-covering queries against TB sized tables with billions of rows within a reasonable amount of time (ex: less than one minute). It all has to do with how effectively the table is indexed.

    You mentioned there are 26 GB of indexes on 20 GB of data and index maintenance is becomming an issue. As a general rule, I get suspicious whenever I see index space exceed 50% of table space, and excessive I/O due to index updates can cause blocking issues, especially for long running aggregate style queries. So, I would start by verifying that each index is actually being used, which you can do using the referenced article below.

    https://www.mssqltips.com/sqlservertutorial/256/discovering-unused-indexes/

    Also, use Activity Monitor to determine if long running queries are actually being held for extended periods of time in a blocked state.

    Anothing thing to consider is incrementally building a summary table keyed on something like period. For example, after the close of each month, query totals for that specific month into the summary table. Unless transactions for prior periods are appended or updated, there should be no reason to re-aggregate across all periods every reporting cycle.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Jeff Moden (3/14/2016)


    TheSQLGuru (3/14/2016)


    1) IIRC, partitioning is an Enterprise Edition only feature. That could preclude it's use right off the bat.

    2) Partitioning CAN make SOME or even MOST queries run very fast AND allow for high concurrency.

    It's not often that we disagree but I have to disagree with these two.

    Partitioned views are available in the Standard Edition and I actually prefer them over Partitioned Tables for many reasons.

    As for performance, I've never seen queries in partitioned tables beat a properly indexed monolithic table using a proper query. A lot of people think they get better performance with partitioning simply because their poorly written code no longer does a full table scan thanks to some lucky miracle known as partition elimination. Writing a proper query against a properly indexed monolithic table to do the same thing typically blows the doors off the partitioned version.

    That, not withstanding and all of that based only on personal experience, I also realize that my personal experience may be limited. Since you're in a consulting position and have seen a whole lot more than I have, can you give a slightly detailed example of where partitioning improved performance of queries when proper monolithic methods didn't fix the performance problems?

    Sadly I am WAY behind the curve on, well, everything, so I can't take time to give you a more complete answer.

    A) I too love partitioned views and have used them at numerous clients. MOST SQL Server installations out there are NOT EE. Lots of people forget that.

    B) Simple answer on perf: multi-billion row table (taken to extreme - smaller row counts result in same thing just to smaller degree). Index however you want. Now assume you have 1M rows that are HOT, and you beat the sh-t out of them with index seeks and DML activity. How deep is your 2.5B row monolithic index? How deep is the 1M row hot-partition index? Every extra level is that much more IO/latch/lock/dirty page/tlog/index mx/etc/etc. This gets MUCH worse as your key sizes increase (think 5-GUID clustered PKs, numeric (18,0)s, etc, etc). The things I have seen would just make you cry Jeff. :w00t:

    There are partition elimination wins for both perf and especially concurrent activity (not swap-in-swap-out stuff, which is an obvious concurrency win).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (3/14/2016)


    Jeff Moden (3/14/2016)


    TheSQLGuru (3/14/2016)


    1) IIRC, partitioning is an Enterprise Edition only feature. That could preclude it's use right off the bat.

    2) Partitioning CAN make SOME or even MOST queries run very fast AND allow for high concurrency.

    It's not often that we disagree but I have to disagree with these two.

    Partitioned views are available in the Standard Edition and I actually prefer them over Partitioned Tables for many reasons.

    As for performance, I've never seen queries in partitioned tables beat a properly indexed monolithic table using a proper query. A lot of people think they get better performance with partitioning simply because their poorly written code no longer does a full table scan thanks to some lucky miracle known as partition elimination. Writing a proper query against a properly indexed monolithic table to do the same thing typically blows the doors off the partitioned version.

    That, not withstanding and all of that based only on personal experience, I also realize that my personal experience may be limited. Since you're in a consulting position and have seen a whole lot more than I have, can you give a slightly detailed example of where partitioning improved performance of queries when proper monolithic methods didn't fix the performance problems?

    Sadly I am WAY behind the curve on, well, everything, so I can't take time to give you a more complete answer.

    A) I too love partitioned views and have used them at numerous clients. MOST SQL Server installations out there are NOT EE. Lots of people forget that.

    B) Simple answer on perf: multi-billion row table (taken to extreme - smaller row counts result in same thing just to smaller degree). Index however you want. Now assume you have 1M rows that are HOT, and you beat the sh-t out of them with index seeks and DML activity. How deep is your 2.5B row monolithic index? How deep is the 1M row hot-partition index? Every extra level is that much more IO/latch/lock/dirty page/tlog/index mx/etc/etc. This gets MUCH worse as your key sizes increase (think 5-GUID clustered PKs, numeric (18,0)s, etc, etc). The things I have seen would just make you cry Jeff. :w00t:

    There are partition elimination wins for both perf and especially concurrent activity (not swap-in-swap-out stuff, which is an obvious concurrency win).

    Couple of months ago, someone else tried to make the same points on all that. Again, haven't had as much exposure to different things as you have so I can't actually go pro or con with you on the subject but I was able to beat the partitioned table performance with every ball he threw. The example table wasn't that big, though. It was only 8 million rows and (IIRC) something like 140 columns wide.

    The "big" table I did at work actually only has 4 columns but it's a 400GB table. It stores the binary for WAV files for "must keep" phone calls. I was going to go the route of just having the file paths in the database until I did my due diligence check and found that 10% of the files were missing and another 10% were corrupt. That's when I decided that no one protects data like a DBA and left the binaries in the table. Also got rid of the managed code that loaded all of that in favor of simple Bulk Inserted Blobs.

    On that table, I did Before'n'After performance testing with all the correct indexing during both tests. The partitioned table performance isn't bad, but it's not as quick as the monolithic table was. I partitioned only because I couldn't see backing up 74 months of data when 73 of those months never ever changed.

    Switching gears, really glad to hear someone else works with and likes Partitioned Views. Combined with a thoughtful "Instead of" triggers to get around some of the limitations like those with IDENTITY columns and I find they're much easier to manage (especially when trying to do a partial restore for dev) than Partitioned Tables. I also like the fact that I can get away with different indexing on the older stuff than the newer stuff and still have it all act as a single table.

    The things I have seen would just make you cry Jeff.

    I had to laugh out loud at that. πŸ˜› I know exactly what you mean. We've finally fixed all the code generated by a herd of hired cowboys initially led by a major hip-shooter over a 10 year period... The front end absolutely screams compared to what it used to. Now it's on to the batch code, which has had similar talent problems but 5 more years to fester. It's not that I want to cry... it's that I have to. :blink::pinch::crying: Unbelievable slaughter of nearly all that's sacred in a database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sounds like we both have quite a lot of job security, eh Jeff?!? πŸ˜€

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Emil B (3/14/2016)


    ... the table has about 90mlns of rows with 20 GB data space and 26 GB index space ...

    The cleverer peepels here will probably shoot me down on this. BUT, perhaps part of the problem is in the indexing strategy. When the indexes take up more space than the table, it could be that there are overlapping indexes (or lots of includes) that could be merged. With the correct clustered index, most, if not all, includes can be eliminated, thereby reducing the size and number of required indexes.

  • DesNorton (3/15/2016)


    Emil B (3/14/2016)


    ... the table has about 90mlns of rows with 20 GB data space and 26 GB index space ...

    The cleverer peepels here will probably shoot me down on this. BUT, perhaps part of the problem is in the indexing strategy. When the indexes take up more space than the table, it could be that there are overlapping indexes (or lots of includes) that could be merged. With the correct clustered index, most, if not all, includes can be eliminated, thereby reducing the size and number of required indexes.

    the thing is I already done all of than, removed duplicates, merged overlapping indexes etc. there is one index with a lot of includes which is almost the size of the table, but based on the usage stats is the most valuable one.

  • Emil B (3/15/2016)


    DesNorton (3/15/2016)


    Emil B (3/14/2016)


    ... the table has about 90mlns of rows with 20 GB data space and 26 GB index space ...

    The cleverer peepels here will probably shoot me down on this. BUT, perhaps part of the problem is in the indexing strategy. When the indexes take up more space than the table, it could be that there are overlapping indexes (or lots of includes) that could be merged. With the correct clustered index, most, if not all, includes can be eliminated, thereby reducing the size and number of required indexes.

    the thing is I already done all of than, removed duplicates, merged overlapping indexes etc. there is one index with a lot of includes which is almost the size of the table, but based on the usage stats is the most valuable one.

    Of COURSE that index is the "most valuable one"!! πŸ˜€ That is what DTA (making a guess based on extensive experience with clients here) does for you. Order a column(s) for seeks and then INCLUDE everything else needed by a query. And if you/DTA leaves off a few large columns or enough small ones this index can become the most efficient access for many/most scans too.

    What you REALLY need here is an indexing strategy project. And believe me it is just that. You cannot tune individual queries for months or years and/or use index usage stats alone and have anything approaching an optimal table. You must evaluate the data, data value skew, data access and DML patterns, etc. on the whole to devise an optimal indexing arrangement. Don't forget ALL of the optional INDEX parameters too - especially the most important one - fill factor!! This will be an iterative process, with some fits and starts. You will need to be able to test AT PRODUCTION SCALE to get it right too (or as usually happens, test in production but monitor closely for badness).

    My record for such a project is almost 200 man hours optimizing indexing (primarily on "the table" - 67 nonclustered indexes that averaged over 50% of all columns on that sadly-fat table!!! :woot:) at one client. I took a system that was made almost non-functional with rampant DTA use by eveyone and when I was done it SCREAMED from both a DML AND SELECT perspective.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The main problem is that on our fast drive we are slowly running out of space, we have another, slower, drive that we could use. We are planning to move to a new server but I'm afraid that with the current growth of that table we might run out of space before that happens, just a year ago the company was really small and no one expected such growth, I joined about 6 months ago and still trying to tidy everything up (as the database was "designed" by software devs), but with current server I'm limited to two drives and our provider said this is the maximum we can get. I know this is driving a bit away from the initial post but generally I needed to know how other DBA's deal with "large", fast growing tables.

  • Emil B (3/16/2016)


    The main problem is that on our fast drive we are slowly running out of space, we have another, slower, drive that we could use. We are planning to move to a new server but I'm afraid that with the current growth of that table we might run out of space before that happens, just a year ago the company was really small and no one expected such growth, I joined about 6 months ago and still trying to tidy everything up (as the database was "designed" by software devs), but with current server I'm limited to two drives and our provider said this is the maximum we can get. I know this is driving a bit away from the initial post but generally I needed to know how other DBA's deal with "large", fast growing tables.

    Just as a matter of interest Emil can you describe how your data, log and tempdb files are laid out with regard to the "drives" you have available to you on this server?

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply