5TB Database and Only a Primary Filegroup

  • Morning Guys,

    I've inherited a 5TB database. 3TB of this DB is BLOB data stored in a Document table (not FILESTREAM or FileTable, just the old image datatype). There is only a single PRIMARY filegroup.

    The time is coming where I need to do some massive maintenance on this DB anyway (in particular, the collation needs changing for the 500 or so tables). That work mostly involves creating a NEW database side by side and BCP'ing the data back in, with some data going in selectively as part of an archiving process that overlaps. My question is: whilst I am doing this anyway, is it worth reorganising the layout of the tables and where they sit by adding filegroups and moving tables onto their own filegroups?

    The DB is hosted on a SAN.

    Is there any performance improvement I can expect from chopping this DB up into more discrete filegroups? The DB spans across 15 files, but looking at the system tables it seems most tables span every file. I want to think this is bad, but I can't think why it would be, from a performance perspective, whilst on a SAN.
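    For reference, a catalog query along these lines shows the layout (a minimal sketch using only the standard system views; with everything on PRIMARY, every table maps to all 15 files):

        -- Which filegroup each table/index lives on, and which files belong
        -- to that filegroup. Non-partitioned objects only; a partitioned
        -- object's data_space_id points at a partition scheme instead.
        SELECT  t.name  AS TableName,
                i.name  AS IndexName,
                fg.name AS FileGroupName,
                df.name AS LogicalFileName,
                df.physical_name
        FROM    sys.tables t
        JOIN    sys.indexes i         ON i.object_id = t.object_id
        JOIN    sys.filegroups fg     ON fg.data_space_id = i.data_space_id
        JOIN    sys.database_files df ON df.data_space_id = fg.data_space_id
        ORDER BY t.name, i.index_id, df.file_id;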

    I want to separate the documents into their own filegroup for backup purposes... We are also considering using Stretch Database/Tables in future for some older data.

    As always, your thoughts and suggestions are appreciated.

    Cheers
    Alex

  • alex.sqldba - Monday, July 16, 2018 4:44 AM

    I want to separate the documents into their own filegroup for backup purposes... We are also considering using Stretch Database/Tables in future for some older data.

    That's actually a very good "First Step" (and, no... not talking about the "Stretch" thing, although that can be an option with other planning).  I was faced with a similar problem, except we were storing compressed WAV files (call recordings) in our SQL Server-based telephone system (long story, but it turned out to be MUCH better than storing them out-of-database on the file system).  I currently have about 1.4TB in that single table.

    As you can imagine, backups for the system were taking a huge amount of time, and it was a totally stupid thing to do anyway because that data never changes.  I started up a new database, partitioned the table by month with 1 File Group per month and 1 File per File Group, and that allowed me to set the older File Groups to Read_Only and do a single final backup of each month as time wore on.  It also allows for "piecemeal" restores and can quickly do enough of a restore to "get back in business" if that problem {knocking on a very strong piece of wood as I say that} should ever rear its ugly head.  It allows for the nearly casual restore of the other file groups while the main system is up and running with the most recent data.
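    To make that concrete, a minimal sketch of the monthly pattern (the database, filegroup, and path names here are invented for illustration):

        -- One filegroup and one file per month.
        ALTER DATABASE CallRecordings ADD FILEGROUP FG_2018_06;
        ALTER DATABASE CallRecordings
            ADD FILE (NAME = N'Recordings_2018_06',
                      FILENAME = N'S:\Data\Recordings_2018_06.ndf')
            TO FILEGROUP FG_2018_06;

        -- Once the month is closed out, freeze it and take one final backup.
        ALTER DATABASE CallRecordings MODIFY FILEGROUP FG_2018_06 READ_ONLY;
        BACKUP DATABASE CallRecordings
            FILEGROUP = N'FG_2018_06'
            TO DISK = N'B:\Backups\CallRecordings_FG_2018_06.bak';

        -- "Get back in business" piecemeal restore: PRIMARY plus the most
        -- recent filegroup come online first; the older read-only filegroups
        -- can be restored later while the system is up and running.
        RESTORE DATABASE CallRecordings
            FILEGROUP = N'PRIMARY', FILEGROUP = N'FG_2018_07'
            FROM DISK = N'B:\Backups\CallRecordings_Full.bak'
            WITH PARTIAL, NORECOVERY;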

    I will admit that I drank the MS Koolaid (tm) and used "Partitioned Tables" for that.  The problem is that you eventually have to restore everything, and that doesn't work so well on smaller development systems where you only need 3 months of data/recordings (for example).  I AM considering rebuilding the system to use partitioned views with 1 database per month.  Yeah... that will currently require 96 separate databases in my case (we're required to keep all that forever), or some consolidation by year requiring "only" 1 database per legacy year and 1 database per month for the current year, but that's no problem for me considering the benefits it will bring (different, more effective indexing for old data than for new, the ability to easily grab just 3 months during restores to the dev system, no need for "aligned" indexes, etc, etc, and still the ability to "switch" data in or out if necessary, although it's not called "switch" in Partitioned Views).  The old data is referenced from the original database by a synonym, making the change totally transparent to the apps that use the data.
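    A hypothetical sketch of that partitioned-view arrangement (all names invented for illustration; each archive table would carry a CHECK constraint on the date column so the optimizer can prune branches the query doesn't touch):

        -- One archive database per period, stitched together with a view.
        CREATE VIEW dbo.Recording_All
        AS
        SELECT * FROM Archive2016.dbo.Recording
        UNION ALL
        SELECT * FROM Archive2017.dbo.Recording
        UNION ALL
        SELECT * FROM dbo.Recording;  -- current data in the main database
        GO

        -- The apps keep using the original object name, which now resolves
        -- to the view... totally transparent to them.
        CREATE SYNONYM dbo.CallRecording FOR dbo.Recording_All;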

    Also, in most cases, the largest tables are usually some form of "Audit" table or the old "Invoice Detail" type of data, where the older data will also never change again.  Consider a similar Read_Only partition scheme (not to mention PAGE compression) for that older data, as well.
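    For the compression part, a quick sketch (the table name is hypothetical):

        -- Estimate the savings first, then PAGE-compress the static history.
        EXEC sp_estimate_data_compression_savings
             @schema_name      = N'dbo',
             @object_name      = N'InvoiceDetailArchive',
             @index_id         = NULL,
             @partition_number = NULL,
             @data_compression = N'PAGE';

        ALTER TABLE dbo.InvoiceDetailArchive
            REBUILD WITH (DATA_COMPRESSION = PAGE);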

    As you go along, also remember that you may, indeed, be able to delete a lot of the older data in the future, and your plans should include that fact so you don't have to go nuts when that eventuality occurs (you know that, but I had to say it out loud 😉 ).  One of the bigger PITA problems is reducing BLOB data.  LOB_COMPACTION DOES NOT WORK for out-of-row LOBs (I can't believe MS did that to us).  There is a way to compact it, but it takes some science to do it.  I'm considering writing an article about it because it seems that a whole lot of people aren't aware of that fact.  I know I wasn't until I actually needed to do such a thing, and I could only find an obscure reference to the problem in a single, old, rejected Connect item.
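    For reference, this is the option people usually reach for (table name borrowed from the original post); per the above, it won't reclaim space from out-of-row LOBs such as the old image datatype:

        ALTER INDEX ALL ON dbo.Document
            REORGANIZE WITH (LOB_COMPACTION = ON);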

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Jeff, for this excellent reply.
    I am certainly interested in your LOB_COMPACTION article, as that problem is most certainly going to rear its head soon. We already have 75GB of free space from deleting 'fake' documents (there was a bug in some code that meant a document was inserted twice, only with an orphaned user id, which made it simple to identify, but it was in production for 2 years before anyone questioned why we had precisely double the number of rows we were expecting). With GDPR forcing us to delete a lot more, this free space will go rapidly!
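    The orphaned-user check was something along these lines (a rough sketch; the user table and column names here are simplified/hypothetical):

        -- The duplicate copy of each document carried a UserId with no
        -- matching user row, so the orphans were the deletable fakes.
        SELECT d.DocumentId
        FROM   dbo.Document d
        LEFT   JOIN dbo.[User] u ON u.UserId = d.UserId
        WHERE  u.UserId IS NULL;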

    Thanks for the feedback.  I also meant to say that, contrary to what many people think/claim, partitioning usually results in somewhat slower query performance than when a good query plays against a properly built and indexed monolithic structure.  It was a bit of a surprise to me several years back when I partitioned the Call Recordings table that I mentioned before.  Even if the partitioned table is structured perfectly and the query is written perfectly against it so that it can take full advantage, the steps people take to implement partitioning are the very same steps (sans the actual partitioning) that they need to make their queries run faster against monolithic structures.  Because such monoliths have only one B-Tree to examine, they usually run faster even in the presence of supposed partition elimination in the partitioned tables.  YMMV, but that's what I've experienced in the things I've had to do with partitioning so far.

    Partitioning is still VERY WORTH it from the standpoint of backups, restores, and other maintenance, though.

    --Jeff Moden


