RE: Need Recommendations for Document Database

SSC Guru

Points: 1003894

February 14, 2018 at 5:19 pm

#1979858

aureolin - Wednesday, February 14, 2018 10:04 AM
Jeff Moden - Wednesday, February 14, 2018 6:43 AM
aureolin - Friday, February 9, 2018 10:37 AM
We're in the process of a data migration from one LOB tool to another over this weekend. Part of the migration is moving just under a terabyte's worth of documents from files on disk to blobs in a database (SQL 2014). The destination system is a locally hosted VM with enough disk space on one data "disk drive", but I'm looking for recommendations on how to architect the destination database. Considerations are ease of management, ease of taking backups, etc. FWIW, the individual files are generally small, many are one to two page .PDF files. Yes, I do know this is last minute, but this is my only chance to "get it right" from the start.
Suggestions, comments, advice all welcome.
Steve G.
County of San Luis Obispo
Once added, do the documents change? New ones might get added but I'm thinking that existing documents are going to change. Backing up more than once, stuff that won't ever change, is a bit insane. All of this can be handled fairly gracefully where daily backups that measure in minutes (I have similar in a Terabyte telephone system where all call recordings are stored in the database) and a "get back in business DR restore" can be accomplished in a similar time.
Heh... and forget about striped backups... unless you have the right pipe and the right disk configuration, they'll actually slow things down and complicate restores. There's a much better way if the documents become static once loaded into the system.
Before we can continue, though, which version and edition of SQL Server do you have?
I have SQL 2014 Enterprise running the database. Generally, there's a large set of the documents that are not going to change; going forward there may be a subset of the documents that will change - though this remains to be seen.

You definitely have the right tool to do this, then. You could do a Partitioned Table with one filegroup per month and one file per filegroup. As a month becomes "static" (no more changes/updates/additions), you can do a trick to squash all unused space out of the filegroup/file and set it (the filegroup/partition) to READ_ONLY and then back it up. What that will do for you is keep from ever having to back it up again. You only need to backup the current month and the next month (which is always empty until it becomes the current month and you build a new "next month" partition).

This also allows for online "Piece-Meal" restores (if a filegroup or file ever goes bad) and a very quick "get back in business" restore of the current data and then you can restore the outlying months is a less hectic manner.

I will say, though, that if I had it to do over, I'd use a Partitioned View, instead. It allows you to restore any "partition" at any time and also allows you to not restore them all if you're restoring to a Development box (for example). The other advantage of Partitioned Views (especially if you use one database per month {and that's not so bad as most would make it out to be}) is that the table in each of those databases can be made to have different indexes to optimize the "selectability" as compared to the current month which also needs optimization for inserts and updates.

I do have to agree with the others, though. Storing large binaries in a database is pretty tough on memory, etc. You might want to reconsider and use something like "file stream" or other available on disk/off database method. Of course, the advantage to having them in the database is that no one protects data better than a good DBA.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)