SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Need Recommendations for Document Database


Need Recommendations for Document Database

Author
Message
aureolin
aureolin
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2109 Visits: 1076
We're in the process of a data migration from one LOB tool to another over this weekend. Part of the migration is moving just under a terabyte's worth of documents from files on disk to blobs in a database (SQL 2014). The destination system is a locally hosted VM with enough disk space on one data "disk drive", but I'm looking for recommendations on how to architect the destination database. Considerations are ease of management, ease of taking backups, etc. FWIW, the individual files are generally small, many are one to two page .PDF files. Yes, I do know this is last minute, but this is my only chance to "get it right" from the start.

Suggestions, comments, advice all welcome.

Steve G.
County of San Luis Obispo



BU69
BU69
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1488 Visits: 424
For backups I had similar, also things like CHECKDB, read the thread I posted, may have some answers for maintenance and backup: https://www.sqlservercentral.com/Forums/1921328/Extremely-Large-Backup-Solutions

Basically striped backups across multiple disks are faster, I got the time down to less than half, and any VM performance settings you can change, restoring from striped is slightly more tricky but I worked out a routine which queries msdb to get the last backup paths to a restore script. I used the Ola Hallegren script for backups and you just add multiple paths in, see here where I got it wrong: https://www.sqlservercentral.com/Forums/1922540/Ola-Hallengren-Striped-Backup-Job

Did you come up with any solutions for the actual database itself?

Forgot to mention above that my DBs are also document stores, so compression on the backups I was only getting down to about 70% of the original data.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)

Group: General Forum Members
Points: 541051 Visits: 44597
aureolin - Friday, February 9, 2018 10:37 AM
We're in the process of a data migration from one LOB tool to another over this weekend. Part of the migration is moving just under a terabyte's worth of documents from files on disk to blobs in a database (SQL 2014). The destination system is a locally hosted VM with enough disk space on one data "disk drive", but I'm looking for recommendations on how to architect the destination database. Considerations are ease of management, ease of taking backups, etc. FWIW, the individual files are generally small, many are one to two page .PDF files. Yes, I do know this is last minute, but this is my only chance to "get it right" from the start.

Suggestions, comments, advice all welcome.

Steve G.
County of San Luis Obispo


Once added, do the documents change? New ones might get added but I'm thinking that existing documents are going to change. Backing up more than once, stuff that won't ever change, is a bit insane. All of this can be handled fairly gracefully where daily backups that measure in minutes (I have similar in a Terabyte telephone system where all call recordings are stored in the database) and a "get back in business DR restore" can be accomplished in a similar time.

Heh... and forget about striped backups... unless you have the right pipe and the right disk configuration, they'll actually slow things down and complicate restores. There's a much better way if the documents become static once loaded into the system.

Before we can continue, though, which version and edition of SQL Server do you have?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
BU69
BU69
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1488 Visits: 424
Jeff Moden - Wednesday, February 14, 2018 6:43 AM
aureolin - Friday, February 9, 2018 10:37 AM
We're in the process of a data migration from one LOB tool to another over this weekend. Part of the migration is moving just under a terabyte's worth of documents from files on disk to blobs in a database (SQL 2014). The destination system is a locally hosted VM with enough disk space on one data "disk drive", but I'm looking for recommendations on how to architect the destination database. Considerations are ease of management, ease of taking backups, etc. FWIW, the individual files are generally small, many are one to two page .PDF files. Yes, I do know this is last minute, but this is my only chance to "get it right" from the start.

Suggestions, comments, advice all welcome.

Steve G.
County of San Luis Obispo


Once added, do the documents change? New ones might get added but I'm thinking that existing documents are going to change. Backing up more than once, stuff that won't ever change, is a bit insane. All of this can be handled fairly gracefully where daily backups that measure in minutes (I have similar in a Terabyte telephone system where all call recordings are stored in the database) and a "get back in business DR restore" can be accomplished in a similar time.

Heh... and forget about striped backups... unless you have the right pipe and the right disk configuration, they'll actually slow things down and complicate restores. There's a much better way if the documents become static once loaded into the system.

Before we can continue, though, which version and edition of SQL Server do you have?

It's not always possible to archive or set documents as read-only for some applications so you need to be more creative with the way you back things up. In previous systems I had the luxury of being able to get the vendor to create read-only databases we could backup once and keep but for other systems, the current one I have is in Sharepoint, the documents can be changed after they are added, so read-only is not possible.

Not sure why you dismiss striped backups, they have speeded up the backups I have in place, can you elaborate and provide examples on ways they slow the backup process down so I can keep this in mind for the future? They do complicate the restores though but htis can be mitigated with a good script.

Eric M Russell
Eric M Russell
SSC Guru
SSC Guru (66K reputation)SSC Guru (66K reputation)SSC Guru (66K reputation)SSC Guru (66K reputation)SSC Guru (66K reputation)SSC Guru (66K reputation)SSC Guru (66K reputation)SSC Guru (66K reputation)

Group: General Forum Members
Points: 66158 Visits: 13024
aureolin - Friday, February 9, 2018 10:37 AM
We're in the process of a data migration from one LOB tool to another over this weekend. Part of the migration is moving just under a terabyte's worth of documents from files on disk to blobs in a database (SQL 2014). The destination system is a locally hosted VM with enough disk space on one data "disk drive", but I'm looking for recommendations on how to architect the destination database. Considerations are ease of management, ease of taking backups, etc. FWIW, the individual files are generally small, many are one to two page .PDF files. Yes, I do know this is last minute, but this is my only chance to "get it right" from the start.

Suggestions, comments, advice all welcome.

Steve G.
County of San Luis Obispo

Why is it considered important to migrate the documents to a database?
You're moving the documents from cheaper storage that is already optimized for files to more expensive database optimized storage, and all those BLOB pages will waste memory, data space, and transaction log space that could be put to better use for tables and indexing.


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
aureolin
aureolin
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2109 Visits: 1076
Jeff Moden - Wednesday, February 14, 2018 6:43 AM
aureolin - Friday, February 9, 2018 10:37 AM
We're in the process of a data migration from one LOB tool to another over this weekend. Part of the migration is moving just under a terabyte's worth of documents from files on disk to blobs in a database (SQL 2014). The destination system is a locally hosted VM with enough disk space on one data "disk drive", but I'm looking for recommendations on how to architect the destination database. Considerations are ease of management, ease of taking backups, etc. FWIW, the individual files are generally small, many are one to two page .PDF files. Yes, I do know this is last minute, but this is my only chance to "get it right" from the start.

Suggestions, comments, advice all welcome.

Steve G.
County of San Luis Obispo


Once added, do the documents change? New ones might get added but I'm thinking that existing documents are going to change. Backing up more than once, stuff that won't ever change, is a bit insane. All of this can be handled fairly gracefully where daily backups that measure in minutes (I have similar in a Terabyte telephone system where all call recordings are stored in the database) and a "get back in business DR restore" can be accomplished in a similar time.

Heh... and forget about striped backups... unless you have the right pipe and the right disk configuration, they'll actually slow things down and complicate restores. There's a much better way if the documents become static once loaded into the system.

Before we can continue, though, which version and edition of SQL Server do you have?

I have SQL 2014 Enterprise running the database. Generally, there's a large set of the documents that are not going to change; going forward there may be a subset of the documents that will change - though this remains to be seen.




aureolin
aureolin
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2109 Visits: 1076
Eric M Russell - Wednesday, February 14, 2018 7:17 AM
aureolin - Friday, February 9, 2018 10:37 AM
We're in the process of a data migration from one LOB tool to another over this weekend. Part of the migration is moving just under a terabyte's worth of documents from files on disk to blobs in a database (SQL 2014). The destination system is a locally hosted VM with enough disk space on one data "disk drive", but I'm looking for recommendations on how to architect the destination database. Considerations are ease of management, ease of taking backups, etc. FWIW, the individual files are generally small, many are one to two page .PDF files. Yes, I do know this is last minute, but this is my only chance to "get it right" from the start.

Suggestions, comments, advice all welcome.

Steve G.
County of San Luis Obispo

Why is it considered important to migrate the documents to a database?
You're moving the documents from cheaper storage that is already optimized for files to more expensive database optimized storage, and all those BLOB pages will waste memory, data space, and transaction log space that could be put to better use for tables and indexing.

Heh, This is the way our new LOB application works. I'm just going with the program. ;-)




Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)SSC Guru (541K reputation)

Group: General Forum Members
Points: 541051 Visits: 44597
aureolin - Wednesday, February 14, 2018 10:04 AM
Jeff Moden - Wednesday, February 14, 2018 6:43 AM
aureolin - Friday, February 9, 2018 10:37 AM
We're in the process of a data migration from one LOB tool to another over this weekend. Part of the migration is moving just under a terabyte's worth of documents from files on disk to blobs in a database (SQL 2014). The destination system is a locally hosted VM with enough disk space on one data "disk drive", but I'm looking for recommendations on how to architect the destination database. Considerations are ease of management, ease of taking backups, etc. FWIW, the individual files are generally small, many are one to two page .PDF files. Yes, I do know this is last minute, but this is my only chance to "get it right" from the start.

Suggestions, comments, advice all welcome.

Steve G.
County of San Luis Obispo


Once added, do the documents change? New ones might get added but I'm thinking that existing documents are going to change. Backing up more than once, stuff that won't ever change, is a bit insane. All of this can be handled fairly gracefully where daily backups that measure in minutes (I have similar in a Terabyte telephone system where all call recordings are stored in the database) and a "get back in business DR restore" can be accomplished in a similar time.

Heh... and forget about striped backups... unless you have the right pipe and the right disk configuration, they'll actually slow things down and complicate restores. There's a much better way if the documents become static once loaded into the system.

Before we can continue, though, which version and edition of SQL Server do you have?

I have SQL 2014 Enterprise running the database. Generally, there's a large set of the documents that are not going to change; going forward there may be a subset of the documents that will change - though this remains to be seen.


You definitely have the right tool to do this, then. You could do a Partitioned Table with one filegroup per month and one file per filegroup. As a month becomes "static" (no more changes/updates/additions), you can do a trick to squash all unused space out of the filegroup/file and set it (the filegroup/partition) to READ_ONLY and then back it up. What that will do for you is keep from ever having to back it up again. You only need to backup the current month and the next month (which is always empty until it becomes the current month and you build a new "next month" partition).

This also allows for online "Piece-Meal" restores (if a filegroup or file ever goes bad) and a very quick "get back in business" restore of the current data and then you can restore the outlying months is a less hectic manner.

I will say, though, that if I had it to do over, I'd use a Partitioned View, instead. It allows you to restore any "partition" at any time and also allows you to not restore them all if you're restoring to a Development box (for example). The other advantage of Partitioned Views (especially if you use one database per month {and that's not so bad as most would make it out to be}) is that the table in each of those databases can be made to have different indexes to optimize the "selectability" as compared to the current month which also needs optimization for inserts and updates.

I do have to agree with the others, though. Storing large binaries in a database is pretty tough on memory, etc. You might want to reconsider and use something like "file stream" or other available on disk/off database method. Of course, the advantage to having them in the database is that no one protects data better than a good DBA.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
BU69
BU69
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1488 Visits: 424
Jeff, can you let me know what issues you have had with the striped backups, I would genually like to know so I can avoid any issues going forward if I'm about to set our Sharepoint estate to be backed up this way? I've not been able to find anything on them slowing down backups, the only drawback so far is the restores be a bit of a pain.

For the DMS the company is implementing they are storing everything in Sharepoint (decided before my time here), so now it's more a case of trying to fix issues after the fact, we've requested from the vendor a facility to mark documents as complete so we can archive but that might be a long time in coming, at the moment users are free to change any document so our problem still remains having to backup Tbs of database on a regular basis. The only saving grace is that with Sharepoint you can set the content databases to be a maximum of 200Gb each so if you have an issue with only a few documents you can restore only that content database.

What I have found so far is striping the backups has more than halved the backup time, for CHECKDB I will need to restore each database to a separate server and kill two birds with one stone, restore test and checkdb, I can't find any alternative with the consistency check as it just take such a long time I can't let this happen on a production server. I presume that will be another issue for Steve G as well.
andrew gothard
andrew gothard
SSCertifiable
SSCertifiable (6.6K reputation)SSCertifiable (6.6K reputation)SSCertifiable (6.6K reputation)SSCertifiable (6.6K reputation)SSCertifiable (6.6K reputation)SSCertifiable (6.6K reputation)SSCertifiable (6.6K reputation)SSCertifiable (6.6K reputation)

Group: General Forum Members
Points: 6555 Visits: 6263
Eric M Russell - Wednesday, February 14, 2018 7:17 AM
aureolin - Friday, February 9, 2018 10:37 AM
We're in the process of a data migration from one LOB tool to another over this weekend. Part of the migration is moving just under a terabyte's worth of documents from files on disk to blobs in a database (SQL 2014). The destination system is a locally hosted VM with enough disk space on one data "disk drive", but I'm looking for recommendations on how to architect the destination database. Considerations are ease of management, ease of taking backups, etc. FWIW, the individual files are generally small, many are one to two page .PDF files. Yes, I do know this is last minute, but this is my only chance to "get it right" from the start.

Suggestions, comments, advice all welcome.

Steve G.
County of San Luis Obispo

Why is it considered important to migrate the documents to a database?
You're moving the documents from cheaper storage that is already optimized for files to more expensive database optimized storage, and all those BLOB pages will waste memory, data space, and transaction log space that could be put to better use for tables and indexing.

Backups. Security. Audit.

To expand slightly on this,
Backups.
we had a similar situation where we had x million critical documents held on a filestore. Backups took, due to the sheer number of documents, 4+ days. Recovery Point of this type of document, meaning a potential loss of a week's worth of data, certainly caught the attention of the board when this was explained. Log backups every 5 minutes made them rather happier.

Security.
adding effective security to a filestore of this size and complexity is rather more challenging than controlling access to a database storage system. If your document store contains any personally identifiable or commercially sensitive information you absolutely have to make sure access to this information is effectively controlled. No excuses.

Audit.
as above. If all this stuff is dumped on a disk, where's your audit trail? Who changed what, and when, and why? Who searched documents (and this is clearly easier in a properly structured database than a data jumble sale), for people, by what criteria. Who viewed data?

In the new system the 25 million + documents are fully secured. All activity; creation, position in document lifecycle, reads, updates are logged. Documents are immutable - if you make a change, even changing just the date from the 12th to the 11th, both versions, with history are stored - no idea how you'd do this on a shared drive and effectively control it auditably - which is actually critical in many situations and legally required in a hell of a lot. Partitioning of the documents by date also makes management easier, particularly as the database grows. Hitting the limit on your partition on a single SAN location could prove a maintenance nightmare in terms of storage and potential app changes if not very, very carefully planned up front. I can put my partitions wherever I want as long as I tell SQL Server. The app is obviously totally blind to that.


I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search