• Eric M Russell - Wednesday, February 14, 2018 7:17 AM

    aureolin - Friday, February 9, 2018 10:37 AM

    We're in the process of a data migration from one LOB tool to another over this weekend. Part of the migration is moving just under a terabyte's worth of documents from files on disk to blobs in a database (SQL 2014). The destination system is a locally hosted VM with enough disk space on one data "disk drive", but I'm looking for recommendations on how to architect the destination database. Considerations are ease of management, ease of taking backups, etc. FWIW, the individual files are generally small; many are one- or two-page .PDF files. Yes, I do know this is last minute, but this is my only chance to "get it right" from the start.

    Suggestions, comments, advice all welcome.

    Steve G.
    County of San Luis Obispo

    Why is it considered important to migrate the documents to a database?
    You're moving the documents from cheaper storage that is already optimized for files to more expensive database-optimized storage, and all those BLOB pages will waste memory, data space, and transaction log space that could be put to better use for tables and indexing.

    Backups.  Security.  Audit.

    To expand slightly on this,
    Backups.
        we had a similar situation where we had x million critical documents held on a filestore.  Due to the sheer number of documents, backups took 4+ days.  A Recovery Point Objective of that length for this type of document, meaning a potential loss of up to a week's worth of data, certainly caught the attention of the board when it was explained.  Log backups every 5 minutes made them rather happier.
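
        To make that concrete, here's a minimal sketch of the kind of log backup you'd schedule via a SQL Server Agent job every 5 minutes.  The database name and backup path are placeholders, and it assumes the database is in the FULL recovery model:

            -- Placeholder names; assumes DocStore is in FULL recovery model.
            -- Schedule via a SQL Server Agent job running every 5 minutes
            -- (a real job would normally timestamp the file name).
            BACKUP LOG DocStore
            TO DISK = N'D:\SQLBackups\DocStore_log.trn'
            WITH COMPRESSION, CHECKSUM,
                 NAME = N'DocStore 5-minute log backup';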

    Security.
        adding effective security to a filestore of this size and complexity is rather more challenging than controlling access to a database.  If your document store contains any personally identifiable or commercially sensitive information, you absolutely have to make sure access to it is effectively controlled.  No excuses.
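
        As a rough sketch (all names here are hypothetical), controlling access in the database comes down to a handful of statements, as opposed to maintaining NTFS ACLs across millions of files:

            -- Hypothetical names.  By default, users have no access to the
            -- table, so grant read access only to the application's role.
            CREATE ROLE DocumentReaders;
            GRANT SELECT ON dbo.Documents TO DocumentReaders;
            -- Assumes a database user already exists for the app's service account.
            ALTER ROLE DocumentReaders ADD MEMBER [DocAppService];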

    Audit.
        as above.  If all this stuff is dumped on a disk, where's your audit trail?  Who changed what, and when, and why?  Who searched the documents, for whom, and by what criteria (clearly easier to answer in a properly structured database than in a data jumble sale)?  Who viewed the data?
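
        For example, SQL Server Audit can answer exactly those questions.  A sketch with placeholder names (note that database-level audit specifications are an Enterprise Edition feature on SQL 2014):

            -- Placeholder names; the server audit writes to a file target.
            USE master;
            GO
            CREATE SERVER AUDIT DocAccessAudit
                TO FILE (FILEPATH = N'D:\Audit\');
            ALTER SERVER AUDIT DocAccessAudit WITH (STATE = ON);
            GO

            -- Capture every read and write against the document table.
            USE DocStore;
            GO
            CREATE DATABASE AUDIT SPECIFICATION DocTableAudit
                FOR SERVER AUDIT DocAccessAudit
                ADD (SELECT, INSERT, UPDATE, DELETE ON dbo.Documents BY public)
                WITH (STATE = ON);
            GO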

    In the new system the 25 million+ documents are fully secured.  All activity - creation, position in the document lifecycle, reads, updates - is logged.  Documents are immutable: if you make a change, even just changing a date from the 12th to the 11th, both versions are stored with full history.  I have no idea how you'd do this on a shared drive and control it auditably - which is actually critical in many situations and legally required in a hell of a lot.

    Partitioning of the documents by date also makes management easier, particularly as the database grows (see the sketch below).  Hitting the size limit of a single SAN location could prove a maintenance nightmare in terms of storage and potential app changes if not very, very carefully planned up front.  With partitioning I can put my partitions wherever I want, as long as I tell SQL Server, and the app is obviously totally blind to that.
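
    A minimal sketch of that kind of date-based partitioning follows.  The names, boundary dates, and filegroups are all illustrative, the filegroups must already exist, and table partitioning needs Enterprise Edition on SQL 2014.  Each filegroup can sit on whatever volume or SAN location suits you:

        -- Illustrative names; assumes filegroups FG_Docs2015..FG_Docs2018
        -- have already been added, each on whatever storage is appropriate.
        CREATE PARTITION FUNCTION pfDocsByYear (datetime2)
        AS RANGE RIGHT FOR VALUES ('2016-01-01', '2017-01-01', '2018-01-01');

        CREATE PARTITION SCHEME psDocsByYear
        AS PARTITION pfDocsByYear
        TO (FG_Docs2015, FG_Docs2016, FG_Docs2017, FG_Docs2018);

        -- The partitioning column must be part of the clustered key.
        CREATE TABLE dbo.Documents (
            DocumentId  bigint IDENTITY NOT NULL,
            CreatedAt   datetime2       NOT NULL,
            Content     varbinary(max)  NOT NULL,
            CONSTRAINT PK_Documents
                PRIMARY KEY CLUSTERED (DocumentId, CreatedAt)
        ) ON psDocsByYear (CreatedAt);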

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.