storing Image in Database (SQL Server ) vs File System

  • Dear All
    In My scenario we are handled more then 1 TB size images.Last couple of  years we are storing all the images in SQLSERver database (Image Datatype). But now my superior told me here after we will store the image in File system . I have  so many confusion this idea .Please suggest me Pros and cons of both the system. Any one have practical experience store the image in File system how is manage backup and transfer one  location to another.please help me.
    Thank You

  • I have no experience in storing images in the "File System".

    I do, however, have a huge amount of experience in storing "images" (WAV files, in my case) in SQL Server.  Since the IMAGE datatype was deprecated a very long time ago, I use VARBINARY(MAX) instead.

    I decided to store the WAV files in the database more than 7 years ago.  More correctly, I decided to continue to store the images in the database back then.  The reason was simple... as others frequently are, I was, at first (shortly after start of my employment at the company) aghast at the idea that someone had stored the WAV files in the database, especially with as large as each one was (they're a type of compressed WAV file but they still get pretty big).

    They had also stored the file path for each WAV file, which made even less sense as to why they decided to store the WAV files in the database.

    As part of my research, I told the system to verify that there actually was a file present for each path and that it was viable.  That's when I changed my mind about what they had done because about 15% of the files were missing and 10% were corrupt and couldn't be used.  The files stored in the database as VARBINARY() were all present and they were all viable with no corruption.

    Since we have to keep the files "forever", I partitioned the table by month (being careful to not trap any free-space in the older months), set the older months' file groups to READ-ONLY, backed them up one final time, and currently only have to backup the current month for a huge time and disk savings.  The system has been operating for the last 6 years (I partitioned the table a year after I started) with no additional actions required.  Every month when the calendar rolls over, the old month is automatically rebuilt to remove all free space, is set to READ_ONLY, and a final backup of that old month is taken.  It's been totally "auto-magic", trouble free, and efficient since I first partitioned it.

    And, yeah... I do test full blown restores on a regular basis.

    The database currently contains 112 monthly partitions including the "current month" and empty "next month" partitions.  In whole numbers the table is 1.2 Terabytes in size consisting of 3.7 Million WAV files and the specs on the last 100,000 calls (WAV files) are 361 KB average file size with a max of 7.9 MB.

    With all the missing files and the corruption on disk of the source files, I have to say that no one will take care of data better than a DBA and an SQL Server database.  My advice to others is that if you're drawn into the wiles of new things like File Stream, etc, make damned sure that the files are easily backed up and restored and that no one can accidentally delete or move the files without actually updating their location in SQL Server.  Then, ask them how long it would take them to restore a month's worth of calls.  I can do it in 6 minutes with a single command with a guarantee of no data loss.  Can others make the same claim?  If so, go for it but I'll stick with storing them in the database.

    Of course, that's just my humble opinion and recommendation. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s.  If you're "stuck" with an older Standard Edition of SQL Server, the partitioning can still be done using Partitioned Views and, IMHO, would be even more flexible overall than Partitioned Tables.  In fact, the only reason why I used Partitioned Tables was because that was the big buzz word that everyone was using back then.  For a ton of reasons (and I can provide some detail if you need me to), I'm horribly sorry that I even entertained the idea of Partitioned Tables and have it on my docket to convert the system to Partitioned Views even though I do have the Enterprise Edition on that older box.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is what Filestream was created to do. If the images are > 1 MB, you get better performance with them as files in the file system rather than in the db. If they're under 1MB on average, I'd just stick them in the database, as Jeff suggested.

    The downside to putting these in the db is that you raise RTO/RPO if you do a restore and backups time/size increases. The upside is they're always there. Filestream also probably means some application changes to make this work.

    The filesystem can work fine, but you do run into sync issues. I've worked on a few systems and never seen 100% match between the filesystem and the URL/URI in the database. Even with jobs that run to catch this weekly, we find drift occurring. You don't have a good way to scan the file system and verify integrity, which you do in the database. I recommend against this, but lots of people like it to keep the db smaller.

    If you're concerned about db size, I'd make a 2nd db, just put images in there, and have a synonym from your other db to access them.

  • Jeff Moden - Sunday, January 27, 2019 9:55 AM

    I have no experience in storing images in the "File System".

    I do, however, have a huge amount of experience in storing "images" (WAV files, in my case) in SQL Server.  Since the IMAGE datatype was deprecated a very long time ago, I use VARBINARY(MAX) instead.

    I decided to store the WAV files in the database more than 7 years ago.  More correctly, I decided to continue to store the images in the database back then.  The reason was simple... as others frequently are, I was, at first (shortly after start of my employment at the company) aghast at the idea that someone had stored the WAV files in the database, especially with as large as each one was (they're a type of compressed WAV file but they still get pretty big).

    They had also stored the file path for each WAV file, which made even less sense as to why they decided to store the WAV files in the database.

    As part of my research, I told the system to verify that there actually was a file present for each path and that it was viable.  That's when I changed my mind about what they had done because about 15% of the files were missing and 10% were corrupt and couldn't be used.  The files stored in the database as VARBINARY() were all present and they were all viable with no corruption.

    Since we have to keep the files "forever", I partitioned the table by month (being careful to not trap any free-space in the older months), set the older months' file groups to READ-ONLY, backed them up one final time, and currently only have to backup the current month for a huge time and disk savings.  The system has been operating for the last 6 years (I partitioned the table a year after I started) with no additional actions required.  Every month when the calendar rolls over, the old month is automatically rebuilt to remove all free space, is set to READ_ONLY, and a final backup of that old month is taken.  It's been totally "auto-magic", trouble free, and efficient since I first partitioned it.

    And, yeah... I do test full blown restores on a regular basis.

    The database currently contains 112 monthly partitions including the "current month" and empty "next month" partitions.  In whole numbers the table is 1.2 Terabytes in size consisting of 3.7 Million WAV files and the specs on the last 100,000 calls (WAV files) are 361 KB average file size with a max of 7.9 MB.

    With all the missing files and the corruption on disk of the source files, I have to say that no one will take care of data better than a DBA and an SQL Server database.  My advice to others is that if you're drawn into the wiles of new things like File Stream, etc, make damned sure that the files are easily backed up and restored and that no one can accidentally delete or move the files without actually updating their location in SQL Server.  Then, ask them how long it would take them to restore a month's worth of calls.  I can do it in 6 minutes with a single command with a guarantee of no data loss.  Can others make the same claim?  If so, go for it but I'll stick with storing them in the database.

    Of course, that's just my humble opinion and recommendation. 😉

    Thanks for this detailed post; it could affect how we design something later this year.
    As a matter of interest, what is the tempdb write latency like when CHECKDB (CHECKTABLE/CHECKFILEGROUP) is run against the table(s) containing the images?

  • Ken McKelvey - Monday, January 28, 2019 10:16 AM

    Thanks for this detailed post; it could affect how we design something later this year.
    As a matter of interest, what is the tempdb write latency like when CHECKDB (CHECKTABLE/CHECKFILEGROUP) is run against the table(s) containing the images?

    I have no idea.  DBCC CHECKDB beats the hell out of TempDB and I've taken to doing other things (like doing restores and doing other checks) rather than using DBCC CHECKDB.  Since MS saw fit to enforce TF 1118 on TempDB, it's gotten even worse for large databases with very tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • An example of the above... I ran DBCC CHECKDB WITH ESTIMATEONLY on one of my databases that has no "images" stored.  The MDF file is only 900GB and there's 150GB of free-space in that so the data/indexes are only 750GB in total and the largest table/index is only 146GB.  Yet, ... DBCC CHECKDB says it's going to use 242GB of TempDB.  That's just nuts.  Imagine what would happen if I did that on a big database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, January 28, 2019 12:16 PM

    An example of the above... I ran DBCC CHECKDB WITH ESTIMATEONLY on one of my databases that has no "images" stored.  The MDF file is only 900GB and there's 150GB of free-space in that so the data/indexes are only 750GB in total and the largest table/index is only 146GB.  Yet, ... DBCC CHECKDB says it's going to use 242GB of TempDB.  That's just nuts.  Imagine what would happen if I did that on a big database.

    Thanks for the information and your time Jeff.
    I know the load can be kept off the production server by restoring onto the test/developement environment but in this case CHECKDB is still run.
    What are the alternatives to using CHECKDB? I have not noticed anything which is commercially available.

  • To be absolutely honest and, hopefully, a bit more clear (I DO use DBCC CHECKDB but not in production and I don't time it) , nothing is a full up substitute for DBCC CHECKDB.  The best way to avoid the impact that it has on production systems is to have a side system that you can do restores to and do DBCC CHECKDB there.  That has the added advantage of letting you know that your "Restore Plan" (which includes the backups) actually does work.  But that can take quite a bit of time and I'd like to know of "possibilities" long before such a thing can take place.

    One of the things I do on a regularly scheduled basis is to check on the content of a key table that acts a bit like the proverbial canary in the coal mine.  To simplify, if the follow code returns ANY rows, the job sends me an email saying that I need to check on things.

    SELECT * FROM msdb.dbo.suspect_pages;.

    You can get more information about what the table contains and why it would contain anything by reading the following MS article.  Note that DBCC CHECKDB also feeds this table and that if you end up with a lot of entries (normally an indication of something bad happening), then you'll need to manually maintain the table because the table is limited to only 1000 rows.  Here's wishing you a very consistent "(0 rows affected)" return on that query.

    https://docs.microsoft.com/en-us/sql/relational-databases/backup-restore/manage-the-suspect-pages-table-sql-server?view=sql-server-2017

    Of course, you should also check that all of your databases are using the CHECKSUM page verification option.  If the following code (which you should include in the job that checks for suspect pages) returns any rows, you should probably have it send you and email, as well.  I can't imagine why anyone would intentionally assign the "NONE" or "TORN PAGE" options although I've seen many 3rd party databases do so.  I set them to CheckSum once the vendor has had their way with them.


     SELECT *
       FROM sys.databases
      WHERE page_verify_option_desc <> 'CHECKSUM'
    ;

    Like I said, though, there's no substitute for DBCC CHECKDB.  When I do a test restore to the box I've designated such things for,  I also do a DBCC CHECKDB... I just don't do such things on the production box unless I have indications that I actually need to.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the information.
    I already have agent alerts for errors 823, 824, 825 and 832 but regularly checking suspect_pages is a good idea I will look at.
    My experience of CHECKDB on images in databases is that the tempdb write latency increases to about 3s instead of a good write latency recommended by Microsoft of no more than 20ms. It seems that with images in the database another server is required to run CHECKDB. I suppose some reports could be off-loaded onto the other server to justify the cost.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply