How do I add files to the server?

  • Hello all,

    I would like to begin storing and displaying PDFs and images using SQL Server.  However, I'm not sure what I need to do to get it to work right.

    I hope I can use our current production server, create a DB just for the files, and create a table or two.  I'm guessing that I hope I can use our current production server, create a DB just for the files, and create a table or two.  I'm guessing that

    varbinary(max)

    is the data type, then perhaps a file path column as well?

    I've been reading up on

    FILESTREAM and FileTable

    but am not quite sure how to implement them, or even if I really need them.  Any advice is much appreciated.

    Mike

  • Strong, strong suggestion.

    Don't.

    Can you store files within SQL Server? Yes.

    Is SQL Server good at storing and retrieving files? No.

    However, if you're going to insist on doing it, then I'd focus on the FILESTREAM data type. That's the more efficient mechanism (among less than efficient mechanisms) for doing this within SQL Server. Here's an example on managing the necessary filegroup for FILESTREAM use. There are also a number of articles (mostly older, this feature doesn't get much love) available here on SSC.

    I'd either simply use the file system to store the files and keep a pointer to that within SQL Server (URL, path, whatever is needed), or look to a different data store mechanism that is better suited to file management. That's $.015.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I would have to agree with Grant here - SQL doesn't work great with storing files. It can do it, but you run into snags around performance.

    My question to you would be "why?". Why are you wanting to store the files in SQL rather than in a file share (SMB for example)? Putting them in SQL means you need some custom application built to import and export the files. If it is for storing the files in a secure way to prevent people from modifying them, SMB can handle that (give users read-only access to the files in the folder). If it is for restricting access to the files, again SMB can handle that. Cross platform access to the files, again SMB. If it is for sharing files outside your company, then I'd recommend an industry best practice method such as SFTP.

    Now I know there are some special cases where you need to use SQL to store the files. I have a scenario at my workplace where we encrypt very specific files then store them in SQL in VARBINARY(MAX) columns, but we have a very specific scenario where the files must be stored in SQL (contractual obligations and 3rd party software requirements) and all of the stored procedures that access or modify these files starts with a logging step and ends with a logging step as we need a full audit trail for access to these. But in most cases, storing documents in SQL is just unnecessary overhead and technical debt that will come back to bite you in the future when you discover that the performance sucks.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • On top of the technical reasons not to that have already been mentioned is a business reason:  SQL Server is frequently your most expensive storage.

  • Also, one thing to REALLY think about is backup and recovery. Storing files in the database means restoring those files with the database. That becomes a huge headache and can seriously impact your recovery time objectives.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey wrote:

    Also, one thing to REALLY think about is backup and recovery. Storing files in the database means restoring those files with the database. That becomes a huge headache and can seriously impact your recovery time objectives.

    I was going to add this comment too - but also add that it always ends up turning into an issue of creating new filegroups and files, setting older filegroups to read only and modifying backups to backup only the read write filegroups.

    It's either that or end up backing up 20TB of images and 1TB of actual data...

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • This was removed by the editor as SPAM

  • I have some experience in this area and I would advise against storing files in the database - using FILESTREAM or otherwise.  This blog post might be worth a read where I go into some more detail on this.

    DBA Dash - Free, open source monitoring for SQL Server

  • I'll likely be alone on my take on this but "It Depends" on a whole lot of things.

    When I first started at a company that I do work for, they had a couple of years worth of call recordings stored in the database as binaries (LOBs, to be sure).  Of course, my previous "training" reminding me of what a "Bozo no-no" that was.  And so I was set on the task of proving to management that there were much better ways.

    My first step was a bit of due diligence.  Each row in the Call Recordings table no only carried the binary of the full recording but also had a full path name to the actual file for the recording.  YEEEE-HAAAA!!!! Right?

    Yeah... no.  10% of the files were missing and 10% of the dozens that I checked were corrupt in one fashion or another.  The binaries for those same files that were stored in the database suffered zero corruption.  I checked with management and, of course, they rightly said that having 10% of the recordings missing and 10% being corrupted was not acceptable in the least.  And there were no "old" system-level backups to be had.

    I had considered rewriting the binaries back out to files but that would be yet another duplication of data for the 10% that were missing and only one application was using the binaries from the database.

    They had setup the database to do full backups every-night and log file backups every hour (and, of course, I changed the latter almost immediately).  It was taking about 2 hours to do a full backup.  It took me about 2 years to persuade folks of a better way (having already totally discarding the idea of only having files and no recordings in the database) and the backups had grown to about 5 hours.  Compression of the binaries was not possible because the were already using a compressed form of WAV files with the compression being done on the phone system before the files were saved to "permanent" disk and imported into the database.

    To make a much longer story, I drank the Purple "Kool-Aid" and partitioned the call recording table in a separate database... 1 file per file group... 1 file group per month.  Backups for the main database dropped to about 6 minutes and the "average" for the "current recording month" went from near zero at the beginning of the month to about 20 minutes at the end of the month.  I managed to bring even those down quite a bit (about 10 minutes or so) by tweaking the backup command settings for buffers and transfer size.

    The final rebuild of the "current month" for the call recordings was (and still is) automated as well as setting the file group to "Read Only" and doing the final backup for the file group.  Of course, so was the creation of the "next month" file group as the previous "next month" filegroup auto-magically became the "current month" thanks to the date-wise partitioning I did.

    I previously said that I drank the Purple "Kool-Aid" because I used the MS and general "Best Practice" of using "Partitioned Tables" rather than "Partitioned Views".  The trouble with doing DR restores on Partitioned Tables is that you can "get back in business" pretty quickly by doing file/filegroup level restores but you can't actually do backups until all partitions have been restored.  That's not an issue with Partitioned Views, especially if every filegroup lives in its own database.

    It's been more than 12 years since I first encountered all of that and it's been running perfectly since the partitioning and with absolutely no missing or corrupt recordings.  And it cost us nothing extra because MS charges by core, not by file size.

    My point is, I used to preach what everyone else still preaches about storing such things in a database.  I'm here to tell you that you need to honestly and openly re-evaluate that stance because, for us, it was absolutely the wrong hill to die on.

    As with so many supposed "Best Practices", just because a million other people are of the same opinion, it doesn't mean they're right.  In a lot of cases, they're flat out wrong.

    If you really want to have some fun, ask me about the more than 2 decade old supposed "Best Practice" Index Maintenance methods that most of the world (and me previously) still uses even though MS changed the documentation on all that,which is still mostly incorrect as I've proven in other presentations, especially when it comes to the indiscriminate us of REORGANIZE being used based solely on levels of Logical Fragmentation.

    And, always remember...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This was removed by the editor as SPAM

  • While I agree with Jeff in the sense that "if the system is designed in one way, why change it to work a different way", I think I would not take "database as a file storage system" as my first choice when designing a new thing.

    Filesystem storage CAN work great for long term storage of files. Alternately, if you are worried about file storage size, disk is pretty cheap in the cloud right now, OR if you are just looking to archive data with little intent to review it, there are tons of good archiving options you can do. Dump it to long term storage on tape for example. If you are worried about privacy of the files when you toss them on the cloud, encrypt them before you upload.

    But storing binary data in a database has a lot of overhead in my opinion. If I am remembering correctly, Filestream is really just dumping the file data to disk with GUIDs for the names and then has tables inside SQL to map the GUID to an actual filename. Still needs someone with SQL knowledge to upload/download the files OR a custom app. I would much rather store my data on a SAN that has the disks in a RAID like setup so if any bits get flipped, it can self-heal and my data is good.

    And jumping back to Jeff's example, if 10% of the data was missing and 10% was corrupt, was anybody reviewing the content? If I am creating a data graveyard or data landfill (whatever you want to call it where people go to drop data that nobody will ever look at), what's the point? If the data needs to reside long term, I'd be looking at some archival storage solutions like tape storage (good for decades) or M-DISC (claims to be good for 100's or 1000's of years). Short term, put it on HDD then migrate it to tape storage monthly. I wouldn't waste precious SQL resources or even SSD storage for landfill data...

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Grant,

    Ok, so I want to store the files in the Windows file system.  As for using a pointer to the files themselves, that would be the fully qualified file path, correct?  And, since they are all going to be stored in the same parent folder, separated into folders for PDFs, PNGs, etc., I could somehow store the path name in a table for, say, application settings, then just store the file name in the "file table", and I'm thinking along with a tag for the file type.

    Brian, as for handling file movement, I will have to come up with some custom form to allow a user to upload a file and attach it to a machine, but for now I think I'll just have them send the files to me and I'll upload manually.

    So, by doing it this way, I don't need FILESTREAM to retrieve the files from the file system or do I?  I mean it's not like MS Access where I can build an FSO dialog box and get the file that way.

  • As Jeff says, I've learned not to argue with him, you can do it all in SQL Server. I've had bad experiences though.

    So, yeah, you could do any combination of what you're describing as a way to avoid simply storing the files in the DB.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I wouldn't store the fully qualified path for each file.  The files will probably need to be moved to a different location at some point.  I'd store a relative reference and have the rest of the path as app config.  You could just use the ID generated from the DB as the file name on disk and avoid storing any relative path.  This would avoid any conflicts with file names.  If you are doing this you are storing files outside the DB so there is no FILESTREAM.

    My advice on not storing files in the DB is just that - advice. With any advice on the internet - take your own use case into consideration and make an informed decision based on what facts apply to your situation.  I go into more detail in my blog post about specific issues with storing files in the DB.  Most of the issues are related to data volume - if you are not storing huge volumes of data it's much less of an issue.  But it's worth pausing and making an informed decision. My preference for storing files would be an object store like S3 or azure blob.

    If it's a small internal app you are developing it might be worth looking to see if there is an off-the-shelf solution that might work.

    DBA Dash - Free, open source monitoring for SQL Server

  • To add another 2 cents, I totally agree with David Wiseman.  Storing the full path instead of just a file name can be a horrible idea for the very reasons he stated.  We do, in fact, use his method of storing just the file name (sometimes with a "prefix path") and include the "base path" as an easy to change configuration item in a configuration table.

    Speaking of that and as I think I may have said (could be wrong and not looking back to find out), I've never used FILESTREAM.  I have, however, read of some significant advantages including but not limited to being able to avoid "taking heat" for storing binaries in a database, not to mention that you would move the "file reads" to the Windows Operating System.  As others have also stated, they have had some bad experiences with storing such binaries in the database (and I take that as strong truth from Grant... that's Mr. Experience talking right there).

    If you're like me and haven't used FILESTREAM before, do what I would do... setup a test database and let the learning being.  It took me about two seconds on Google to find the essential place to start (I used FILESTREAM SQL SERVER)...

    https://learn.microsoft.com/en-us/sql/relational-databases/blob/filestream-sql-server

    Up next and on the same search -results page...

    https://learn.microsoft.com/en-us/sql/relational-databases/blob/enable-and-configure-filestream

    ... and then, likewise for the next logical step...

    https://learn.microsoft.com/en-us/sql/relational-databases/blob/create-a-filestream-enabled-database

    ... and then I'll leave the rest up to you.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply