Storing a Binary File in a Database

  • Ryan C. Price

    Ten Centuries

    Points: 1182

    don't understand why people think they need to write a CLR for this

    and

    Why not use native T-SQL commands?

    Don't know that you'd want to, but maybe for mass-loading a whole bunch of files, something like:

    INSERT FileDataTable (FileName, FileData)
    SELECT FileName, clr.GetFileData(FileName)
    FROM FileNameTable;

    or

    UPDATE FileTable
    SET FileData = clr.GetFileData(FileName);

    (I'm not sure this would even be possible; it's been a while since I personally looked into CLR functions.)
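    For a purely native T-SQL route (no CLR at all), OPENROWSET with the BULK provider and SINGLE_BLOB can pull a file into a varbinary(max) column. A minimal sketch, with illustrative table and path names:

    ```sql
    -- Assumes a table dbo.FileDataTable(FileName nvarchar(260), FileData varbinary(max))
    -- and a file at the path shown; both names are placeholders.
    INSERT dbo.FileDataTable (FileName, FileData)
    SELECT N'report.pdf', b.BulkColumn
    FROM OPENROWSET(BULK N'C:\Files\report.pdf', SINGLE_BLOB) AS b;
    ```

    One caveat: the BULK path must be a literal, so a set-based load over a FileNameTable would need dynamic SQL, which is one reason a CLR function like the one sketched above can be attractive for mass loads.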

    Re: the discussion about to FILESTREAM or not to FILESTREAM. What Jeff said:

    For those cases where you're not going to violate the 2GB limit, you don't even need FileStream

    Why would you want to complicate your life by introducing FILESTREAM if you don't need it? I suspect the scenarios where people want to store > 2GB files are several orders of magnitude rarer than the scenarios where people are storing files, documents, and images < 2GB in size.

    I know some people have an aversion to storing large BLOBs inside a SQL database, but I've never really understood it; you still need the same amount of space for your data, and for the projects I've been involved with (a very small sample, I admit), nobody has yet complained about the performance.

  • fregatepllada

    SSCommitted

    Points: 1648

    Ryan, IMHO FILESTREAM does not complicate life - it provides a dual view of BLOBs (the best of both worlds): the BLOB can be stored as a file while the database engine remains aware of its presence. Unfortunately, the technology is not mainstream even in the SQL Server community 🙁

  • Ryan C. Price

    Ten Centuries

    Points: 1182

    fregatepllada (5/26/2015)


    Ryan, IMHO FILESTREAM does not complicate life - it provides a dual view of BLOBs (the best of both worlds): the BLOB can be stored as a file while the database engine remains aware of its presence. Unfortunately, the technology is not mainstream even in the SQL Server community 🙁

    That's awesome if you need to see the data from both views, but the applications that *I* have worked with don't have that need, so providing it would be adding unnecessary complexity. I realize that it's not a *lot* of complexity, but it nonetheless involves extra configuration and maintenance which my applications don't need (and I'm sure there are many in the same situation as me).

  • fregatepllada

    SSCommitted

    Points: 1648

    I do not claim that this is the next silver bullet; from my experience I simply found this capability more useful compared with the "classical" approach. When it comes to your project design decisions it's obviously your call what to use, but there are a few options (aka multiple ways to skin the same cat). I found FILESTREAM useful for a document management system implementation last year 🙂

  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    fregatepllada (5/26/2015)


    FileTable does not support transactions,

    Sure it does...

    while Filestream does, plus in Filestream you have a choice to hide files from Windows 🙂

    You can hide the files in Windows with FileTables as well using the same NON_TRANSACTED_ACCESS filestream option you would use when enabling a regular FILESTREAM setup.

    FileTables are built atop FILESTREAM technology. You can think of FileTables as an advanced API sitting on top of FILESTREAM. There are some tricky differences, so FileTables are not a full replacement for stand-alone columns configured as FILESTREAM, but I look at FileTables first and go to the lower-level FILESTREAM second if I need more control.
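    A minimal sketch of the setup described here (database, directory, and table names are illustrative; it assumes FILESTREAM is already enabled at the instance level and the database has a FILESTREAM filegroup):

    ```sql
    -- Allow non-transacted (Windows file-share) access to FILESTREAM data.
    ALTER DATABASE MyDocs
        SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'MyDocs');

    -- A FileTable needs no column definitions; the schema is fixed.
    CREATE TABLE dbo.Documents AS FILETABLE
        WITH (FILETABLE_DIRECTORY = N'Documents');

    -- Returns the UNC share where files can be dropped with Explorer or xcopy.
    SELECT FileTableRootPath(N'dbo.Documents');
    ```

    Copying a file into that UNC path makes it appear as a row in dbo.Documents without any application code touching the database engine directly.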

    __________________________________________________________________________________________________
    There are no special teachers of virtue, because virtue is taught by the whole community. --Plato

  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    mwpowellhtx (5/26/2015)


    Orlando Colamatteo (5/26/2015)


    Thanks for taking the time to write the article. Given the alternatives I strongly prefer employing SQLCLR for implementing file system access from T-SQL. However, in this case a custom solution for handling binaries may not be necessary. I see this as a great use case for FileTables. Unlike the C# solution in the article, the native T-SQL solution alluded to, or using FILESTREAM directly, no coding is required to bring a binary into the database when using FileTables. FileTables leverage FILESTREAM but with the added advantage of offering us the option of managing files in the database via a UNC path accessible through Windows Explorer or other standard file-system tools.

    FileTables

    I'm not sure you understand FILESTREAM correctly. I may not either, but as I understand it, the OP's API would still be necessary. The only difference being, FILESTREAM permits large files, stored in the file system to be sure, transparently through the table. They still look and feel like a varbinary(max), but to be sure, the files are landing in a predetermined location. They are also very transactional in nature; that's one of the benefits of FILESTREAM; as contrasted with a RYO file tabling / streaming strategy.

    I have researched this area quite a bit, so I think you may have misread my comment. I was not offering up FILESTREAM to eliminate the need for the OP's API; I was offering up FileTables as a possible replacement for the OP's API. Even with FILESTREAM the OP's API could go away by leveraging Windows APIs, but I do not want to go too far off topic.

    See my original comment above. With FileTables you have the option to access the FILESTREAM directly, but it is not compulsory if you do not need transactional data access; i.e., you can get files into the database without ever connecting to the database engine itself.

    __________________________________________________________________________________________________
    There are no special teachers of virtue, because virtue is taught by the whole community. --Plato

  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    Jeff Moden (5/26/2015)


    For those cases where you're not going to violate the 2GB limit, you don't even need FileStream to load binaries.

    True, but with in-table storage the cost per GB will generally be higher, streaming performance is poor relative to other methods and recovery from fragmentation is difficult. This is not to mention that if your binaries are large you can be blasting your buffer pool moving files into and out of your database whereas with FILESTREAM and FileTables data is streamed outside the buffer pool.

    For small binaries or even medium sized binaries that are not handled much once they are stored I think in-table storage makes a lot of sense but there are definitely use cases where the other methods are going to be a much better choice.

    __________________________________________________________________________________________________
    There are no special teachers of virtue, because virtue is taught by the whole community. --Plato

  • Jeff Moden

    SSC Guru

    Points: 996676

    Orlando Colamatteo (5/26/2015)


    Jeff Moden (5/26/2015)


    For those cases where you're not going to violate the 2GB limit, you don't even need FileStream to load binaries.

    True, but with in-table storage the cost per GB will generally be higher, streaming performance is poor relative to other methods and recovery from fragmentation is difficult. This is not to mention that if your binaries are large you can be blasting your buffer pool moving files into and out of your database whereas with FILESTREAM and FileTables data is streamed outside the buffer pool.

    For small binaries or even medium sized binaries that are not handled much once they are stored I think in-table storage makes a lot of sense but there are definitely use cases where the other methods are going to be a much better choice.

    Yep... I realize that some of that can be a problem. 😀 Most of the files that I have to import are just several meg of compressed telephone conversation. The problem that I had was that all this stuff was being stored in a table when I got there and it's on a 2005 box that's not going to be upgraded for another year (heh... it WAS on an SQL 7 box). They also included the path to the original file.

    When I did the due-diligence to ensure that all the files existed, 10% of more than (at the time) years of recordings were missing (we're now required to keep them forever or until a client leaves us) and the other 10% were corrupt in one form or another. I could write it all out to new files but I decided that the recordings are safer in the database, nicely partitioned (after I got done with it), set to READ-ONLY, and safe. Heh... no one guards data like a DBA.

    I'm going to have to dig into FileStream and FileTables soon. We're finally migrating to 2012.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sanjay.menon

    SSC Rookie

    Points: 41

    Am I seeing a bug in the GetBytesFromFile procedure? It is not passing the @FileName parameter to the SqlFileProcessor.FileProcessor.GetBytesFromFile method in the DLL.

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Lots of different approaches. Thanks for yours.

  • Orlando Colamatteo

    SSC Guru

    Points: 182269

    Jeff Moden (5/27/2015)


    Orlando Colamatteo (5/26/2015)


    Jeff Moden (5/26/2015)


    For those cases where you're not going to violate the 2GB limit, you don't even need FileStream to load binaries.

    True, but with in-table storage the cost per GB will generally be higher, streaming performance is poor relative to other methods and recovery from fragmentation is difficult. This is not to mention that if your binaries are large you can be blasting your buffer pool moving files into and out of your database whereas with FILESTREAM and FileTables data is streamed outside the buffer pool.

    For small binaries or even medium sized binaries that are not handled much once they are stored I think in-table storage makes a lot of sense but there are definitely use cases where the other methods are going to be a much better choice.

    Yep... I realize that some of that can be a problem. 😀 Most of the files that I have to import are just several meg of compressed telephone conversation. The problem that I had was that all this stuff was being stored in a table when I got there and it's on a 2005 box that's not going to be upgraded for another year (heh... it WAS on an SQL 7 box). They also included the path to the original file.

    When I did the due-diligence to ensure that all the files existed, 10% of more than (at the time) years of recordings were missing (we're now required to keep them forever or until a client leaves us) and the other 10% were corrupt in one form or another. I could write it all out to new files but I decided that the recordings are safer in the database, nicely partitioned (after I got done with it), set to READ-ONLY, and safe. Heh... no one guards data like a DBA.

    I'm going to have to dig into FileStream and FileTables soon. We're finally migrating to 2012.

    Wow, something newer than SQL vNextMinus4, exciting 😀 In all seriousness, it seems like the petition we had going to get your company to upgrade so you could have your hands on some newer SQL Server features more regularly finally gained enough traction.

    From the information you shared, in-table storage could very well continue to be the right choice even with these other options available, especially since it seems to be serving the system well.

    __________________________________________________________________________________________________
    There are no special teachers of virtue, because virtue is taught by the whole community. --Plato

  • Solomon Rutzky

    SSCoach

    Points: 16259

    Hi Eric. Good topic. I definitely like the concept, but need to mention a few notes regarding SQLCLR best practices:

    • !! Critical Issue !!: You open a FileStream but you don't have any error handling or even a "using" block. This is very dangerous as external resources (i.e. file handles, network handles, etc) can get orphaned if not properly disposed and can be held open at least until the App Domain is unloaded. So, the File.Open() through the File.Read() needs to either be in a try / catch / finally (where you call fs.Dispose() in the finally block), or at least in a "using" block (which is really just short-hand for the compiler doing the try / catch / finally for you). And then you don't need to call fs.Close().

      Until the code in the article is updated to trap errors to ensure proper disposal, I would have to recommend against anyone using this in a Production capacity.

    • Why are you chopping off the last byte of the file?
    • Why are you both setting the database to TRUSTWORTHY ON and granting yourself EXTERNAL ACCESS ASSEMBLY? Neither of those things are necessary. And in fact, setting TRUSTWORTHY ON is usually undesirable. If you sign the Assembly and give a password to the key file (leaving you with a .pfx file), then you can just create an Asymmetric Key in the [master] database from your Assembly DLL, then you can create a Login based on that Asymmetric Key, and then you grant that Login EXTERNAL ACCESS ASSEMBLY. Then you can load the Assembly as EXTERNAL_ACCESS without needing to set the database to TRUSTWORTHY ON.
    • Why would you copy the DLL to “C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL”? I don't see any benefit to that as the DLL isn't needed after it is imported into SQL Server.
    • I'm pretty sure that you don't need the "WITH EXECUTE AS 'SBHR1\enotheisen'" option since the code should be accessing the external resource as the Log On Account for the MSSQLSERVER (or whatever) service unless you handle the impersonation within the CLR code. Of course, if you are testing on SQL Server Express LocalDB, that code will access external resources as you since it runs in user space and not as a background system process.
    • I think you are missing some options/flags in your csc command line. I let Visual Studio handle all of that for me so I don't even know what they all are, but at the very least there should be one for optimizing the code.
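    The signing approach in the third bullet can be sketched in T-SQL (file path, key name, and login name are placeholders):

    ```sql
    -- Run in [master]: create a key from the signed, strong-named DLL,
    -- then a login from that key, then grant the permission to the login.
    USE master;

    CREATE ASYMMETRIC KEY SqlFileProcessorKey
        FROM EXECUTABLE FILE = 'C:\Deploy\SqlFileProcessor.dll';

    CREATE LOGIN SqlFileProcessorLogin
        FROM ASYMMETRIC KEY SqlFileProcessorKey;

    GRANT EXTERNAL ACCESS ASSEMBLY TO SqlFileProcessorLogin;

    -- The assembly can now be created in the user database
    -- WITH PERMISSION_SET = EXTERNAL_ACCESS, without TRUSTWORTHY ON.
    ```

    The login exists only to carry the permission; nothing ever logs in as it.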

    I hope all of this helps. Take care, Solomon...

    SQL#: https://SQLsharp.com/ (SQLCLR library of over 340 functions and procedures)
    Sql Quantum Lift: https://SqlQuantumLift.com/ (company)
    Sql Quantum Leap: https://SqlQuantumLeap.com/ (blog)
    Info sites: Collations • Module Signing • SQLCLR

  • Solomon Rutzky

    SSCoach

    Points: 16259

    sanjarani (5/26/2015)


    I used a .dll program developed with C# a few years ago (SQL Server 2005) to store binary files, mainly .jpg files.

    At the time it was said that a special permission was necessary to consume a .dll in SQL Server and that those permissions would compromise the security of the server.

    Is consuming a .dll in SQL Server still a security concern?

    Hi sanjarani. No, it isn't a security concern, not now nor was it even back when you were told that it was. The problem is that there is a general lack of understanding around SQLCLR. But SQLCLR is far better than the alternatives of xp_cmdshell, the OLE Automation Procedures, extended stored procedures, and sometimes even OPENROWSET / OPENQUERY.

    If you would like to see for yourself how the various aspects of SQLCLR security work, check out the following two articles of mine:

    Stairway to SQLCLR Level 3: Security (General and SAFE Assemblies)

    Stairway to SQLCLR Level 4: Security (EXTERNAL and UNSAFE Assemblies)

    Take care,

    Solomon...

    SQL#: https://SQLsharp.com/ (SQLCLR library of over 340 functions and procedures)
    Sql Quantum Lift: https://SqlQuantumLift.com/ (company)
    Sql Quantum Leap: https://SqlQuantumLeap.com/ (blog)
    Info sites: Collations • Module Signing • SQLCLR

  • Jeff Moden

    SSC Guru

    Points: 996676

    Solomon Rutzky (5/28/2015)


    But SQLCLR is far better than the alternatives of xp_cmdshell, the OLE Automation Procedures, extended stored procedures, and sometimes even OPENROWSET / OPENQUERY

    Extremely personal and grossly incorrect opinion noted. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dmbaker

    SSCertifiable

    Points: 5044

    Solomon Rutzky (5/28/2015)


    Hi Eric. Good topic. I definitely like the concept, but need to mention a few notes regarding SQLCLR best practices:

    • !! Critical Issue !!: You open a FileStream but you don't have any error handling or even a "using" block. This is very dangerous as external resources (i.e. file handles, network handles, etc) can get orphaned if not properly disposed and can be held open at least until the App Domain is unloaded. So, the File.Open() through the File.Read() needs to either be in a try / catch / finally (where you call fs.Dispose() in the finally block), or at least in a "using" block (which is really just short-hand for the compiler doing the try / catch / finally for you). And then you don't need to call fs.Close().

      Strongly agree with this, exception handling is a must when dealing with the file system. Regarding need to dispose, maybe you could just "dispose" with all of that (and a few lines of code) by just using the File.ReadAllBytes method instead (if you don't mind the file being locked for reading). Something like (sorry, code written off the top of my head):

      public static SqlBytes GetBytesFromFile(string sFilePath)
      {
          SqlBytes b = null;
          try
          {
              // ReadAllBytes opens, reads, and closes the file in one call,
              // so there is no stream left to dispose of manually.
              byte[] abytBuffer = System.IO.File.ReadAllBytes(sFilePath);
              b = new SqlBytes(abytBuffer);
          }
          catch (System.IO.IOException ex)
          {
              // Do something with/about the exception
          }
          return b;
      }
