The FILESTREAM Data Type in SQL Server 2008

  • Good article, but a couple of clarifications are in order:

    First, FILESTREAM is a storage attribute on a varbinary(max) datatype column. It is not a new datatype. Seems like a trivial difference, but it has important implications. Primarily, client code doesn't have to know anything about FILESTREAM. to a TSQL client, it's just another varbinary(max) column. The ONLY difference a client would notice is that the blobs can exceed 2GB.

    Second, there is a supported way to access the individual files UNDER SQL CONTROL. Using the column.GETPATH() function, you get a logical path (not in the physical filesystem) which you use to open the file by calling OpenSqlFilestream. This gets you an open file handle which is enlisted into the current SQL transaction. Any modifications are done as part of that transaction, triggers fire when you close the file, etc.

    The physical files are ACLd to the SQL service account only, so unless you're running as that account, or you are a box admin walking on file security, you can't access the files outside of SQL control.


    Kevin Farlee
    SQL Server Storage Engine PM

  • Kevin:

    Is Deepa's statement true that if somebody accesses/modifies the file bypassing SQL Server, it may render the file corrupt?

    venki

  • Scott Roberts (9/24/2008)


    Nice article, but I have a couple of questions/suggestions:

    1. Are there any tests demonstrating the performance difference between FILESTREAM and traditional VARBINARY(MAX) columns? I've seen mention of varbinary being cumbersome in SQL2000 and older, but I was under the impression that it was much better in SQL2005.

    We have done extensive perf testing. Generally for blobs under 256K to 1 MB it's quicker to access via TSQL. For larger blobs it's quicker to access the file directly.

    2. Files can be accessed through the file system, but is this recommended? For example, would it be acceptable for a web site to directly access FILESTREAM files on the file system? Or are they just there in case someone wants to manually eye-ball one of them?

    The files can only be accessed via SQL functions, and under the control of SQL Server. There is a filter driver which verifies that the identity attempting the open has access to that cell in the database. Also, the access happens in the context of the current SQL transaction.

    3. How can I convert existing varbinary(max) columns to use FILESTREAM? Will a simple ALTER TABLE do it?

    No. Since it requires moving all of the data, you need to ALTER TABLE ADD COLUMN, copy the data, and then delete the old column.

    4. What is the *PRIMARY* motivation for FILESTREAM? Is it for performance, or for accessing DB files from the file system?

    Performance for accessing large blobs of data (think streaming video out of the DB)

    Thanks again for taking the time to write the article. It was very informative and obviously got me thinking.

    Scott


    Kevin Farlee
    SQL Server Storage Engine PM

  • the_rs (9/24/2008)


    Kevin:

    Is Deepa's statement true that if somebody accesses/modifies the file bypassing SQL Server, it may render the file corrupt?

    venki

    First, re-read my post about direct file access. It is ONLY allowed under the control of SQL Server. So nobody can access the file outside of SQL's control. Changes are logged and can be rolled back before you commit the transaction, even after you close the file handle. If someone has legitimate access to that cell in the database, and is putting a new value in the FILESTREAM column, then you get whatever they put in, EXACTLY THE SAME as if they put that same value in a Varbinary(max) column. The main difference is that now after you pass the SQL security checks, you can get the data into the column much more efficiently using the file handle.

    Kevin


    Kevin Farlee
    SQL Server Storage Engine PM

  • Kevin Farlee (9/24/2008)


    We have done extensive perf testing. Generally for blobs under 256K to 1 MB it's quicker to access via TSQL. For larger blobs it's quicker to access the file directly.

    [snip]

    The files can only be accessed via SQL functions, and under the control of SQL Server. There is a filter driver which verifies that the identity attempting the open has access to that cell in the database. Also, the access happens in the context of the current SQL transaction.

    Hi Kevin,

    Thanks for responding. One more point of clarification if you don't mind.

    When you say "For larger blobs it's quicker to access the file directly" does that mean "For larger blobs it's quicker to access the file directly via SQL functions, and under the control of SQL Server" (presumably using "column.GETPATH()" and "OpenSqlFilestream")?

    Thanks again,

    Scott

    Edit: I think your previous post answers my question.

  • Yes.

    For larger blobs it's more efficient to use the GetPath() and Open SqlFilestream() functions to get an open file handle and use that to access the data under SQL control. For smaller blobs the overhead of opening and closing the file overwhelms the faster data transfer.


    Kevin Farlee
    SQL Server Storage Engine PM

  • SQL SERVER has provided a way to handle files and backup - recovery problems.

    Also it is recommended that this should be used when file size is above 1 MB.

  • Designers of document storage and mgmt apps (with large document objects) had to decide where to store the objects. These included document management systems like LiveLink, and more recentlySharepointt.

    As internet bandwidth increases, applications that use large files are expanding, including video and music files.

    LiveLink, as a document management system supported both (either) the storing of documents in the database as blobs, or in external files. As some of the earlier comments noted, large blobs were slow, so most installations used external files. That allowed the database, which then only held meta-data, be be fairly small, fast, and to only require a modest server.

    Naturally, maintaining the correct linkage between the external files and the internal database records added some additional overhead. But not really any more than the double-linked system the database engine uses to maintain page consistency.

    Filestream gives you an option, but like all options, only use it if it makes sense, and meets the business objectives. My guess is that for many cases, storing files completely outside of the database (the current approach), and only storing the metadata in the database will tend to be the best approach.

    The more you are prepared, the less you need it.

  • Andrew:

    I agree with you that from a performance standpoint, storing these files in the native file system is the better option.

    But then, data recoverability is an issue. I had worked on a project that managed cases in XML and the XML files were stored in Documentum. We had to minimize data loss (once-a-day backup was not acceptable) and you had buy a pricey third-party software to ensure that data loss was restricted to 15 minutes in case of data corruption/system outage etc.

    Venki

  • Yes, it really gets down to understanding both the requirements and the tools.

    I see many troubled/problem applications/systems where lack of storage knowledge was the biggest problem with the system. Storage being both database and overall file system.

    The more you are prepared, the less you need it.

  • Excellent article! Very well written and most of the advice in the comments are spot on.

    I have a slightly off-topic question. We are currently still on SQL Server 2005. We will not move to SQL Server 2008 until at least Q2 2009.

    I have to implement a system on SQL Server 2005 that basically mimics a document management system.

    Do I save the BLOBs to a MAX field in the database or do I save the BLOB to the File System? Which method will be the easiest/less work to convert to FILESTREAM?

    Thank you in advance.

  • Can I use queries with FREETEXT, CONTAINS, NEAR etc. with data(content of text files or doc files) stored using sql filestream?

  • Great article! Have a question for you... Can I attach a file saved on a Filestream column to an email sent with sp_send_dbmail?

    Any insight will be greatly appreciated!

    -M

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply