conversion of varbinary(max) to filestream

  • What are the best steps to convert varbinary data to filestream? Right now we don't have it turn on and the table that holds our data has a field for the BLOBs varbinary(max). We would like to take this BLOBs from the database to the file sytem using filestream. We also have another table that contains links to files to a network drive. We would also like to take this files to the file system using also filestream. We have around 300,000 records that contain the varbinary files in the database, on average 100 KB per row in total 26GB.

    Thanks.

  • There are an awful lot of ways to take data from a table and store it in a file, but the critical question is why as opposed to how. How is the data intended to be used? Answer that and you are much closer to your goal. Also, given the 100KB average size per record, just putting the data in the file system may NOT make things any faster, and again, whether or not you see an improvement in performance is going to depend heavily on how the data gets used, along with such file system performance factors as number of concurrent accesses, disk fragmentation level, free space level on the disk, etc...

    Can you elaborate more on your objective?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thanks for the response. Yes I definitely understand. I'll try to explain. I believe our main goal is to be consistent with our data. Right now we have a combination of data some stored in the network drive and some stored in the database. We would like to place all the files in one place, either the database or the filesystem. Since we are starting to add more and more documents to our database and the file size will considerable grow, we are inclining to chose storing the files in the file system using filestream. The purpose of this data is to have these files available for our online application.

    With our current setup, we can maintain consistency with the files stored in the database, when we delete a row in the database and the file is deleted as well. However, for the files that reside in the network deleting a record will leave the file orphaned unless the user is consistent and deletes the file as well (which I know that won't happen the majority of the time).

    So far our current solution have satisfied our needs, however, our business is growing and the need to store more and bigger file types is growing. Turning on filestream seems to be the right way to go at this point considering what we are expecting to come within the short and long terms.

  • Okay... consistency is usually a good thing, but you still haven't really said anything about the files themselves, and how they are used. Are we talking about a web-based app that reads the data from the database and displays it in the proper format, or a compiled app reading the database, or ???

    I'm asking because I wonder if the overhead of having filestream in place will have much benefit, as there are lots of easy ways; even if you do no more than store the filename in the database AND guarantee unique filenames; to "clean up" on a regular basis. A simple VBScript that runs from a SQL Agent job could accomplish the removal of "orphans".

    However, back to the more important task - how, exactly, are the files used? Saving the data in a BLOB when the original document is a .docx or an .xlsx or perhaps any of a dozen formats, as opposed to say, all spreadsheets, or all Word docs, is always a conversion, and that conversion takes time. Perhaps not as much with filestream, but the real key is how the file gets used. Maybe all any user needs is a link they can click on to get to the file, in which case filestream might well be overkill. On the other hand, you might be doing something else altogether. It should even be easy enough for an app to delete a file the user indcates isn't needed any more, so that's why I'm asking. What are you doing with these files that is important enough to consider filestream as a solution?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Grrr decisions decisions. thank you for your time.

    The files are use in a web application. The application has an interface that searches the database and based on that information it displays the link to the pertinent documentation. The link points to the documents either stored in the database or the file system. Most of this information is for review only. the user does the search and wants to review the contracts for example, the plats that belong to that contract, and any other files store there. The files could be ppt, xls, doc, tiff, jpeg, etc. Most of the time when the files are retrieved are for read only not for modification. Also, there's a constant data entry in the database that requires to upload the files that relates to those records.

    I hope this time I gave enough information. Thank you so much again for you time. This seems to be a very big project and I want to make sure I make an informed decision before anything else.

  • Take a look at the following:

    http://technet.microsoft.com/en-us/library/bb933993(v=sql.105).aspx

    It does a better job of explaining FILESTREAM than I can. There are a number of things that could be problems - such as no encryption, so keep that in mind.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Sounds like based on what we have and what we are intending to do, storing the files in the database is the way to go. I have a script to import the files that are residing on the network drive into our database.

    Thank you for help.

  • Glad I could help. You may want to do some quick analysis on the files in the network share and compute average filesize, as the recommendation at that link indicates a negative when your use case would average less than 1 MB per file. How much a negative that would be is not known to me, but I suspect it has to do with space allocation, which might vastly exceed filesize if the average isn't up to the 1 MEG level. So it might mean that storage use could go up quite a bit without any actual change in the data. This can at least be something you can track over time so that you can keep some metrics on it.

    Enjoy !!!

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply