Best Practices for tables with VARBINARY(MAX)

  • Hi all,

    Was wondering whether anyone had any recommendations, or references to articles with recommendations, regarding best practices for tables with VARBINARY(MAX) data.

    My particular scenario is that I have a table which is storing photographs of receipts, in a VARBINARY(MAX) column. After a year's worth of data, the size of that table has reached a pretty significant level, and I'd imagine it will only continue to grow.

    As is obvious, if I run a SELECT * on the table, the result set takes quite a while to return. This isn't an issue, as all of my queries on that table are designed to return specific columns, and I'm only ever referencing the VARBINARY(MAX) column when specifically looking for a single record to return the receipt image from, and since it's a PRIMARY KEY lookup, the query runs very quickly.

    Are there any issues that I should be aware of as this dataset grows? Any suggestions / advice on things I should be doing? I was considering having some sort of partitioned table, with every year having it's own partition, but I'm not convinced that it's even worth bothering, since thus-far I haven't encountered any actual problems.

  • kramaswamy - Monday, May 8, 2017 11:25 AM

    Hi all,

    Was wondering whether anyone had any recommendations, or references to articles with recommendations, regarding best practices for tables with VARBINARY(MAX) data.

    My particular scenario is that I have a table which is storing photographs of receipts, in a VARBINARY(MAX) column. After a year's worth of data, the size of that table has reached a pretty significant level, and I'd imagine it will only continue to grow.

    As is obvious, if I run a SELECT * on the table, the result set takes quite a while to return. This isn't an issue, as all of my queries on that table are designed to return specific columns, and I'm only ever referencing the VARBINARY(MAX) column when specifically looking for a single record to return the receipt image from, and since it's a PRIMARY KEY lookup, the query runs very quickly.

    Are there any issues that I should be aware of as this dataset grows? Any suggestions / advice on things I should be doing? I was considering having some sort of partitioned table, with every year having it's own partition, but I'm not convinced that it's even worth bothering, since thus-far I haven't encountered any actual problems.

    1) Blobs do NOT belong on a production SQL Server in almost every scenario. They kill disk IO AND worse flush hot data out of the buffer pool, crushing all app perf on the server.

    2) Look up filestream. You can access that using direct apis from outside of SQL Server.

    3) I still prefer old-school disk files with pointers in the database. Yes, you need good processes and controls in place to make that viable. But you need those in virtually every other area of your database application too, so I don't see it as an issue.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Regarding point #1, I'm not really sure I follow. In what way do they affect performance, if they are not being retrieved by queries? Are you suggesting that just the presence of them being in the table has an effect? I mean, it's a requirement for these to be on the production server, as they need to be obtainable on-demand through the website.

    As for the idea of using FILESTREAM, I had considered it, but hadn't seen any convincing information regarding the advantages it has over using a BLOB. The thing I like about having it be in the BLOB, is that I can easily query it straight from the server if necessary, and I don't have to worry about portability issues if restoring the database onto another machine. Ditto for the idea of having disk files with pointers.

  • Modern cameras, with their immense pixel count,  can take pictures with tens of megabytes of size. That is a lot to put into the buffer pool. It could represent thousands of actual data records that would need to be flushed. to make room for it. Sure, getting one record could be fast. Do that for X records every Y seconds/minutes though and watch what happens to your physical IO demands.

    Sounds to me like you have done your research and have chosen your mechanism based off of that. If you are happy, I'm happy - even if you don't understand why your solution can cause performance problems (and cause backups to take longer and more size, and restores to take longer, etc).

    I do agree with you that partitioning will not be helpful for this scenario, unless you want to be able to more easily store old stuff on big slow disks.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply