Removal of BLOBs from the database

  • Hello all, I'm after a bit of advice.

    I currently have a lot of XML blobs being stored in several of the production databases that I look after. They're there as an application "logging" feature, so that the developers can investigate issues if they ever occur.

    I want to get these out of my OLTP databases ASAP and am currently looking at options. I know Brent Ozar recently posted an article about Content Addressable Storage systems: http://www.brentozar.com/archive/2015/03/no-more-blobs/

    Is anyone here using that technology? If so, I'd like to hear of any pitfalls that you've encountered.

    Or is anyone using filestream for this? Again, what are the pitfalls? How do you handle point-in-time recovery?

    Any other ways of storing this data?

    Thanks in advance, Andrew.

  • The main challenge I'm facing with these BLOBs is that they currently go into a partitioned table (partitioned by day) and are stored for three months. After that, a stored procedure switches the oldest partition into an empty table and then truncates it.
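
    In sketch form, that purge looks something like this (the table, partition function, and staging-table names here are illustrative); both the switch and the truncate are metadata-only operations, which is what keeps it cheap:

      -- Hypothetical sliding-window purge. dbo.XmlLog is partitioned by
      -- day via partition function pf_LogDay; dbo.XmlLog_Staging is an
      -- empty table with identical structure on the same filegroup.
      DECLARE @oldest int;

      SELECT @oldest = MIN($PARTITION.pf_LogDay(LogDate))
      FROM dbo.XmlLog;

      ALTER TABLE dbo.XmlLog
          SWITCH PARTITION @oldest TO dbo.XmlLog_Staging;

      TRUNCATE TABLE dbo.XmlLog_Staging;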

    However, one of the databases that holds these blobs is being moved into a "PCI zone", where we will only have a Standard Edition instance, so table partitioning won't be available. This means that unless I can remove the blobs from the database, I'll have to batch delete the data from the tables, which will be a nightmare from a performance perspective.
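
    For comparison, the batched purge on Standard Edition would look roughly like this sketch (same illustrative table as above); small batches keep each transaction and the log growth bounded, but every row still has to be fully logged:

      -- Hypothetical batched delete for Standard Edition, where the
      -- partition-switch trick isn't available.
      DECLARE @rows int = 1;

      WHILE @rows > 0
      BEGIN
          DELETE TOP (5000)
          FROM dbo.XmlLog
          WHERE LogDate < DATEADD(MONTH, -3, SYSDATETIME());

          SET @rows = @@ROWCOUNT;
      END;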

    If I can get these things out of the db it'll make life much easier.

  • Filestream will help. Should allow you to manage the performance aspect of deletions better. Make sure you kill 8.3 name generation on the volume that holds the filestream folder.

    This works on Standard, and the code changes should be minimal in that you can insert into the table just like normal T-SQL. You can get better performance with a C#/VB.NET app grabbing a Win32 file handle for the insert (the SqlFileStream streaming API), but that's a decent amount of work.
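
    For anyone following along, a minimal filestream setup might look like the sketch below (database, path, and object names are hypothetical, and the instance-level filestream access setting is assumed to be enabled already). Disabling 8.3 name generation is a volume-level setting; fsutil 8dot3name set <volume> 1 is the usual way to turn it off.

      -- Hypothetical setup: add a filestream filegroup and container,
      -- then put the blob column in it.
      ALTER DATABASE AppLogs
          ADD FILEGROUP fg_Filestream CONTAINS FILESTREAM;

      ALTER DATABASE AppLogs
          ADD FILE (NAME = fs_AppLogs, FILENAME = 'D:\FSData\AppLogs')
          TO FILEGROUP fg_Filestream;

      CREATE TABLE dbo.XmlLogFs
      (
          LogId   uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE
                  DEFAULT NEWSEQUENTIALID(),     -- required for filestream
          LogDate datetime2 NOT NULL DEFAULT SYSDATETIME(),
          Payload varbinary(max) FILESTREAM NULL -- stored as a file on disk
      );

      -- Inserting is plain T-SQL, as described above:
      INSERT INTO dbo.XmlLogFs (Payload)
      VALUES (CAST(N'<log>example</log>' AS varbinary(max)));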

  • Thanks for the reply Steve. Are there any issues around performing point in time restores with filestream? I've read a couple of articles online telling me to be wary.

  • That's interesting. I haven't heard of point-in-time (PIT) issues, but I haven't tested for them. I do know the filestream garbage collector can lag, but worst case that might not be an issue for you, since it doesn't sound like you're running a high-performance, high-transaction-volume workload.

    I'd certainly test with your workload. Try making some new docs, take a log backup, add more docs, then take another log backup.

    Restore to a new db and see if you can do a PIT restore.
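
    A rough script for that test might look like this (backup names, paths, and logical file names are all hypothetical); the key pieces are restoring the full backup WITH NORECOVERY and then stopping the log restore at a chosen moment:

      -- Hypothetical point-in-time restore test for a filestream database.
      RESTORE DATABASE AppLogs_Test
          FROM DISK = 'D:\Backups\AppLogs_Full.bak'
          WITH MOVE 'AppLogs'     TO 'D:\Test\AppLogs.mdf',
               MOVE 'AppLogs_log' TO 'D:\Test\AppLogs_log.ldf',
               MOVE 'fs_AppLogs'  TO 'D:\Test\FSData',  -- filestream container
               NORECOVERY;

      RESTORE LOG AppLogs_Test
          FROM DISK = 'D:\Backups\AppLogs_Log1.trn'
          WITH STOPAT = '2015-06-08T10:30:00', RECOVERY;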

  • You've hit the nail on the head right there: test, test and test again.

    Thanks very much!

  • No one out there using a Content Addressable Storage system for blobs?

  • I'll drop a note on Twitter, but I doubt it. All the CAS systems I've seen in the past were very, very expensive. As in $100k/appliance. Not something many people would use.

    There might be better choices now, but in general, specialized storage hasn't been cheap, unless you have volumes large enough to cause performance or management issues on a SAN.

    What's your scale of blobs?

  • Thanks again Steve.

    Most of the applications that we have log XML blobs into their respective database(s); the worst example has over 1TB of data stored as XML.

    Andrew

  • 1TB in total, but how big is each blob on average? The average size matters more for performance and potential issues.

    If these aren't read often, I might also try to compress them.

  • Sorry for the late reply Steve. It's been a three day weekend over here in Ireland. 😀

    The tables holding the data are ~200-300GB on average I'd say. Not huge but enough to start causing problems.

    Compression is an option for our OLTP instance; however, we'll soon have a Standard Edition instance which will be storing similar amounts of XML data.

  • Thanks, and no worries.

    Compression could still help you, but you'd have to have the clients do it if you move to Standard Edition. Not sure it's worth the effort there, but it can be a fairly simple method call from .NET if this is a large-data issue. The downside is that you can't easily read the data from within SQL.
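
    As an aside: if that Standard Edition instance ever ends up on SQL Server 2016 or later, the built-in COMPRESS()/DECOMPRESS() functions (GZip, available in every edition) would avoid the client-side work entirely. A minimal sketch:

      -- COMPRESS/DECOMPRESS are SQL Server 2016+ built-ins that GZip a
      -- value into varbinary(max) and back; they work in all editions.
      DECLARE @xml nvarchar(max) = N'<log>example payload</log>';
      DECLARE @packed varbinary(max) = COMPRESS(@xml);

      SELECT
          DATALENGTH(@packed)                        AS compressed_bytes,
          CAST(DECOMPRESS(@packed) AS nvarchar(max)) AS original_xml;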

  • I've been looking at filestream. From what I can see, we get a good payoff on SELECTs hitting the table but pay a small penalty on inserts and deletes (which makes sense).

    Going to play around with it a little more, stick compression on as well and see what I get.

  • From how you've described it, I assume those BLOBs are completely separate from user data, have no influence on user activity, and are only needed for system maintenance purposes.

    If that's correct, then the data logically does not belong in the database at all, and you don't need to back it up or restore it to any other instance, as the recorded errors are not relevant to that environment.

    A quick fix could be creating an "app servicing" database alongside each production database and saving those blobs over there.

    Minimal change to the code and immediate effect.

    Then you can decide whether you want a backup plan for those databases and, if so, what it should look like.
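
    A sketch of that approach, with hypothetical names throughout: the sidecar database runs in SIMPLE recovery so the blobs stay out of the production backup chain, and a synonym in the production database keeps the application's INSERT statements almost unchanged.

      -- Hypothetical "app servicing" sidecar database.
      CREATE DATABASE AppLogs_Service;
      GO
      ALTER DATABASE AppLogs_Service SET RECOVERY SIMPLE;
      GO
      USE AppLogs_Service;
      GO
      CREATE TABLE dbo.XmlLog
      (
          LogId   bigint IDENTITY PRIMARY KEY,
          LogDate datetime2 NOT NULL DEFAULT SYSDATETIME(),
          Payload xml NULL
      );
      GO
      -- Back in the production database, a synonym (under a new name,
      -- since production may already have its own logging table) points
      -- the logging code at the sidecar:
      USE AppLogs;
      GO
      CREATE SYNONYM dbo.XmlLogRemote FOR AppLogs_Service.dbo.XmlLog;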

    _____________
    Code for TallyGenerator

  • Yep, the data in the BLOBs is retained "just in case". It has nothing to do with the day-to-day application operations.

    I'm definitely leaning towards creating a dedicated database with filestream enabled.
