Hierarchical Storage

  • When I was early in my SQL Server career, I got to work on an image management product called "Watermark". It was based on SQL Server and had a very interesting model. The software built a directory structure (0-9, A-Z) and nested subdirectories below that in a similar fashion. Then it would write an image file into the file system in one of the folders, along with an entry in a SQL Server database that contained the metadata (name of the image, location, description, when it was stored, etc.). There was a security system inside SQL Server, so when you requested an image from the client, security was checked, the location was retrieved from the server, and the image was sent to the client.

    At the time my company used this software to receive faxes, of which we received hundreds a day relating to imports and exports around the world. We had dedicated people to pull up the faxes on their screens, file them into the software, and then email links (for the software client) to those who needed to see them. It was a slick system and allowed a couple dozen people to manage $50M of small wood sales a year.

    This was in the mid '90s, and after a couple of years of receiving images, we realized two things. One was that we had so many images that disk space was becoming an issue. The other was that most images were never viewed again after 90 days. So we purchased an add-on software product and an optical jukebox to implement an HSM system. Every night the system would look for files that hadn't been accessed in 90 days and "prune" them off the hard drive and onto an optical disk. If a user needed a file that was on the optical disk, the process for them was the same; it just took longer, as the optical disk had to be located and the file placed back in the file system.

    In 1998 I went to the Professional Developers Conference, and one of the technologies that I saw in a seminar was an HSM system built into Windows. At the time Windows 2000 was getting close to being feature complete, and I was interested since I had an HSM system in place.

    Since that time I've moved on and so has most of the world, and HSM has really become a niche. I did a search for HSM solutions for Windows and found relatively few products, and perhaps more telling, very few that I've heard of. I know disks have gotten cheap and now many things are stored in email, but I am surprised that HSM systems are not more prevalent.

    I can think of many places (old code bases, old applications, archived data, etc.) that could benefit from HSM. Actually, quite a few of us have built HSM-type solutions in SQL Server for our old data. SQL Server 2005 partitioning, too late for some of us but here for the rest, makes this easier. However, we really don't have an HSM solution. We can access old data, but it doesn't get moved to the "present" partition if it's going to be worked on again. Of course, that may be so rare that it's not worth building.

    In any case, I'm interested to know whether any of you use HSM, and whether it's useful or an outmoded idea.

    Steve Jones
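
The nightly pruning pass described in the post above can be sketched roughly as follows. This is a minimal illustration, not how Watermark or the add-on product actually worked: the function names, the use of two ordinary directories to stand in for the hard drive and the optical jukebox, and the reliance on the file system's last-access time are all assumptions. Note that some file systems are configured not to update access times, in which case another "last used" record would be needed.

```python
import os
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def prune_stale_files(online_root, archive_root, max_idle_days=90, now=None):
    """Move files not accessed within max_idle_days to the archive tier.

    Both roots are plain directories here (an assumption); archive_root
    must not live inside online_root. Returns the relative paths moved.
    """
    online_root, archive_root = Path(online_root), Path(archive_root)
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(online_root.rglob("*")):
        if path.is_file() and path.stat().st_atime < cutoff:
            rel = path.relative_to(online_root)
            target = archive_root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
            moved.append(rel)
    return moved


def recall_file(online_root, archive_root, rel_path, now=None):
    """Return the online path, restoring the file from the archive if pruned."""
    online = Path(online_root) / rel_path
    if not online.exists():
        archived = Path(archive_root) / rel_path
        online.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(archived), str(online))
        # Mark the file as freshly used so the next nightly pass keeps it.
        now = time.time() if now is None else now
        os.utime(online, (now, online.stat().st_mtime))
    return online
```

The caller uses the same path whether or not the file was pruned, which mirrors the point made above: the request looks identical to the user, and a recall simply takes longer.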

  • It sounds like your FAX system was really a document management system.  When we looked at document management the biggest problem was the cost of the cataloging function.  Everyone was on board until the true costs were exposed.  Storage wasn't even close to being an issue.

     
    Jim
  • What does HSM stand for?



    Michelle

  • You're bringing back awful memories of my IBM mainframe days.   And one funny one. 

    We hired an Australian (originally from Ireland) contractor in the mid '90s.   He was actively working with some files one day.  After leaving and returning to his seat, he exclaimed, "All I did was get some coffee and HSM migrated me bloody datasets !!!". 

    Our storage admin ran HSM as effectively as a bad virus.  We also called her "the DASD Nazi", but that's a different story ...

     

     

  • Actually, the software we used was a DM product; we just had it hooked into the fax system.

    For us it wasn't a cost issue, it was a management issue. Plus if you pay SAN prices for old data or documents that you never use, it really can change the ROI.

    Now if only SQL had a built-in, daily/weekly migration component to archive data. Not sure how well it would work with the query processor if you had delays in part of your result set, but it would be slick. I know partitioning does some of this, but it doesn't necessarily handle other media types.

  • Doh, I missed the point.

    So what you're talking about would be a field type, like Archiveable_BLOB?  Or are you thinking of having a new kind of foreign key that knows when the row is in the archive or online?  Or am I still off topic?

  • We do it for a healthcare imaging system (X-ray, MRI, PET, and CAT scans). At present it's more of a migration from super-fast disks to fast disks to slow disks. A DVD jukebox migration is coming next; it hasn't arrived because the market and providers are only 6-8 years old. Our system has been in production for almost 4 years. The minimum requirement for immediate online access to data is 10 years! So the vendors have not addressed the last phase yet. The present architecture goes like this: we have a number of TB on EMC 15k disks, a number of TB on EMC 10k disks, and a number of TB on EMC 10k ATA disks. Not pretty, not cheap, but needed.

    On a more direct parallel, we've also got an AS400 (ughh) in the shop that uses TSM too.

    Regards,
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • Actually I'd love to have a "computed column" that would determine the partition and move data on a rolling basis to slower disks or even optical/tape.

    It would be good if the query could "queue" up the request while data was reloaded from a slower medium. I realize I'm asking a lot of an OLTP system, but being able to specify some storage on a SAN for realtime queries, some on captive, slower disk or even optical, potentially a minute or two to load, would make archival much easier.
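
The rolling, age-based placement wished for in the post above could look something like this as plain logic. It is only a sketch: SQL Server offers nothing like this as described, and the tier names, the 90-day and 365-day thresholds, and the function names are all hypothetical.

```python
from datetime import date

# Hypothetical tiers, fastest first: (maximum age in days, tier name).
TIERS = [(90, "san"), (365, "slow_disk")]
ARCHIVE_TIER = "optical"  # anything older than the last threshold


def storage_tier(last_used, today):
    """Pick a storage tier from how long ago the data was last used."""
    age_days = (today - last_used).days
    for max_age, tier in TIERS:
        if age_days <= max_age:
            return tier
    return ARCHIVE_TIER


def plan_moves(rows, today):
    """List (row_id, current_tier, target_tier) for rows on the wrong tier.

    rows is an iterable of (row_id, last_used, current_tier). A move can go
    either way: aging data drifts to slower media, and data that is being
    worked on again comes back to the fast tier.
    """
    moves = []
    for row_id, last_used, current in rows:
        target = storage_tier(last_used, today)
        if target != current:
            moves.append((row_id, current, target))
    return moves
```

A scheduled job could run `plan_moves` daily or weekly and hand the result to whatever actually relocates the data; queuing queries while data reloads from slow media is a separate problem this sketch does not touch.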

  • I also work in the health industry. We require all PACS (Picture Archiving and Communication Systems) to support HSM. One department runs a screening program in which 20% of the images acquired are digital (the rest are film). This 20% equates to 3.5 terabytes of annual growth.

    We manage this by storing the first tier on EVA SAN storage. The second tier goes onto a Medical Archive System (MAS). This system is made up of blocks of SATA disks, bought in 5 TB slices; each slice comes with its own Access Node server to ensure the solution is scalable.

    The movement of files between the storage levels is managed by a PACS server that runs on Windows with SQL Server 2000 to manage the indexes of these images.

    I appreciate that the HSM is built into the PACS server and is not a generic Windows product, but maybe this is the approach vendors are taking and why your search turned up few results. Perhaps vendors of systems that manage large amounts of data are expected to provide HSM rather than the vendor walking into an environment that supports HSM.

  • Steve,

    Microsoft's *free* Remote Storage Services looks ideal, if all one needs is a simple single-level HSM. Have you looked at it?

    I'm having a devil of a time getting it to work with an IBM 3583 tape library, though.

    regards,  Chris
