February 11, 2011 at 10:08 am
Hello,
I inherited a system that will slowly be rebuilt, but for now I'd like to find some way of making it easier to deal with 200+ tables. Optimally they could be combined into one table using EAV or sparse columns, but at the moment there are 500 or so distinct columns between them. Eventually I'll go back to the original data this all came from and process it more logically, but for now I need to support this legacy data in my new schema (without keeping hundreds of tables around).
To avoid defining 500 columns in a sparse-table arrangement, I originally thought of storing each table's data as XML, one record per existing table. Alternatively, it could be one record in the new table per record in each existing table, with that record's data modeled as XML. I think there would be a significant performance difference between the two, and the queries would certainly be written differently, but I have no experience with the xml type and wanted to ask which approach makes more sense.

I'm quite worried about performance: each of the XML files I've been generating from these tables is 30-40 MB, and this system tends to run out of memory after attempting to parse just five of them at a time. From what I've read, most people use XML for different purposes, or for data that is accessed less often. This data is read frequently but not frequently altered.
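For concreteness, the two XML layouts I'm weighing might look something like this (table and column names are made up):

```sql
-- (a) one row per legacy table, the whole table serialized as one document
CREATE TABLE dbo.LegacyTableXml
(
    TableName sysname PRIMARY KEY,
    Payload   xml NOT NULL   -- an entire 30-40 MB table as one XML document
);

-- (b) one row per legacy *record*, each record as a small XML fragment
CREATE TABLE dbo.LegacyRecordXml
(
    RecordID  int IDENTITY(1,1) PRIMARY KEY,
    TableName sysname NOT NULL,
    Payload   xml NOT NULL   -- one <record>...</record> fragment per row
);
```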
If the full documents weren't so large, it would probably be easier to pull the whole document from the database into a DataTable (C#) and do the querying there... but doing that with multiple 30 MB files at a time would just be ridiculous.
Maybe it would be better/faster to go the sparse table/column route and just assemble the table programmatically. I'd certainly run out of memory less often than I do at the moment.
Thanks!
February 11, 2011 at 10:29 am
Are all of those tables defining essentially the same thing? Like all are different versions of name-and-address data, or something like that? Or do they define different things, like one is customers and another is orders and a third is pricing data?
If they're all the same kind of thing, which is what I'm assuming from your description, then sparse columns will perform MUCH better than XML storage. I've used both, and they aren't even close to being comparable. XML also takes much more drive space, takes more bandwidth to pass to other servers/clients, etc. It's basically for infrequently accessed semi-structured data.
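A minimal sketch of the sparse-column approach (column names are hypothetical): each nullable column marked SPARSE takes no storage when NULL, and an optional column set exposes the non-NULL sparse columns of a row as one XML fragment. With a column set, SQL Server 2008+ wide tables allow up to 30,000 columns, so 500 is not a problem.

```sql
CREATE TABLE dbo.ExperimentResults
(
    ResultID     int IDENTITY(1,1) PRIMARY KEY,
    ExperimentID int NOT NULL,
    p_value_a    float      SPARSE NULL,
    p_value_b    float      SPARSE NULL,
    flag_code    varchar(5) SPARSE NULL,
    -- ...the remaining legacy columns, all declared SPARSE NULL...
    AllValues    xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);
```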
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 11, 2011 at 10:43 am
It's scientific data, so mostly p-values and the occasional 1-5 character string. The problem is that the values don't all originate from the same statistical workflow, which wasn't documented, so the data for each experiment ended up stored in a different table.
Now that I think about it, since it's mostly just small floats and very short strings, that's enormous overhead for representing it in XML. Having worked with the xml type for a couple of days now, it seems like a great set of features, but one best applied in a small subset of applications. I'll look into the sparse option again. It would also be much easier to rewrite the existing queries that way, without having to work CROSS APPLY in everywhere, etc.
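To illustrate the query-rewriting difference (table and element names are made up): pulling one p-value per experiment from an xml column needs nodes()/value() with CROSS APPLY, while the sparse-column version is an ordinary relational query.

```sql
-- XML route: shred the document with CROSS APPLY
SELECT t.TableName,
       x.n.value('(p_value)[1]', 'float') AS p_value
FROM dbo.ExperimentXml AS t
CROSS APPLY t.Payload.nodes('/record') AS x(n);

-- Sparse-column route: a plain query against a plain column
SELECT ExperimentID, p_value_a
FROM dbo.ExperimentResults
WHERE p_value_a IS NOT NULL;
```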
Thanks
February 11, 2011 at 10:57 am
That's pretty much what sparse columns were designed for. Should work out well. Or at least better than EAV or XML.
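Day-to-day use is simple too. Assuming a table like the hypothetical dbo.ExperimentResults above (sparse float column p_value_a, column set AllValues), inserts name only the columns an experiment actually has, and the column set returns a row's non-NULL sparse columns as one XML fragment:

```sql
INSERT INTO dbo.ExperimentResults (ExperimentID, p_value_a)
VALUES (42, 0.031);

-- the column set packs only populated sparse columns into one xml value
SELECT ExperimentID, AllValues
FROM dbo.ExperimentResults;
```

One caveat: when a table has a column set, SELECT * returns the column set instead of the individual sparse columns, so name the columns you want explicitly.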