Would you use sql_variant?

  • I'm working on a database design for an ODS / Data Warehouse. I have been asked to build a design with three tables, Contract, ContractDetails, and Buckets. The buckets table is basically an ID & the name of a financial bucket. This is for scalibility. The ContractDetails table will be a narrow table with the ContractID, BucketID, BucketAmount in it. Like so:

    Create Table Contract (ContractID int identity(1,1), Name varchar(50), Address varchar(50)...etc.)

    Create Table ContractDetails (ContractDetailID int identity(1,1), ContractID int, BucketID int, BucketValue money)

    Create Table Bucket (BucketID int identity(1,1), BucketName varchar(25))

    Now, the problem is, there are other factors that can affect the price that ends up in "BucketValue". Bits, ints, char values... Things like "is the product new or used" and "is this a reseller or a direct seller" and "when is this price available for this product" kind of questions. So, I've broken them down into ProductStatus (bit) where 1 = New and 0= Used, Reseller (bit) where 1 = reseller and 0 = Direct, StartDate (datetime) for the effective date of a price promotion and TerminationDate for the end date of a price promotion.

    For various reasons, these last few items can't go into the Contract table. But if I stick them in their own ProductDetail table, I'm heading back to "normalized OLTP database" design, which I'm trying to avoid. And if I make "BucketValue" a money datatype, I can't stick those values in that column.

    So, would this be a case where I'd make BucketValue a sql_variant and just deal with the conversion issues later on in the ETL / Reporting process?

    Any thoughts on this would be greatly appreciated. Thanks,

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I have used sql_variant and have not faced any problems as yet. Mind you i did this thing recently. As you said conversion happens only in the Presentation layer of the app.

    "Keep Trying"

  • A CLR UDT might work here. In terms of reading back the information, ToString() could return the value in a consistent format which might make it easier to process. Not sure of the performance implications, but it's probably tough to index that column in any meaningful way when you aren't sure of the type (can you even index variant columns?).

  • ToString() would certainly return the values but you can't SUM() a string. So I would still have to do an additional conversion to get the money or smallmoney values back.

    So is it worth my time to write a CLR UDT?

    I've also thought about an XML String for the "extraneous" values, but then I'd have to shred the string constantly for all the reports that would be based off this DW. I can't seem to come up with one good solution to this problem.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Well you wouldn't be able to sum a variant column either. The CLR type would help in that it could internally figure out the type based on a ruleset, and perhaps return that type as part of the string .... so at least you would know what to convert it to. Obviously if the information is just being passed right out of the database layer then it probably makes no difference.

    XML might help in as much as you can create XML indexes on the column. Do you know what type of value is being inserted at the time? If so, you could store it along these lines:

    <Info ProductStatus="1" Reseller="0" PromotionStartDate="20080101" />

    If you don't know, then it's always going to give a string value and you may as well just use a varchar.

  • Not having used sql_variant before...

    I was under the impression that sql_variant stores different datatypes in the same column. That you can determine which datatype to use for a specific record when you stored the value. So you could have datatypes bit, money, and datetime all in the same column.

    Therefore, if you knew what records to pull and made sure to pull only the records where the value in that column was money, you could use the SUM() function.

    Am I misunderstanding what sql_variant is and/or how it can be used?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • You are correct in that SQL_VARIANT will store most datatypes, but you cannot take a SUM of the values without converting them. I put together a quick script:

    CREATE TABLE TESTVARIANT (INC INT IDENTITY (1, 1) PRIMARY KEY, ITEM SQL_VARIANT)

    INSERT INTO TESTVARIANT (ITEM) VALUES ('3E2')

    INSERT INTO TESTVARIANT (ITEM) VALUES (3E2)

    INSERT INTO TESTVARIANT (ITEM) VALUES (15)

    INSERT INTO TESTVARIANT (ITEM) VALUES ('15')

    INSERT INTO TESTVARIANT (ITEM) VALUES (3.14)

    INSERT INTO TESTVARIANT (ITEM) VALUES ('3.14')

    SELECT * FROM TESTVARIANT

    SELECT SUM(ITEM) FROM TESTVARIANT WHERE INC IN (3, 5)

    SELECT SUM(CONVERT(MONEY, ITEM)) FROM TESTVARIANT WHERE INC IN (3, 5)

    SELECT SUM(CONVERT(MONEY, ITEM)) FROM TESTVARIANT WHERE INC IN (3, 4, 5)

    SELECT SUM(CONVERT(MONEY, ITEM)) FROM TESTVARIANT WHERE INC IN (1, 3, 5)

    DROP TABLE TESTVARIANT

    The first sum statement returns:

    Msg 8117, Level 16, State 1, Line 1

    Operand data type sql_variant is invalid for sum operator.

    The second returns 18.14 (correct)

    The third returns 33.14 (correct)

    The fourth returns:

    Msg 235, Level 16, State 0, Line 1

    Cannot convert a char value to money. The char value has incorrect syntax.

    I included the fourth statement to show that when you insert 3E2 to the table it is stored as 300, but when you insert '3E2' it is stored as a string ... and then won't convert to a money type. This doesn't apply when you insert '15' - not sure why the difference. The type information is stored in the record according to BOL (http://msdn2.microsoft.com/en-us/library/ms173829.aspx)

    To sum up, you can use SUM but only if you explicitly convert to an appropriate type - I assumed from your previous post that you didn't want to do this. Also, SQL_VARIANT is stored with a max length of ~8k, so you can add an index but it will break if you store large values in it. And it looks like full text won't work either. From a storage perspective it seems that SQL_VARIANT is fine, but from a reporting perspective it won't be ideal if you ever want to query on it.

    Matt.

  • Ahhh... Interesting that the following statement will work perfectly:

    INSERT INTO TESTVARIANT (ITEM) (Select Convert(money,(15)))

    And SUM() still will not work with the datatype. I thought it would. In fact, I thought the whole point of Sql_Variant was that you wouldn't have to do conversions when you pulled the data back out of the column.

    Well, lookee there. I learned something new today despite my efforts not to. @=)

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • The problem I am having in trying to answer your question is not knowing the data you will be loading. Having an idea of the source data would greatly help in evaluating what you are trying to do.

    😎

  • I am literally loading three types of values. Money (for financial buckets), bit fields (for determining if someone is a reseller or a direct seller and for determining if a product is new / used), and datetime for the start and end of product promotions.

    The current table details are above in my first post. My dilemia is whether to create a fourth table containing the non-money datatypes, or to stick these non-money datatypes in the Contract table (where they really really don't work / fit) or the ContractDetails table (hence the question about sql_variant).

    Since the bits and datetimes actually have an impact on pricing (rather than the Contract), I want to stick them in the Contract Details, but if I do them individually instead of all values in the same column, here's what I get:

    Create Table ContractDetails (ContractDetailID int identity(1,1), ContractID int, BucketID int, BucketValue money, NewProduct bit, Reseller bit, StartDate datetime, EndDate datetime)

    Which means I have 4 columns added for each individual bucket instead of just one set of for values for every contract. That seems to be an obnoxius waste of space IMHO. But I don't want to create a fourth table in my DW for 4 columns, a PK and an FK. That also seems to be a waste and then I have to do all the joins on it when I'm doing my reports, and everything gets really messy at that point.

    So, Ideally, I stick Reseller, NewProduct, StartDate and EndDate in the Bucket table as individual bucket names (there will be more financial buckets than the ones I've listed below) and then just create records for ContractDetails that look the below:

    BucketID BucketName

    1 RetailPrice

    2 ResellerPrice

    3 Discount

    4 Reseller

    5 NewProduct

    6 StartDate

    7 EndDate

    ContractDetailID ContractID BucketID BucketValue

    1 12345 1 15.00

    2 12345 3 1.00

    3 12345 4 0

    4 12345 5 1

    5 12345 6 11/01/2007

    6 12345 7 12/31/2007

    7 78910 1 30.00

    8 78910 2 12.00

    9 78910 4 1

    10 78910 5 0

    11 78910 6 11/30/2007

    12 78910 7 01/31/2008

    Then I would pull all this data into one line for each Contract ID so it would look pretty on the reports. And SUM() the money buckets for Totals or do AVG() or whatever was required for the appropriate report.

    Does that give you a better idea of what I'm looking at?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I have a better understanding now. My DW knowledge is effectively none so I can't comment on how suitable the table structures are, but in terms of space I think it would be potentially optimal to have seperate columns (assuming that these are nullable columns, and depending on which types they are defined as). The sql_variant type appears (from BOL) to take 16 bytes of overhead per value to store the type, and xml has all the overhead of the structure.

  • Not really much help for me. As i look at what you are asking, I actually see two missing pieces to helping you. The first, is a general understanding of the source data. The second, and just as important, what kind of questions are you trying to answer with the data.

    What you have shown so far doesn't really fit with the dimensional model that you should be using for a data warehouse (or actually in this case what I would call a data mart). You need to identify your dimensions and your facts. To do this, you need to understand the two areas i am asking about.

    😎

  • Lynn,

    I actually do have a good idea of what the dimensions and facts are looking like (and have gotten those questions answered). This is actually one of the dimension tables. And I have been requested to set it up this way by my supervisor. He thinks this will better support performance, though I think the structure he's requesting will work better in an OLTP than it will in a DW. Unfortunately, I have no DW experience to contradict that reasoning, so I have to at least try.

    For security reasons, I've sanitized what I've posted about columns & tables, so you're not seeing what I can see and I can't answer your first question unless you work for the company.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I understand the confidential nature of the data. I was hoping you could come up with some generic data and structures that mimic what you are trying to do so we could get a better understanding of what you are trying to accomplish.

    From what you have shown me, I'm at loss as I feel like I'm in a vacuum trying to give you some advise. With out some kind of understanding of the inputs and outputs, I'm stuck. I can't see how the data will flow into or out of the tables you are trying create.

    Thanks anyways.

    😎

  • Actually, Lynn, I've only given you a small portion of the entire piece, so that might be why you're having problems with it.

    I've decided to go with the four individual columns for each bucket just because of the overhead with sql_variant and XML. I'm sure my boss will question my design during peer review, but IMHO there's just no reason to create a fourth table for 4 columns plus two keys. Data Warehouses are supposed to be like regular DBs, true, but they're supposed to have less joins in them than a normalized DB and I'm going to stick to my guns on this one.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply