Multiple data types for the same column

  • Guys-

    We have a situation where the attribute of an entity can have a float/integer/string.

    I have two ways to implement this. I can have three columns in the same table with the three different data types

    eg:

    TblPoint

    --------

    PointId Int NOT NULL

    PointValueFloat Float

    PointValueInteger Int

    PointValueChar Varchar(20)

    And at any point only one of the 3 columns will have a value and the other two will be left NULL. And I can use a COALSCE to get the value for the Point.

    The other option is to create 3 tables, one for each datatype and insert the values only in the appropriate values.

    eg:

    TblPoint

    --------

    PointId

    PointDataTypeId

    TblPointValueFloat

    ------------------

    PoinId INT

    PointValue FLOAT

    TblPointValueInt

    ----------------

    PoinId INT

    PointValue INT

    TblPointValueString

    -------------------

    PoinId INT

    PointValue VARCHAR(20)

    Depending on the datatype in the point table, I insert records into the appropriate table. And write my queries with a case so that the appropriate values are retrieved for each pointid.

    There will be close to 100 million records in this table.

    What method do you think is better from a performance and disk space stand point given the number of records.

    Your time and thoughts are greatly appreciated.

    Thanks

  • You could also consider SQL2000's sql_variant datatype. Eg.

     
    
    create table MyTable
    (PointValue sql_variant)

    insert MyTable select 1
    insert MyTable select 1.8123e3
    insert MyTable select 'xxx'

    select PointValue ,sql_variant_property(PointValue, 'basetype') from MyTable

    Cheers,

    - Mark


    Cheers,
    - Mark

  • Are they really the same attribute? There may be a logical design issue here. Can you explain what the attribute is?

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • This is an issue I have gone back and forth on a great deal. Splitting everything up into separate tables is such a pain, especially when it's just one field.

    I have been working with sql server 2000 and 2005 for the last year and a half and did not know about the sql_variant data type. I am going to try it out for a survey database where some of the survey question answers are int, decimal, text, varchar, etc.

    Is there any other considerations and/or concerns I should know about sql_variant with regard to performance, use in views, or stored proc's?

  • "a situation where the attribute of an entity can have a float/integer/string"

    Then you have a non-fully defined attribute there.

    Fix the incomplete definition of the attribute and you've fixed your problem.

  • 2.5 cents, 2.5 years late:

    First off, Don Becker would appear to be right--hard to tell without the detailed system requirements.

    Assuming that it's like you said, sounds like type/exclusive subtype. For small sets, moshing them all into one table and dealint with three-way nulls might be adequate, but if there's 100 million rows, I can't help but think that splitting it across multiple tables would be much more efficient (consider storage space, page reads, and the like)--again, depending upon the actual requirements.

    Philip

  • Also,

    Sorry if I appeared to be terse, I totally understand that sometimes things like this aren't under our control.

     

  • Ok, so it sounds like you guys are saying it's better to split it up into separate tables as the datasets grow large for performance reasons.

    So you should set up something like this:

    create table answers (

    rowID int identity (1,1) primary key,

    some_attribute varchar(50)

    )

    create table asnwersInt (

    rowID int primary key,

    answer int not null,

    foreign key (rowID) references answers (rowID)

    )

    create table asnwersFloat (

    rowID int primary key,

    answer float not null,

    foreign key (rowID) references answers (rowID)

    )

    create table asnwersChar (

    rowID int primary key,

    answer char(10) not null,

    foreign key (rowID) references answers (rowID)

    )

    So then would it be possible to make a view that puts the answers from all three answer tables into one single column to make it easy for application developers?

  • Can you explain why there are 3 possible types for this attribute?

    I would hate for you to have to deal with 3 times the amount of tables because of an undefined element.

  • Sure. Basically we do a lot of surveys. Survey's have questions and answers. Some answers are text, some are varchar, some are int, and some are decimal.

    The surveys are done each year, so one question will have many answers (one for each year). As such I have a one to many relationship between questions and answers. I could definitely use the sql_variant datatype but I would prefer a solution that is ODBC complient, and complient with as many connection protocols as possible.

    Typically our reporting tools are ASP.NET web apps and Crystal Reports.

  • Okay, that puts it into perspective.

    In that case you really have 3 types of questions...each question will always have an answer of the same type right?

    This is abit tricky then...as it's not good to have schema based on data. But then again the only way thru it is a variant then, which I would think is worse then.

    I would have 3 seperate tables as you noted above, but I would be very explicit in how/where the decision of what table is updated/inserted is done.

    I would have seperate insert/update procs for each table, do the conditional if's in the DAL layer in the FE code.

    This will allow SQL to cache the execution plans.

    If there is other stuff about the answers that doesn't change (from type to type)(ie, the person answering will always be a VARCHAR(100) or something), make a seperate table. Seems like a pain, but branching into multiple tables, you should minimize what's in there, just an ID back to the question(or answer) and the value.

  • Yeah, currently I have it split up as follows:

    create table answers (

    answerID int identity (1,1) primary key,

    questionID int not null,

    answerYear numeric(4,0) not null,

    userID int not null, /* user who entered the answer */

    foreign key (questionID) references questions (questionID),

    foreign key (userID) references users (userID)

    )

    create table asnwersInt (

    answerID int primary key,

    answer int not null,

    foreign key (answerID) references answers (answerID)

    )

    create table answersDecimal (

    answerID int primary key,

    answer decimal(9,2) not null,

    foreign key (answerID) references answers (answerID)

    )

    create table answersChar (

    answerID int primary key,

    answer varchar(255) not null,

    foreign key (answerID) references answers (answerID)

    )

    create table answersText (

    answerID int primary key,

    answer text not null,

    foreign key (answerID) references answers (answerID)

    )

    Now, I am now trying to make a view that pulls all of the questions and answers together into an easy to use format for the app developers for reporting.

    I want a view with the following columns:

    QuestionText Varchar(255)

    AnswerYear numeric(4,0)

    Answer ???

    I was successful at using left joins to get a view that has each different type of answer in it's own column, so the view has the following columns:

    QuestionText

    AnswerYear

    AnswerInt

    AnswerText

    AnswerDecimal

    AnswerChar

    But I thought there might be a way to consolidate those four answer columns into one, and maybe make them all text somehow. Whenever I attempt that the SQL won't compile. I am using sql server 2005 by the way, although I'm sure that makes no difference in this case.

  • My response assumes you are not entering values for one answer into multiple tables.

  • select QuestionText

    , AnswerYear

    , AnswerInt

    Convert(VARCHAR(100), AnswerText) as 'AnswerValue'

    from tblQuestion q

    inner join tblAnswer a

    on q.questionid = a.questionid

    inner join tblAnswerText at

    on a.answerid = at.answerid

    UNION

    select QuestionText

    , AnswerYear

    , AnswerInt

    Convert(VARCHAR(100), AnswerDecimal) as 'AnswerValue'

    from tblQuestion q

    inner join tblAnswer a

    on q.questionid = a.questionid

    inner join tblAnswerDecimal ad

    on a.answerid = ad.answerid

    etc. etc.

    I belive in a UNION you have to not only have the same number of columns, name and order, but also type.

    Let me know if that doesnt' work.

  • And yes, to answer your question, any given question would always have the same answer datatype over the years. If we are asking a company how many widgets they sold each year, we would always want Int answers to that question over the years so that the report for multi year data makes sense and is comparable.

    If we changed say, from Int to Float on a question, then we'd be breaking the context of the past years data. The solution would be to start a new question, and never to modify the datatype of a current questions answer.

    By the way I appreciate any advice you have, thanks!

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply