Why is data integrity important?

  • mtassin (8/31/2010)


    Eric Russell 13013 (8/30/2010)


    Data integrity and constraint begins with choosing the appropriate data type. Obviously (to most of us) integers should be contained in an Integer data type and monetary values in a decimal with 2 decimal places. However, what I see happen all the time, even in major ISV applications, are date/time values contained in a varchar instead of a proper date data type. Not only does the reporting process take a performance hit with the data conversion, but inevitably there will be the occasional Feb 30. Even worse is when different users or applications use a different coding scheme and you have to sort out (in a where clause at runtime) the context of '3/11/09 6:22'.

    Amen,

    I've seen so many of these in older databases where I am... we work to squish them... but in many cases the business won't prioritize developer time to do so.

    Even better is when you find tables that violate 1NF. And have something like a column called groups with values like 'ADMIN;USER;MANAGER' in it.

    Come on... :'(

    is that where people try to use SQL instead of LDAP?

  • alen teplitsky (8/31/2010)


    is that where people try to use SQL instead of LDAP?

    Sort of... it's for forms based authentication on a web page. Instead of creating the M:N association table between users and groups... they stick the groups into the users table... We cry when we see this one... then cry more when we realise how much legacy code is tied to it with such awesome joins as

    WHERE

    '%' + group.code + '%' LIKE users.groups



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Eric Russell 13013 (8/30/2010)


    alen teplitsky (8/30/2010)


    i've seen this no NULL's allowed OCD thing being taken to extremes

    say you sign up a customer today and promise some service that will be done in the near future but you're not sure of the date. since the column is not null you have to put in a crazy date like 1/1/2049. or if something was done in the past then 1/1/1901

    so if a customer calls in a few days the CSR will happily tell them they can expect to see someone come to their site in 2049

    Yep. You can also end up in jail after asking the officer "Is this a joke?" after he informs you your driver's license was suspended almost 50 years before you were born. The suspension date was entered as 11/11/11 and the year was circa 1985, when I was in my 20's. The officer was so stupid he didn't know what he had just said.

  • mtassin (8/31/2010)


    RalphWilson (8/30/2010)


    A "skipped" question is one that has not been answered but has a Question Number that is lower than the highest Question number that has been answered.

    Why not just store the results as the questions are asked?

    Table_Answers(

    Answerid int identity(1,1) PRIMARY KEY,

    Testid int NOT NULL,

    questionid int NOT NULL,

    TestAnswer int,

    TestScore int NOT NULL)

    As the questions are asked and answered/skipped just store either

    1 in testscore if it's correct

    -3 in testscore if it's wrong

    -1 in testscore if it's skipped

    Then left join Table_Answers to Table_questions if you care about the unanswered questions.

    If Testanswer can be more than multiple choices throw in a varchar(max) or nvarchar(max) column for the free form answers... but those will be harder to grade via SQL anyway.

    To put it mildly, that really screws up trying to do any overall input and analysis of the test results because the input process is a specialized reader for those Mark The Bubble tests. There is a key premarked on the sheet that mtaches the sheet tot he particular set/arrangement of questions that were answered. So, it becomes a bit more awkward to try to snag which questions were skipped on the fly . . . the reader application simply reads the sheets and records the answers as a row in a table for the most part. The essay questions are handled slightly differently but most of the tests (and the majority of all tests) are either T/F or multiple choice (usually 5 options . . . with the odd 4 option question often getting option 5 selected ;-).

    Ralph D. Wilson II
    Development DBA

    "Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
    A. Lincoln

  • RalphWilson (8/30/2010)


    Now that we appear to be including DW in the mix, I have a question for y'all.

    Suppose I have a database in which I am storing the answers that are given during the various exams that a students might take for a course. The design of the exam is such that it is anticipated that no student will complete the exam (thus, there are NULLs expected for some number of answers to questions for each student for each exam). The scoring on the exam is such that 1 point is added for correct answers, 3 points are subtracted for wrong answers and any _un_answered_ (ie. skipped) questions result in a 1 point subtraction. (Thus, the test is designed to try to suppress pure guesses but to also encourage students to answer all questions before skipping them.)

    How do you differentuiate between the NULL of a skipped question (i.e. "No Answer Given") vs the NULL of the question that has not yet been reached (i.e. the "Not Applicable")?

    This has occassionally come up in my experience and I have developed sevral diffent qork-arounds. I am just seeking advise here. 😉

    So, are the students instructed to start at question 1 and proceed to the last question? Because in a "normal" test, the order in which the questions are asked does not matter, so it's perfectly acceptable to start at the end and work forward.

  • RalphWilson (8/31/2010)


    mtassin (8/31/2010)


    RalphWilson (8/30/2010)


    To put it mildly, that really screws up trying to do any overall input and analysis of the test results because the input process is a specialized reader for those Mark The Bubble tests. There is a key premarked on the sheet that mtaches the sheet tot he particular set/arrangement of questions that were answered. So, it becomes a bit more awkward to try to snag which questions were skipped on the fly . . . the reader application simply reads the sheets and records the answers as a row in a table for the most part. The essay questions are handled slightly differently but most of the tests (and the majority of all tests) are either T/F or multiple choice (usually 5 options . . . with the odd 4 option question often getting option 5 selected ;-).

    Didn't realize these were bubble questions... figured it had more to do with the last classes I taught which had a specialized testing application that worked this way. You got question one, either answered or skipped it, then got question two. All being stored back end as you answered it.

    Gosh... I didn't know anybody still used Scantron. 🙂

    What does the scanner return on a non-answered question?



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • david_wendelken (8/31/2010)

    Yep. You can also end up in jail after asking the officer "Is this a joke?" after he informs you your driver's license was suspended almost 50 years before you were born. The suspension date was entered as 11/11/11 and the year was circa 1985, when I was in my 20's. The officer was so stupid he didn't know what he had just said.

    I've seen cases where an application required the user to supply a date, and over time they adopted this "work around" process of stubbing something like 11/11/11 when they didn't know what to enter. Of course this was undocumented, and they often forgot to go back re-enter a real date. That's one strong and compelling reason for allowing some columns to be NULL-able. It's fine so long as there is also a column indicating record completion status to prevent half baked data from getting sent to the shop floor or ending up on a report.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • mtassin (8/31/2010)


    RalphWilson (8/31/2010)


    mtassin (8/31/2010)


    RalphWilson (8/30/2010)


    To put it mildly, that really screws up trying to do any overall input and analysis of the test results because the input process is a specialized reader for those Mark The Bubble tests. There is a key premarked on the sheet that mtaches the sheet tot he particular set/arrangement of questions that were answered. So, it becomes a bit more awkward to try to snag which questions were skipped on the fly . . . the reader application simply reads the sheets and records the answers as a row in a table for the most part. The essay questions are handled slightly differently but most of the tests (and the majority of all tests) are either T/F or multiple choice (usually 5 options . . . with the odd 4 option question often getting option 5 selected ;-).

    Didn't realize these were bubble questions... figured it had more to do with the last classes I taught which had a specialized testing application that worked this way. You got question one, either answered or skipped it, then got question two. All being stored back end as you answered it.

    Gosh... I didn't know anybody still used Scantron. 🙂

    What does the scanner return on a non-answered question?

    The scanner returns a space character. (The other options are returned as the characters '1' through '5'. T/F are generally coded as '1' and '2'.) These are not the richest of testing facilities. 😉

    Ralph D. Wilson II
    Development DBA

    "Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
    A. Lincoln

  • I just spent 2 weeks(!!) cleaning up a lookup table in a third-party ERP application that allowed (and still allows, but for the now-rescinded, application level user permissions) free-form entry for states and countries.

    Fun fact: the United States of America has 786 states within its borders, including China, Formosa, Bhopal, and Chicago.

    Referential Integrity??? "We don't need no stinkin' referential integrity!!!":smooooth:

    Rich

  • rmechaber (8/31/2010)


    I just spent 2 weeks(!!) cleaning up a lookup table in a third-party ERP application that allowed (and still allows, but for the now-rescinded, application level user permissions) free-form entry for states and countries.

    Fun fact: the United States of America has 786 states within its borders, including China, Formosa, Bhopal, and Chicago.

    Referential Integrity??? "We don't need no stinkin' referential integrity!!!":smooooth:

    Rich

    Some time back, I successfully made the case for an organization to have a look up table for countries & states (if the country was the US) based upon having located rows indicating Oklahoma City was in something like 30 of the 50 states and at least 15 foreign countries. (I have strong reason to believe that, at the very least, "Oklahoma City, TX" was incorrect. 😉 😎 )

    Ralph D. Wilson II
    Development DBA

    "Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
    A. Lincoln

  • RalphWilson (9/1/2010)


    Some time back, I successfully made the case for an organization to have a look up table for countries & states (if the country was the US) based upon having located rows indicating Oklahoma City was in something like 30 of the 50 states and at least 15 foreign countries. (I have strong reason to believe that, at the very least, "Oklahoma City, TX" was incorrect. 😉 😎 )

    I still remember when I discovered that Cook was a Country and that Chicago was a state.

    Though my favorite country is still

    Mr. Thomas Smith



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • rmechaber (8/31/2010)


    I just spent 2 weeks(!!) cleaning up a lookup table in a third-party ERP application that allowed (and still allows, but for the now-rescinded, application level user permissions) free-form entry for states and countries.

    Fun fact: the United States of America has 786 states within its borders, including China, Formosa, Bhopal, and Chicago.

    Referential Integrity??? "We don't need no stinkin' referential integrity!!!":smooooth:

    Rich

    Oh I feel your pain. We have a free-form "Occupation" field on one of our main Sales applications, so I've just spent weeks building a lookup table to make some sense of the thousands of mis-spelled and misleading entries.

    My particular favourites are "HR Minger" - I can only assume they meant "HR Manager", and the eighteen different spellings of the word "Childminder".

  • miki_home (9/2/2010)


    Oh I feel your pain. We have a free-form "Occupation" field on one of our main Sales applications, so I've just spent weeks building a lookup table to make some sense of the thousands of mis-spelled and misleading entries.

    My particular favourites are "HR Minger" - I can only assume they meant "HR Manager", and the eighteen different spellings of the word "Childminder".

    For some classifications, like occupation or geographic region for example, there are standard coding schemes. After mapping and re-coding user entered free form descriptions to the standard coding, then there are more possibilities for doing stuff like cross referencing sales data with 3rd party marketing databases.

    http://www.bls.gov/soc/

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Hi All,

    Is there any 3rd party tool or something to check the integrity of the existing data in SQL Server.

    I am using SQL Server 2008. Something similar to DBSanity :http://databene.org/dbsanity

    Please suggest.

Viewing 14 posts - 31 through 43 (of 43 total)

You must be logged in to reply to this topic. Login to reply