﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Jamin VanderBerg  / Why is data integrity important? / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Fri, 24 May 2013 02:12:44 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>Hi All,Is there any 3rd party tool or something to check the integrity of the existing data in SQL Server.I am using SQL Server 2008. Something similar to  DBSanity :http://databene.org/dbsanityPlease suggest.</description><pubDate>Wed, 03 Oct 2012 05:12:10 GMT</pubDate><dc:creator>sangs8788</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]miki_home (9/2/2010)[/b][hr]Oh I feel your pain. We have a free-form "Occupation" field on one of our main Sales applications, so I've just spent weeks building a lookup table to make some sense of the thousands of mis-spelled and misleading entries.My particular favourites are "HR Minger" - I can only assume they meant "HR Manager", and the eighteen different spellings of the word "Childminder".[/quote]For some classifications, like occupation or geographic region for example, there are standard coding schemes. After mapping and re-coding user entered free form descriptions to the standard coding, then there are more possibilities for doing stuff like cross referencing sales data with 3rd party marketing databases. http://www.bls.gov/soc/</description><pubDate>Thu, 02 Sep 2010 09:41:35 GMT</pubDate><dc:creator>Eric M Russell</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]rmechaber (8/31/2010)[/b][hr]I just spent 2 [b]weeks[/b](!!) cleaning up a lookup table in a third-party ERP application that allowed (and still allows, but for the now-rescinded, application level user permissions) free-form entry for states and countries.Fun fact: the United States of America has 786 states within its borders, including China, Formosa, Bhopal, and Chicago.Referential Integrity???  "We don't need no stinkin' referential integrity!!!":smooooth:Rich[/quote]Oh I feel your pain. We have a free-form "Occupation" field on one of our main Sales applications, so I've just spent weeks building a lookup table to make some sense of the thousands of mis-spelled and misleading entries.My particular favourites are "HR Minger" - I can only assume they meant "HR Manager", and the eighteen different spellings of the word "Childminder".</description><pubDate>Thu, 02 Sep 2010 09:07:14 GMT</pubDate><dc:creator>miki_home</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]RalphWilson (9/1/2010)[/b][hr]Some time back, I successfully made the case for an organization to have a look up table for countries &amp; states (if the country was the US) based upon having located rows indicating Oklahoma City was in something like 30 of the 50 states and at least 15 foreign countries.  (I have strong reason to believe that, at the very least, "Oklahoma City, [b]TX[/b]" was incorrect. ;-) :cool: )[/quote]I still remember when I discovered that Cook was a Country and that Chicago was a state.Though my favorite country is stillMr. Thomas Smith</description><pubDate>Wed, 01 Sep 2010 08:36:48 GMT</pubDate><dc:creator>mtassin</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]rmechaber (8/31/2010)[/b][hr]I just spent 2 [b]weeks[/b](!!) cleaning up a lookup table in a third-party ERP application that allowed (and still allows, but for the now-rescinded, application level user permissions) free-form entry for states and countries.Fun fact: the United States of America has 786 states within its borders, including China, Formosa, Bhopal, and Chicago.Referential Integrity???  "We don't need no stinkin' referential integrity!!!":smooooth:Rich[/quote]Some time back, I successfully made the case for an organization to have a look up table for countries &amp; states (if the country was the US) based upon having located rows indicating Oklahoma City was in something like 30 of the 50 states and at least 15 foreign countries.  (I have strong reason to believe that, at the very least, "Oklahoma City, [b]TX[/b]" was incorrect. ;-) :cool: )</description><pubDate>Wed, 01 Sep 2010 08:10:59 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>I just spent 2 [b]weeks[/b](!!) cleaning up a lookup table in a third-party ERP application that allowed (and still allows, but for the now-rescinded, application level user permissions) free-form entry for states and countries.Fun fact: the United States of America has 786 states within its borders, including China, Formosa, Bhopal, and Chicago.Referential Integrity???  "We don't need no stinkin' referential integrity!!!":smooooth:Rich</description><pubDate>Tue, 31 Aug 2010 18:16:52 GMT</pubDate><dc:creator>rmechaber</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]mtassin (8/31/2010)[/b][hr][quote][b]RalphWilson (8/31/2010)[/b][hr][quote][b]mtassin (8/31/2010)[/b][hr][quote][b]RalphWilson (8/30/2010)[/b][hr]To put it mildly, that really screws up trying to do any overall input and analysis of the test results because the input process is a specialized reader for those Mark The Bubble tests.  There is a key premarked on the sheet that mtaches the sheet tot he particular set/arrangement of questions that were answered.  So, it becomes a bit more awkward to try to snag which questions were skipped on the fly . . . the reader application simply reads the sheets and records the answers as a row in a table for the most part.  The essay questions are handled slightly differently but most of the tests (and the majority of [u]all[/u] tests) are either T/F or multiple choice (usually 5 options . . . with the odd 4 option question often getting option 5 selected ;-).[/quote]Didn't realize these were bubble questions... figured it had more to do with the last classes I taught which had a specialized testing application that worked this way.  You got question one, either answered or skipped it, then got question two.  All being stored back end as you answered it.Gosh... I didn't know anybody still used Scantron.  :)What does the scanner return on a non-answered question?[/quote]The scanner returns a space character.  (The other options are returned as the characters '1' through '5'.  T/F are generally coded as '1' and '2'.)  These are not the richest of testing facilities. ;-)</description><pubDate>Tue, 31 Aug 2010 11:02:19 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]david_wendelken (8/31/2010)[/b]Yep.  You can also end up in jail after asking the officer "Is this a joke?" after he informs you your driver's license was suspended almost 50 years before you were born.  The suspension date was entered as 11/11/11 and the year was circa 1985, when I was in my 20's.  The officer was so stupid he didn't know what he had just said.[/quote]I've seen cases where an application required the user to supply a date, and over time they adopted this "work around" process of stubbing something like 11/11/11 when they didn't know what to enter. Of course this was undocumented, and they often forgot to go back re-enter a real date. That's one strong and compelling reason for allowing some columns to be NULL-able. It's fine so long as there is also a column indicating record completion status to prevent half baked data from getting sent to the shop floor or ending up on a report.</description><pubDate>Tue, 31 Aug 2010 10:51:23 GMT</pubDate><dc:creator>Eric M Russell</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]RalphWilson (8/31/2010)[/b][hr][quote][b]mtassin (8/31/2010)[/b][hr][quote][b]RalphWilson (8/30/2010)[/b][hr]To put it mildly, that really screws up trying to do any overall input and analysis of the test results because the input process is a specialized reader for those Mark The Bubble tests.  There is a key premarked on the sheet that mtaches the sheet tot he particular set/arrangement of questions that were answered.  So, it becomes a bit more awkward to try to snag which questions were skipped on the fly . . . the reader application simply reads the sheets and records the answers as a row in a table for the most part.  The essay questions are handled slightly differently but most of the tests (and the majority of [u]all[/u] tests) are either T/F or multiple choice (usually 5 options . . . with the odd 4 option question often getting option 5 selected ;-).[/quote]Didn't realize these were bubble questions... figured it had more to do with the last classes I taught which had a specialized testing application that worked this way.  You got question one, either answered or skipped it, then got question two.  All being stored back end as you answered it.Gosh... I didn't know anybody still used Scantron.  :)What does the scanner return on a non-answered question?</description><pubDate>Tue, 31 Aug 2010 10:45:24 GMT</pubDate><dc:creator>mtassin</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]RalphWilson (8/30/2010)[/b][hr]Now that we appear to be including DW in the mix, I have a question for y'all.Suppose I have a database in which I am storing the answers that are given during the various exams that a students might take for a course.  The design of the exam is such that it is anticipated that no student will complete the exam (thus, there are NULLs expected for some number of answers to questions for each student for each exam).  The scoring on the exam is such that 1 point is added for correct answers, 3 points are subtracted for wrong answers and any _un_answered_ (ie. skipped) questions result in a 1 point subtraction.  (Thus, the test is designed to try to suppress pure guesses but to also encourage students to answer all questions before skipping them.)How do you differentuiate between the NULL of a skipped question (i.e. "No Answer Given") vs the NULL of the question that has not yet been reached (i.e. the "Not Applicable")?This has occassionally come up in my experience and I have developed sevral diffent qork-arounds.  I am just seeking advise here. ;-)[/quote]So, are the students instructed to start at question 1 and proceed to the last question?  Because in a "normal" test, the order in which the questions are asked does not matter, so it's perfectly acceptable to start at the end and work forward.</description><pubDate>Tue, 31 Aug 2010 10:11:53 GMT</pubDate><dc:creator>david_wendelken</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]mtassin (8/31/2010)[/b][hr][quote][b]RalphWilson (8/30/2010)[/b][hr]A "skipped" question is one that has not been answered but has a Question Number that is lower than the highest Question number that [u]has[/u] been answered. [/quote]Why not just store the results as the questions are asked?Table_Answers(Answerid int identity(1,1) PRIMARY KEY,Testid int NOT NULL,questionid int NOT NULL,TestAnswer int,TestScore int NOT NULL)As the questions are asked and answered/skipped just store either 1 in testscore  if it's correct-3 in testscore if it's wrong-1 in testscore if it's skippedThen left join Table_Answers to Table_questions if you care about the unanswered questions.If Testanswer can be more than multiple choices throw in a varchar(max) or nvarchar(max) column for the free form answers... but those will be harder to grade via SQL anyway.[/quote]To put it mildly, that really screws up trying to do any overall input and analysis of the test results because the input process is a specialized reader for those Mark The Bubble tests.  There is a key premarked on the sheet that mtaches the sheet tot he particular set/arrangement of questions that were answered.  So, it becomes a bit more awkward to try to snag which questions were skipped on the fly . . . the reader application simply reads the sheets and records the answers as a row in a table for the most part.  The essay questions are handled slightly differently but most of the tests (and the majority of [u]all[/u] tests) are either T/F or multiple choice (usually 5 options . . . with the odd 4 option question often getting option 5 selected ;-).</description><pubDate>Tue, 31 Aug 2010 09:36:37 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]Eric Russell 13013 (8/30/2010)[/b][hr][quote][b]alen teplitsky (8/30/2010)[/b][hr]i've seen this no NULL's allowed OCD thing being taken to extremessay you sign up a customer today and promise some service that will be done in the near future but you're not sure of the date. since the column is not null you have to put in a crazy date like 1/1/2049. or if something was done in the past then 1/1/1901so if a customer calls in a few days the CSR will happily tell them they can expect to see someone come to their site in 2049[/quote][/quote]Yep.  You can also end up in jail after asking the officer "Is this a joke?" after he informs you your driver's license was suspended almost 50 years before you were born.  The suspension date was entered as 11/11/11 and the year was circa 1985, when I was in my 20's.  The officer was so stupid he didn't know what he had just said.</description><pubDate>Tue, 31 Aug 2010 09:15:06 GMT</pubDate><dc:creator>david_wendelken</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]alen teplitsky (8/31/2010)[/b][hr]is that where people try to use SQL instead of LDAP?[/quote]Sort of... it's for forms based authentication on a web page.  Instead of creating the M:N association table between users and groups... they stick the groups into the users table... We cry when we see this one... then cry more when we realise how much legacy code is tied to it with such awesome joins asWHERE '%' + group.code + '%' LIKE users.groups</description><pubDate>Tue, 31 Aug 2010 08:00:31 GMT</pubDate><dc:creator>mtassin</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]mtassin (8/31/2010)[/b][hr][quote][b]Eric Russell 13013 (8/30/2010)[/b][hr]Data integrity and constraint begins with choosing the appropriate data type. Obviously (to most of us) integers should be contained in an Integer data type and monetary values in a decimal with 2 decimal places. However, what I see happen all the time, even in major ISV applications, are date/time values contained in a varchar instead of a proper date data type. Not only does the reporting process take a performance hit with the data conversion, but inevitably there will be the occasional Feb 30. Even worse is when different users or applications use a different coding scheme and you have to sort out (in a where clause at runtime) the context of '3/11/09 6:22'.[/quote]Amen,I've seen so many of these in older databases where I am... we work to squish them... but in many cases the business won't prioritize developer time to do so.Even better is when you find tables that violate 1NF.  And have something like a column called groups with values like 'ADMIN;USER;MANAGER' in it.Come on... :'([/quote]is that where people try to use SQL instead of LDAP?</description><pubDate>Tue, 31 Aug 2010 07:51:16 GMT</pubDate><dc:creator>alen teplitsky</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]RalphWilson (8/30/2010)[/b][hr]A "skipped" question is one that has not been answered but has a Question Number that is lower than the highest Question number that [u]has[/u] been answered. [/quote]Why not just store the results as the questions are asked?Table_Answers(Answerid int identity(1,1) PRIMARY KEY,Testid int NOT NULL,questionid int NOT NULL,TestAnswer int,TestScore int NOT NULL)As the questions are asked and answered/skipped just store either 1 in testscore  if it's correct-3 in testscore if it's wrong-1 in testscore if it's skippedThen left join Table_Answers to Table_questions if you care about the unanswered questions.If Testanswer can be more than multiple choices throw in a varchar(max) or nvarchar(max) column for the free form answers... but those will be harder to grade via SQL anyway.</description><pubDate>Tue, 31 Aug 2010 07:42:35 GMT</pubDate><dc:creator>mtassin</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]Eric Russell 13013 (8/30/2010)[/b][hr]Data integrity and constraint begins with choosing the appropriate data type. Obviously (to most of us) integers should be contained in an Integer data type and monetary values in a decimal with 2 decimal places. However, what I see happen all the time, even in major ISV applications, are date/time values contained in a varchar instead of a proper date data type. Not only does the reporting process take a performance hit with the data conversion, but inevitably there will be the occasional Feb 30. Even worse is when different users or applications use a different coding scheme and you have to sort out (in a where clause at runtime) the context of '3/11/09 6:22'.[/quote]Amen,I've seen so many of these in older databases where I am... we work to squish them... but in many cases the business won't prioritize developer time to do so.Even better is when you find tables that violate 1NF.  And have something like a column called groups with values like 'ADMIN;USER;MANAGER' in it.Come on... :'(</description><pubDate>Tue, 31 Aug 2010 07:33:30 GMT</pubDate><dc:creator>mtassin</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote]One argument that can hardly be touched : no matter how you deal with it in the db, you'll have to do work in the application to handle cases... or waste time training people so that 2049 is not the date we are planning to visit to do the work.[/quote]And, if you choose a date in the past, you have to preclude it coming up as a "Late Service Call while, if you choose one in the future you risk it becoming a "reasonable" date (e.g. if you choose '2049-01-01' and it's some time in December of 2048, that date might look reasonable).It just seems like nothing is as easy as it should be, right? ;-)  After all, in about 1976, some dang fool professor was chastising me and about 243 other students in an SMU Programming Languages course because we were wasting our time because "In another 10 or 15 years, there will be no need for programmers." ;-)In fact, as I remember it, he claimed that computers would be able to interpet verbalized needs in order to create their own tables, indexes, etc. :-D</description><pubDate>Mon, 30 Aug 2010 14:36:02 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>I don't remember who wrote this comment, but it's the most profound one I've ever come across regarding data quality.  I'll paraphrase because I don't remember the exact quote:"The best quality information in a company is who owes them money.  Everything else goes downhill very rapidly."</description><pubDate>Mon, 30 Aug 2010 14:27:19 GMT</pubDate><dc:creator>david_wendelken</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]Steve Jones - Editor (8/30/2010)[/b][hr]Not sure I understand? Wouldn't a skipped question have a -1?For the answer, is a blank because it was skipped different than a blank because they didn't know the answer?[/quote]What if it was skipped because they did not know the answer?As Shakespeare once said: "That which we call a rose by any other name would smell as sweet."</description><pubDate>Mon, 30 Aug 2010 13:14:09 GMT</pubDate><dc:creator>Alvin Ramard</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>A "skipped" question is one that has not been answered but has a Question Number that is lower than the highest Question number that [u]has[/u] been answered. An "Unanswered Question" is a question with a higher Question Number than the highest Question Number that [u]has[/u] an answwer.Of course, you may have already thought of the question that I came up with when presented with this problem, "How do I know whether Question N+1, where N is the highest Question Number with an answer,  is an [u]un[/u]answered vs a [u]skipped[/u] question?"The Skipped Questions will have a NULL because there is no answer marked (and, therefore, no value to record).  They will not have a value of -1 because a) they have no answer and b) there is nothing to preclude their being True/False, Multiple Choice (with alpha choices) or even essay questions.</description><pubDate>Mon, 30 Aug 2010 13:09:29 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>I don't understand either.  How do you differentiate between a skipped an unanswered?  Is it all questions above the last answered?</description><pubDate>Mon, 30 Aug 2010 13:00:10 GMT</pubDate><dc:creator>RonKyle</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>Not sure I understand? Wouldn't a skipped question have a -1?For the answer, is a blank because it was skipped different than a blank because they didn't know the answer?</description><pubDate>Mon, 30 Aug 2010 12:58:49 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]alen teplitsky (8/30/2010)[/b][hr]i've seen this no NULL's allowed OCD thing being taken to extremessay you sign up a customer today and promise some service that will be done in the near future but you're not sure of the date. since the column is not null you have to put in a crazy date like 1/1/2049. or if something was done in the past then 1/1/1901so if a customer calls in a few days the CSR will happily tell them they can expect to see someone come to their site in 2049[/quote]In a scenario like this, when there can be multiple reasons for a date to be NULL, I typically allow the column to be NULL, but I also have a column indicating the status of the record. For example, a sales order shipment can have a status of 'Pending' or 'Cancelled', which would explain the non-existing schedule date. I will also have a check constraint, enforcing not-NULL conditionally for specific status values. That way the application or report builders don't have to make assumptions based on the absence of information.</description><pubDate>Mon, 30 Aug 2010 12:58:42 GMT</pubDate><dc:creator>Eric M Russell</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>Now that we appear to be including DW in the mix, I have a question for y'all.Suppose I have a database in which I am storing the answers that are given during the various exams that a students might take for a course.  The design of the exam is such that it is anticipated that no student will complete the exam (thus, there are NULLs expected for some number of answers to questions for each student for each exam).  The scoring on the exam is such that 1 point is added for correct answers, 3 points are subtracted for wrong answers and any _un_answered_ (ie. skipped) questions result in a 1 point subtraction.  (Thus, the test is designed to try to suppress pure guesses but to also encourage students to answer all questions before skipping them.)How do you differentuiate between the NULL of a skipped question (i.e. "No Answer Given") vs the NULL of the question that has not yet been reached (i.e. the "Not Applicable")?This has occassionally come up in my experience and I have developed sevral diffent qork-arounds.  I am just seeking advise here. ;-)</description><pubDate>Mon, 30 Aug 2010 12:56:00 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]RonKyle (8/30/2010)[/b][hr]Just because there are multiple possibilities does not invalidate the use of Nulls.  [/quote]I would agree, but it does make an argument that NULLS can be problematic. I think NULLs can work well in an application.Great point on the OLAP, ignore rows in a calculation, comment.</description><pubDate>Mon, 30 Aug 2010 12:52:23 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>I actually do the same thing with my data warehouse, having seperate keys for N/A and Unknown.  If there really wasn't a part sold on an order line, then the part is N/A.  If I have a customer without a customer type, it is Unknown.  The difference is that there [i]should[/i] be a customer type, but isn't.  There doesn't have to be a part.However, since CustomerType is an attribute and not a dimension itself, and the source system allows NULLs, I originally had the field as a nullable column.  This caused a lot of problems with groupings and filters, most of which were due to the old NULL &amp;lt;&amp;gt; NULL rule.  It was then that all the NULLs were replaced with Unknown.</description><pubDate>Mon, 30 Aug 2010 12:41:03 GMT</pubDate><dc:creator>jvanderberg</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote]So, Steve, what is the definition of "N/A"? Is it "Not Applicable" or "Not Available" or "No Answer"?[/quote]Just because there are multiple possibilities does not invalidate the use of Nulls.  Sometimes all of the answers are possible, hence the reason some people advocate the creation of multiple types of NULLs.  If N/A could be put in an integer or a date field, then someone could make a better case against nulls.  Until that time, however...In OLAP designs, there are no NULL FKs because OLTP Nulls can be accounted for with a dimension row.  These have two possible rows, however, one for N/A (Not Applicable) and one for Unknown, because there should be an answer.  An incomplete order will not have a completion date, and will appear in the N/A column for Completed Date.  A completed order that doesn't have a complete date, however, will be in the Unknown column, because it [i]should [/i]have a complete date. A missing Create Date will always be Unkown, because a created item should have one of these. This is possible in an OLAP design the date field can (and should) be masked with a text field, however, so it's possible use 12/31/69 for N/A and 1/1/70 for unknown, because the user will never see these.  This would be awkward in an OLTP design.While we are on the subject of OLAP designs, there are average calculations that would not work w/o the use of Nulls because that's the best way to omit from a calculation rows that have nothing to do with that particular calculation.</description><pubDate>Mon, 30 Aug 2010 12:35:27 GMT</pubDate><dc:creator>RonKyle</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]RalphWilson (8/30/2010)[/b][hr][quote][b]Steve Jones - Editor (8/30/2010)[/b][hr]I've always seen "N/A" as "Not applicable", but you give some good other responses that might be used. Hence another point for the "non-nulls" argument.[/quote]I have dealt with DBA's who insisted on all columns being non-NULL and I have engaged in "disucssions" about such (both in person and in cyberspace) and I have yet to find a really good argument for changing my position: NULLs should be used when appropriate and avoided when not apporpriate.If there is a referential integrity rule that, in effect, says that every row in TableA has to have a FK/PK link to TableB, then it is inappropriate for that column in TableA to be NULL.  However, if there is a column, for instance, in a Hospital Information System's PatientDetails table, that is supposed to contain a Date of Birth and it is possible that a) that information may notbe readily available (e.g. John Doe is unconscious when admitted) and b) it is not a show stopper if it either is incorrect or unavailable, I would rather have that column in that row left NULL than to assign an arbitrary DOB.  If the column is not essential to the overall integrity of that row, when I am faced with the choice of using an arbitrary default value or leaving the column NULL, I choose the NULL.[/quote]One argument that can hardly be touched : no matter how you deal with it in the db, you'll have to do work in the application to handle cases... or waste time training people so that 2049 is not the date we are planning to visit to do the work.If you decide to somehow ban all nulls in te db by adding tables, then you'll get all those nulls back as soon as you query anything because of all the left joins you'll have to perform.</description><pubDate>Mon, 30 Aug 2010 12:28:13 GMT</pubDate><dc:creator>Ninja's_RGR'us</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]alen teplitsky (8/30/2010)[/b][hr]i've seen this no NULL's allowed OCD thing being taken to extremessay you sign up a customer today and promise some service that will be done in the near future but you're not sure of the date. since the column is not null you have to put in a crazy date like 1/1/2049. or if something was done in the past then 1/1/1901so if a customer calls in a few days the CSR will happily tell them they can expect to see someone come to their site in 2049[/quote]Hehe, I've worked for them :w00t:.</description><pubDate>Mon, 30 Aug 2010 12:24:55 GMT</pubDate><dc:creator>Ninja's_RGR'us</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>It's as they say: no matter how good a thing is, there is always such a thing as too much of it.</description><pubDate>Mon, 30 Aug 2010 11:34:18 GMT</pubDate><dc:creator>jvanderberg</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>i've seen this no NULL's allowed OCD thing being taken to extremessay you sign up a customer today and promise some service that will be done in the near future but you're not sure of the date. since the column is not null you have to put in a crazy date like 1/1/2049. or if something was done in the past then 1/1/1901so if a customer calls in a few days the CSR will happily tell them they can expect to see someone come to their site in 2049</description><pubDate>Mon, 30 Aug 2010 11:32:43 GMT</pubDate><dc:creator>alen teplitsky</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>There is always a time and place for NULLs.  Think "Address Line 2" type fields.  Chances are, the value in those type of fields are NULL more often than their not.  But I woulnd't want a NULL in "Address Line 1", or company name, or...</description><pubDate>Mon, 30 Aug 2010 11:25:28 GMT</pubDate><dc:creator>jvanderberg</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]Steve Jones - Editor (8/30/2010)[/b][hr]I've always seen "N/A" as "Not applicable", but you give some good other responses that might be used. Hence another point for the "non-nulls" argument.[/quote]I have dealt with DBA's who insisted on all columns being non-NULL and I have engaged in "disucssions" about such (both in person and in cyberspace) and I have yet to find a really good argument for changing my position: NULLs should be used when appropriate and avoided when not apporpriate.If there is a referential integrity rule that, in effect, says that every row in TableA has to have a FK/PK link to TableB, then it is inappropriate for that column in TableA to be NULL.  However, if there is a column, for instance, in a Hospital Information System's PatientDetails table, that is supposed to contain a Date of Birth and it is possible that a) that information may notbe readily available (e.g. John Doe is unconscious when admitted) and b) it is not a show stopper if it either is incorrect or unavailable, I would rather have that column in that row left NULL than to assign an arbitrary DOB.  If the column is not essential to the overall integrity of that row, when I am faced with the choice of using an arbitrary default value or leaving the column NULL, I choose the NULL.</description><pubDate>Mon, 30 Aug 2010 11:20:20 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>i've seen instances where most trouble tickets are simple DML to fix errors the application or people made in entering data. sales people are notorious for submitting incomplete or wrong paperwork that requires a lot of work to fix</description><pubDate>Mon, 30 Aug 2010 11:10:25 GMT</pubDate><dc:creator>alen teplitsky</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>I've always seen "N/A" as "Not applicable", but you give some good other responses that might be used. Hence another point for the "non-nulls" argument.</description><pubDate>Mon, 30 Aug 2010 11:08:47 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>Thanks for the article - that is a great reminder.</description><pubDate>Mon, 30 Aug 2010 10:33:38 GMT</pubDate><dc:creator>SQLRNNR</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>[quote][b]Steve Jones - Editor (8/30/2010)[/b][hr]It's a good writeup. And there's always a big debate about NULL being valid or acceptable in RI.I personally think it's OK, but I see the other side, and there's a valid argument there. If you start to use NULLs, you are usually defining it to mean something, like a "N/A", but over time you might find people assuming it means something else. Like the earliest date, or "I don't know". In those cases, you can start to get bugs or misconceptions in the application. Not that you will, but you might. There is a good argument to be made to not allow NULL.[/quote]So, Steve, what is the definition of "N/A"? Is it "Not Applicable" or "Not Available" or "No Answer"? ;-)</description><pubDate>Mon, 30 Aug 2010 10:01:02 GMT</pubDate><dc:creator>RalphWilson</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>One item that was touched on briefly was the downstream reuse of the data for "other" applications.  In this case, it is important for a developer to not only enforce constraints for the initial data but also to realize how the data could be used in a data warehouse environment.  This becomes even more relevent for those that are dealing with "spatial" datasets.  For example; in a recent application and data cleansing assignment that I was working on, spatial data was being entered by end users who had no knowledge or capability to check to see if the data they were entering was actually referencing a valid "roadway network" location.  Knowing that the data collected would eventually be used in a GIS environment, it was paramount that the data collection process be spatially constrained as well as domain constrained.  This was not an easy task to perform in a spatially "un-aware" database, so considerable data validation had to be included the front end application.  Not doing so could result in erroneous location information needing to be "reworked" at a later time and locations that cannot be referenced against a valid datasource.Moral of the story.....constrain and validate your data input and never expect the end users to do anything more than "key-in".</description><pubDate>Mon, 30 Aug 2010 08:51:54 GMT</pubDate><dc:creator>pmenichelli</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>Data integrity and constraint begins with choosing the appropriate data type. Obviously (to most of us) integers should be contained in an Integer data type and monetary values in a decimal with 2 decimal places. However, what I see happen all the time, even in major ISV applications, are date/time values contained in a varchar instead of a proper date data type. Not only does the reporting process take a performance hit with the data conversion, but inevitably there will be the occasional Feb 30. Even worse is when different users or applications use a different coding scheme and you have to sort out (in a where clause at runtime) the context of '3/11/09 6:22'.Is that ...March 11, 2009 ?Nov 3, 2009 ?Nov 9, 2003 ?Sep 11, 2003 ? </description><pubDate>Mon, 30 Aug 2010 08:48:59 GMT</pubDate><dc:creator>Eric M Russell</dc:creator></item><item><title>RE: Why is data integrity important?</title><link>http://www.sqlservercentral.com/Forums/Topic977022-2775-1.aspx</link><description>It's a good writeup. And there's always a big debate about NULL being valid or acceptable in RI.I personally think it's OK, but I see the other side, and there's a valid argument there. If you start to use NULLs, you are usually defining it to mean something, like a "N/A", but over time you might find people assuming it means something else. Like the earliest date, or "I don't know". In those cases, you can start to get bugs or misconceptions in the application. Not that you will, but you might. There is a good argument to be made to not allow NULL.</description><pubDate>Mon, 30 Aug 2010 08:26:48 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item></channel></rss>