Why is data integrity important?

  • jvanderberg

    SSCertifiable

    Points: 6762

    Comments posted to this topic are about the item Why is data integrity important?

  • RonKyle

    SSC-Dedicated

    Points: 31444

    Not sure how I see how using Nulls violates data integrity. If your point is that if 0 is an appropriate value when there is not another valid data choice, then yes, the field should be set to NOT NULL with a default value of 0. That is not always the case. I don't need to reiterate the several discussions that have preceeded this, other than to say that the use of NULLS does not automatically violate data integrity.

  • stevedln

    SSC Enthusiast

    Points: 132

    Coincidentally I started an internal article for my company, the same day this was posted, on Referential Integrity (RFI).

    Some other considerations for a lack of RFI are poor database designs including the shared use of one column to represent many columns. I have witnessed numerous instances of a column we will call "someEntityID" which implicitly references other Entity IDs in other tables. The problem is that there is no actual RFI. The value in the someEntityID column could represent one of a dozen other primary keys in other tables. There are no foreign keys to establish the relationship. It is all implicit and merely coincidental if a value does exist in Table X and references a value in another table.

    Besides the RFI issue it is of poor DB design. One should be able to look at a column and know what it represents. One should not have to guess what the column represents or perform klugey LEFT OUTER JOINS or create ad hoc mapping tables. Yes, you could create an additional column to provide context to the column to identify what that particular instance of a column means per record but why? It is non-intuitive and more importantly breaks RFI. It also does not support Codd's 12 rules for a relational database; specifically Rule 0 and Rule 10.

  • stevedln

    SSC Enthusiast

    Points: 132

    I would have to agree, the use of NULL itself does not violate RFI.

    Even a simple definition via Google clarifies what RFI really is. All RFI is really doing is making sure the relationships exist between tables to maintain data consistency. "Each non-null value of a foreign key must match the value of some primary key." - Information Modeling and Relational Databases; Halipin

    So being NULL itself is not the issue but I see his point that come columns that are NULL may be designed poorly and should be NOT NULL. Wouldn't this only be an RFI issue if the application layer was maintaining RFI and not the DB? If one was using the App layer to do this then I could see how data could be inconsistent and the DB would be subject to mistakes made by the application code. This usually results in duplicate records or orphaned records.

    An excerpt from an article I am writing for developers in my company:

    "There is no one absolute best answer when asking how to implement RFI whereas scenarios will dictate the requirements but there are two primary schools of thought for most databases. The first, and by the way the best practice, is to maintain RI within the database. After all, SQL Server is a relational database management system (RDBMS). Why not let it do what it was designed to do? The second school of thought is to allow software code, within an application, to do this. This quite frankly may have its benefits in limited implementations, such as cross system integrity, but it is fraught with peril and relies on very attentive programmers and acute domain knowledge of the software and database by programmers. It is important to note that the software implementation of referential integrity does not follow Edgard F. Codd's 12 Rules for a relational database listed above. A third option is to mix these two solutions."

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 715107

    It's a good writeup. And there's always a big debate about NULL being valid or acceptable in RI.

    I personally think it's OK, but I see the other side, and there's a valid argument there. If you start to use NULLs, you are usually defining it to mean something, like a "N/A", but over time you might find people assuming it means something else. Like the earliest date, or "I don't know". In those cases, you can start to get bugs or misconceptions in the application. Not that you will, but you might. There is a good argument to be made to not allow NULL.

  • Eric M Russell

    SSC Guru

    Points: 124993

    Data integrity and constraint begins with choosing the appropriate data type. Obviously (to most of us) integers should be contained in an Integer data type and monetary values in a decimal with 2 decimal places. However, what I see happen all the time, even in major ISV applications, are date/time values contained in a varchar instead of a proper date data type. Not only does the reporting process take a performance hit with the data conversion, but inevitably there will be the occasional Feb 30. Even worse is when different users or applications use a different coding scheme and you have to sort out (in a where clause at runtime) the context of '3/11/09 6:22'.

    Is that ...

    March 11, 2009 ?

    Nov 3, 2009 ?

    Nov 9, 2003 ?

    Sep 11, 2003 ?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • pmenichelli

    SSC Rookie

    Points: 39

    One item that was touched on briefly was the downstream reuse of the data for "other" applications. In this case, it is important for a developer to not only enforce constraints for the initial data but also to realize how the data could be used in a data warehouse environment. This becomes even more relevent for those that are dealing with "spatial" datasets. For example; in a recent application and data cleansing assignment that I was working on, spatial data was being entered by end users who had no knowledge or capability to check to see if the data they were entering was actually referencing a valid "roadway network" location. Knowing that the data collected would eventually be used in a GIS environment, it was paramount that the data collection process be spatially constrained as well as domain constrained. This was not an easy task to perform in a spatially "un-aware" database, so considerable data validation had to be included the front end application. Not doing so could result in erroneous location information needing to be "reworked" at a later time and locations that cannot be referenced against a valid datasource.

    Moral of the story.....constrain and validate your data input and never expect the end users to do anything more than "key-in".

  • RalphWilson

    Mr or Mrs. 500

    Points: 592

    Steve Jones - Editor (8/30/2010)


    It's a good writeup. And there's always a big debate about NULL being valid or acceptable in RI.

    I personally think it's OK, but I see the other side, and there's a valid argument there. If you start to use NULLs, you are usually defining it to mean something, like a "N/A", but over time you might find people assuming it means something else. Like the earliest date, or "I don't know". In those cases, you can start to get bugs or misconceptions in the application. Not that you will, but you might. There is a good argument to be made to not allow NULL.

    So, Steve, what is the definition of "N/A"? Is it "Not Applicable" or "Not Available" or "No Answer"? 😉

    Ralph D. Wilson II
    Development DBA

    "Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
    A. Lincoln

  • SQLRNNR

    SSC Guru

    Points: 281210

    Thanks for the article - that is a great reminder.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 715107

    I've always seen "N/A" as "Not applicable", but you give some good other responses that might be used. Hence another point for the "non-nulls" argument.

  • alen teplitsky

    SSC-Dedicated

    Points: 30014

    i've seen instances where most trouble tickets are simple DML to fix errors the application or people made in entering data. sales people are notorious for submitting incomplete or wrong paperwork that requires a lot of work to fix

  • RalphWilson

    Mr or Mrs. 500

    Points: 592

    Steve Jones - Editor (8/30/2010)


    I've always seen "N/A" as "Not applicable", but you give some good other responses that might be used. Hence another point for the "non-nulls" argument.

    I have dealt with DBA's who insisted on all columns being non-NULL and I have engaged in "disucssions" about such (both in person and in cyberspace) and I have yet to find a really good argument for changing my position: NULLs should be used when appropriate and avoided when not apporpriate.

    If there is a referential integrity rule that, in effect, says that every row in TableA has to have a FK/PK link to TableB, then it is inappropriate for that column in TableA to be NULL. However, if there is a column, for instance, in a Hospital Information System's PatientDetails table, that is supposed to contain a Date of Birth and it is possible that a) that information may notbe readily available (e.g. John Doe is unconscious when admitted) and b) it is not a show stopper if it either is incorrect or unavailable, I would rather have that column in that row left NULL than to assign an arbitrary DOB. If the column is not essential to the overall integrity of that row, when I am faced with the choice of using an arbitrary default value or leaving the column NULL, I choose the NULL.

    Ralph D. Wilson II
    Development DBA

    "Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
    A. Lincoln

  • jvanderberg

    SSCertifiable

    Points: 6762

    There is always a time and place for NULLs. Think "Address Line 2" type fields. Chances are, the value in those type of fields are NULL more often than their not. But I woulnd't want a NULL in "Address Line 1", or company name, or...

  • alen teplitsky

    SSC-Dedicated

    Points: 30014

    i've seen this no NULL's allowed OCD thing being taken to extremes

    say you sign up a customer today and promise some service that will be done in the near future but you're not sure of the date. since the column is not null you have to put in a crazy date like 1/1/2049. or if something was done in the past then 1/1/1901

    so if a customer calls in a few days the CSR will happily tell them they can expect to see someone come to their site in 2049

  • jvanderberg

    SSCertifiable

    Points: 6762

    It's as they say: no matter how good a thing is, there is always such a thing as too much of it.

Viewing 15 posts - 1 through 15 (of 44 total)

You must be logged in to reply to this topic. Login to reply