Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

Why is data integrity important? Expand / Collapse
Author
Message
Posted Saturday, August 28, 2010 2:26 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Thursday, May 15, 2014 8:31 AM
Points: 419, Visits: 725
Comments posted to this topic are about the item Why is data integrity important?
Post #977022
Posted Monday, August 30, 2010 2:58 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 2:43 PM
Points: 814, Visits: 2,014
Not sure how I see how using Nulls violates data integrity. If your point is that if 0 is an appropriate value when there is not another valid data choice, then yes, the field should be set to NOT NULL with a default value of 0. That is not always the case. I don't need to reiterate the several discussions that have preceeded this, other than to say that the use of NULLS does not automatically violate data integrity.


Post #977198
Posted Monday, August 30, 2010 6:17 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, June 5, 2013 1:15 PM
Points: 14, Visits: 100
Coincidentally I started an internal article for my company, the same day this was posted, on Referential Integrity (RFI).

Some other considerations for a lack of RFI are poor database designs including the shared use of one column to represent many columns. I have witnessed numerous instances of a column we will call "someEntityID" which implicitly references other Entity IDs in other tables. The problem is that there is no actual RFI. The value in the someEntityID column could represent one of a dozen other primary keys in other tables. There are no foreign keys to establish the relationship. It is all implicit and merely coincidental if a value does exist in Table X and references a value in another table.

Besides the RFI issue it is of poor DB design. One should be able to look at a column and know what it represents. One should not have to guess what the column represents or perform klugey LEFT OUTER JOINS or create ad hoc mapping tables. Yes, you could create an additional column to provide context to the column to identify what that particular instance of a column means per record but why? It is non-intuitive and more importantly breaks RFI. It also does not support Codd's 12 rules for a relational database; specifically Rule 0 and Rule 10.

Post #977255
Posted Monday, August 30, 2010 6:25 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, June 5, 2013 1:15 PM
Points: 14, Visits: 100
I would have to agree, the use of NULL itself does not violate RFI.

Even a simple definition via Google clarifies what RFI really is. All RFI is really doing is making sure the relationships exist between tables to maintain data consistency. "Each non-null value of a foreign key must match the value of some primary key." - Information Modeling and Relational Databases; Halipin

So being NULL itself is not the issue but I see his point that come columns that are NULL may be designed poorly and should be NOT NULL. Wouldn't this only be an RFI issue if the application layer was maintaining RFI and not the DB? If one was using the App layer to do this then I could see how data could be inconsistent and the DB would be subject to mistakes made by the application code. This usually results in duplicate records or orphaned records.

An excerpt from an article I am writing for developers in my company:
"There is no one absolute best answer when asking how to implement RFI whereas scenarios will dictate the requirements but there are two primary schools of thought for most databases. The first, and by the way the best practice, is to maintain RI within the database. After all, SQL Server is a relational database management system (RDBMS). Why not let it do what it was designed to do? The second school of thought is to allow software code, within an application, to do this. This quite frankly may have its benefits in limited implementations, such as cross system integrity, but it is fraught with peril and relies on very attentive programmers and acute domain knowledge of the software and database by programmers. It is important to note that the software implementation of referential integrity does not follow Edgard F. Codd's 12 Rules for a relational database listed above. A third option is to mix these two solutions."
Post #977258
Posted Monday, August 30, 2010 8:26 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: Administrators
Last Login: Yesterday @ 6:37 PM
Points: 33,189, Visits: 15,329
It's a good writeup. And there's always a big debate about NULL being valid or acceptable in RI.

I personally think it's OK, but I see the other side, and there's a valid argument there. If you start to use NULLs, you are usually defining it to mean something, like a "N/A", but over time you might find people assuming it means something else. Like the earliest date, or "I don't know". In those cases, you can start to get bugs or misconceptions in the application. Not that you will, but you might. There is a good argument to be made to not allow NULL.







Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #977349
Posted Monday, August 30, 2010 8:48 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 7:07 AM
Points: 1,651, Visits: 4,705
Data integrity and constraint begins with choosing the appropriate data type. Obviously (to most of us) integers should be contained in an Integer data type and monetary values in a decimal with 2 decimal places. However, what I see happen all the time, even in major ISV applications, are date/time values contained in a varchar instead of a proper date data type. Not only does the reporting process take a performance hit with the data conversion, but inevitably there will be the occasional Feb 30. Even worse is when different users or applications use a different coding scheme and you have to sort out (in a where clause at runtime) the context of '3/11/09 6:22'.
Is that ...
March 11, 2009 ?
Nov 3, 2009 ?
Nov 9, 2003 ?
Sep 11, 2003 ?

Post #977369
Posted Monday, August 30, 2010 8:51 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, November 23, 2010 7:07 AM
Points: 3, Visits: 18
One item that was touched on briefly was the downstream reuse of the data for "other" applications. In this case, it is important for a developer to not only enforce constraints for the initial data but also to realize how the data could be used in a data warehouse environment. This becomes even more relevent for those that are dealing with "spatial" datasets. For example; in a recent application and data cleansing assignment that I was working on, spatial data was being entered by end users who had no knowledge or capability to check to see if the data they were entering was actually referencing a valid "roadway network" location. Knowing that the data collected would eventually be used in a GIS environment, it was paramount that the data collection process be spatially constrained as well as domain constrained. This was not an easy task to perform in a spatially "un-aware" database, so considerable data validation had to be included the front end application. Not doing so could result in erroneous location information needing to be "reworked" at a later time and locations that cannot be referenced against a valid datasource.

Moral of the story.....constrain and validate your data input and never expect the end users to do anything more than "key-in".
Post #977376
Posted Monday, August 30, 2010 10:01 AM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, November 11, 2011 1:21 PM
Points: 50, Visits: 162
Steve Jones - Editor (8/30/2010)
It's a good writeup. And there's always a big debate about NULL being valid or acceptable in RI.

I personally think it's OK, but I see the other side, and there's a valid argument there. If you start to use NULLs, you are usually defining it to mean something, like a "N/A", but over time you might find people assuming it means something else. Like the earliest date, or "I don't know". In those cases, you can start to get bugs or misconceptions in the application. Not that you will, but you might. There is a good argument to be made to not allow NULL.

So, Steve, what is the definition of "N/A"? Is it "Not Applicable" or "Not Available" or "No Answer"?



Ralph D. Wilson II
Development DBA

"Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
A. Lincoln
Post #977423
Posted Monday, August 30, 2010 10:33 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Yesterday @ 1:37 PM
Points: 21,639, Visits: 15,309
Thanks for the article - that is a great reminder.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server


SQL RNNR

Posting Performance Based Questions - Gail Shaw
Posting Data Etiquette - Jeff Moden
Hidden RBAR - Jeff Moden
VLFs and the Tran Log - Kimberly Tripp
Post #977441
Posted Monday, August 30, 2010 11:08 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: Administrators
Last Login: Yesterday @ 6:37 PM
Points: 33,189, Visits: 15,329
I've always seen "N/A" as "Not applicable", but you give some good other responses that might be used. Hence another point for the "non-nulls" argument.






Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #977457
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse