Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Stairway to Data, Step 7: Data Encoding Schemes - Part I Expand / Collapse
Author
Message
Posted Thursday, May 12, 2011 10:12 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Sunday, July 20, 2014 11:55 AM
Points: 1,945, Visits: 2,860
Comments posted to this topic are about the item Stairway to Data, Step 7: Data Encoding Schemes - Part I

Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1108184
Posted Wednesday, August 10, 2011 2:10 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 2:52 AM
Points: 1,153, Visits: 1,047
Having different codes for subtle variations like 'unassigned', 'not known', 'missing data' ,'not applicable' , 'none' only helps if all of the people entering data can be made to appreciate the differences and always enter the data correctly. Am I being too cynical in believing that the chances of that happening are close to zero in most systems ?

And why is 5000 bananas in a lifetime unreasonable ? If you live for about 30000 days that's roughly one a week, not allowing for buying for your family.

(This message sponsored by the Banana Marketing Foundation ...)
Post #1157446
Posted Wednesday, August 10, 2011 10:02 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Yesterday @ 7:26 AM
Points: 1,293, Visits: 1,646
Good article But there is a slight logic error in the section on validation and verification.


Validation of an encoding means that the syntax is correct. Verification means that there is a trusted source or procedure for the encoding, so we know the value and the syntax is correct.
(my emphasis)

Consider the five digit ZIP code. A quick validation can be the regular expression “zip_code CHAR(5) NOT NULL CHECK (zip_code LIKE '[0-9][0-9][0-9][0-9][0-9]')” which says that ZIP code is five digits. But this rule is not correct; the lowest ZIP code is 01001 which is for (Agawam, MA) and the highest ZIP code is 99950 (Ketchikan, AK). All zeros and all nines will never be used; they are deliberate dummy values for sorting missing data to the front or back of a list of ZIP codes. Are we done? Nope! There are unused ZIP codes between the highest and lowest code values; for example, Austin, TX has ZIP codes 78739, and 78741 but not 78740.


All of the examples you give show why the validation is not verification, but do not invalidate the validation. In fact the validation you show is a correct validation rule for the syntax. Specifically, the syntax of a 5-digit ZIP code is five numerical digits, and that regular expression validates that syntax. The fact that some ZIP codes are currently not in use or are reserved for special purposes is not a validation (i.e, allowed sytax) issue, but a verification (i.e, allowed values) issue.

Creating a validation rule which checked the entered value against a list of known ZIP codes would be incorrect, because it 1) would perform verification, not validation, and 2) could break as soon as the ZIP code list changed (which does happen.)
Post #1157782
Posted Sunday, January 5, 2014 11:45 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, March 27, 2014 2:39 PM
Points: 2,141, Visits: 486
"The Blood and Guts Of Master Data Management"
Post #1527918
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse