|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
Comments posted to this topic are about the item Stairway to Data, Step 7: Data Encoding Schemes - Part I
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: 2 days ago @ 5:04 AM
Points: 1,125,
Visits: 987
|
|
Having different codes for subtle variations like 'unassigned', 'not known', 'missing data' ,'not applicable' , 'none' only helps if all of the people entering data can be made to appreciate the differences and always enter the data correctly. Am I being too cynical in believing that the chances of that happening are close to zero in most systems ?
And why is 5000 bananas in a lifetime unreasonable ? If you live for about 30000 days that's roughly one a week, not allowing for buying for your family.
(This message sponsored by the Banana Marketing Foundation ...)
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 9:31 AM
Points: 1,041,
Visits: 1,356
|
|
Good article But there is a slight logic error in the section on validation and verification.
Validation of an encoding means that the syntax is correct. Verification means that there is a trusted source or procedure for the encoding, so we know the value and the syntax is correct.
(my emphasis)
Consider the five digit ZIP code. A quick validation can be the regular expression “zip_code CHAR(5) NOT NULL CHECK (zip_code LIKE '[0-9][0-9][0-9][0-9][0-9]')” which says that ZIP code is five digits. But this rule is not correct; the lowest ZIP code is 01001 which is for (Agawam, MA) and the highest ZIP code is 99950 (Ketchikan, AK). All zeros and all nines will never be used; they are deliberate dummy values for sorting missing data to the front or back of a list of ZIP codes. Are we done? Nope! There are unused ZIP codes between the highest and lowest code values; for example, Austin, TX has ZIP codes 78739, and 78741 but not 78740.
All of the examples you give show why the validation is not verification, but do not invalidate the validation. In fact the validation you show is a correct validation rule for the syntax. Specifically, the syntax of a 5-digit ZIP code is five numerical digits, and that regular expression validates that syntax. The fact that some ZIP codes are currently not in use or are reserved for special purposes is not a validation (i.e, allowed sytax) issue, but a verification (i.e, allowed values) issue.
Creating a validation rule which checked the entered value against a list of known ZIP codes would be incorrect, because it 1) would perform verification, not validation, and 2) could break as soon as the ZIP code list changed (which does happen.)
|
|
|
|