Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««23456»»»

Stairway to Database Design - STEP 1: Data Elements Expand / Collapse
Author
Message
Posted Saturday, July 23, 2011 12:00 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, May 21, 2012 5:58 AM
Points: 1, Visits: 6
I am so satisfied
Post #1147025
Posted Monday, August 27, 2012 7:53 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Monday, December 2, 2013 6:30 AM
Points: 346, Visits: 691
A very nice article, but I have a couple of quibbles.

Likewise, avoid quoted identifiers with either the Microsoft-only square brackets or the ANSI/ISO double quote marks. This is just sloppy programming done for display formatting in the data base instead of the front end.


I respectfully must disagree. Our company's coding standard mandates that all tables and fields be quoted with square brackets. It provides a very unambiguous reference for the eye as you scan across the code. Combine that with auto formatting (We happen to use Red Gate's SQL Prompt but there are lots of others) and it makes for extremely legible code--which is a godsend.

It also allows use of spaces and hyphens in table and column names, and reduces the need for underscores, which in my opinion is also a legibility plus. Of course YMMV.


"_id" = identifier. It is unique in the schema and refer to one entity anywhere it appears in the schema. Uniqueness is not the same thing as being an identifier. The number pi is unique, but it does not identify an entity. Never use "<table name>_id"; that is a name based on location and tell you this is probably not a real key at all. Just plain "id" is too vague and screws up your data dictionary when you have a zillion of them. Obviously, auto-incrementing values are not identifiers. I will get into that fallacy later.


Here's another place I have to disagree. From a mechanical standpoint identifiers have a number of advantages compared to the so-called natural key. Assuming you're able to use a 4 byte integer as a simple serial number (which is typical) you have a small, ever increasing unique value. And that's good for preventing table splitting and other performance issues.

The fact it's an arbitrary number that only indicates sequence is pretty much irrelevant to its use as a primary key.

Of course just about any table needs some kind of natural key (usually a name) to enforce uniqueness as well, but names make poor primary keys.

Which brings up the Customer/Customers naming quibble. We happen to use the singular form for tables, which means we use [Customer RID] as the primary key. (RID has a very precise meaning in our company, basically a 4 byte identifier as opposed to a natural key).

Since [Customer] is an entity [Customer RID] is naturally the primary key of that entity--but one that violates Mr Celko's naming convention advice.

However, as I said, these are quibbles and over all a very nice article!
Post #1350391
Posted Monday, August 27, 2012 1:05 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, September 15, 2014 10:35 AM
Points: 38, Visits: 536
I would like some clarification on the use of IDENTITY in logical vs physical models. I understand not using identities in a logical model, but in a table in a physical model, especially one with a multi-column natural key, why wouldn't I use an identity column as a primary key with the natural key constrained with a unique key?

Also, in tables where there is no natural key, say a customer table, what is wrong with using an identity to create the unique identifier? The following options, while close, cannot be guarantied to be unique: Name, Social Security Number, Date of Birth, Gender, Email, Phone, Location or Address. An identifier is generated at the time of creating the customer record and it doesn't matter if it is consecutive, or greater than the number of the last customer. I don't care that it was magically created out of thin air; the act of creation has allowed me to uniquely identify something (or someone) that I couldn't before. It is easy to place that number on an invoice, and if there is ever a problem, when they call in we can ask for that number and guaranty we are looking at the correct John Smith. Thoughts?
Post #1350539
Posted Tuesday, August 28, 2012 9:04 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 1:05 PM
Points: 137, Visits: 869
The article is good but too abstract for me to relate to my databases. I'll have to get one of Joe's books and go through the process of concrete-to-abstract induction. The article also has too many "don'ts" leaving me confused as to what to "do".

I can say after two decades I'm ready to give up identities. Uniqueidentifiers are bliss when using a miserable technology like ms sync. Also, inserting XML batched rows is a huge performance improvement, especially since you no longer need to figure out which identity got generated for which transaction row inserted. Transaction rows rarely have an alternate key.

I've also recently switched to singular table names for new projects. Singular names are bliss when using code generation which I do with a few simple procs. Generating pluralities like Parties.PartyKey was irritating compared to the simplicity of Party.PartyKey that I do now.

Post #1351020
Posted Wednesday, August 29, 2012 9:16 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Tuesday, April 15, 2014 8:03 AM
Points: 825, Visits: 319
If someone has mentioned this already, apologies, but here goes.

Re ""_nbr" or “_num" = tag number; this is a string of digits that names something. Do not use "_no" since it looks like the Boolean yes/no value. I prefer "nbr" to "num" since it is used as a common abbreviation in several European languages and the combination of similar shaped letters in “_num” are [sic] visually confusing."

_nbr looks too much like the Spanish 'nombre' for me to use it as an abbreviation for number.
Post #1351697
Posted Saturday, September 1, 2012 8:39 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 10:38 AM
Points: 1,945, Visits: 3,004
_nbr looks too much like the Spanish 'nombre' for me to use it as an abbreviation for number.


Good argument; what is the common abbreviation in Spanish speaking countries these days? I admit that I want to make English the language of IT, which it pretty much is these days. We cannot use # because it violated ISO rules and will screw up lots of stuff. Remember C# and the internet?


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1353183
Posted Saturday, September 1, 2012 8:53 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 10:38 AM
Points: 1,945, Visits: 3,004
The article also has too many "don'ts" leaving me confused as to what to "do".


There are lots of good ways to do something, but the bad things are easier to point out. Ever see a positively worded law? They a e always "Thou shalt not.."

>> I can say after two decades I'm ready to give up identities. Uniqueidentifiers are bliss when using a miserable technology like ms sync. <<

GUID, UUID, etc are meant to be used at the global level for network and web stuff, never in the database. I always get a kick out of seeing a DB with a GUID for a key where two SMALLINTs would be a key that is an industry standard, human readable, etc. But he is happy that it is only one column!

>> I've also recently switched to singular table names for new projects. Singular names are bliss when using code generation which I do with a few simple procs. Generating pluralities like Parties.PartyKey was irritating compared to the simplicity of Party.PartyKey that I do now. [/quote]

But "_key" is meta data (how the data element is used, not what it is by its nature) and will never be used in ISO-11179. Why would anyone generate table names? The data model should be complete before you get to that point.



Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1353184
Posted Saturday, September 1, 2012 9:29 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 10:38 AM
Points: 1,945, Visits: 3,004
I respectfully must disagree. Our company's coding standard mandates that all tables and fields be quoted with square brackets. It provides a very unambiguous reference for the eye as you scan across the code.


No, I worked for AIRMICS (Army Ins-
+










`
.
0+ for Resear
in Mismanagement, Information and Computer Sienes) )


-
combine that with auto formatting (We happen to use Red Gate's SQL Prompt but there are lots of others) and it makes for extremely legible code--which is a godsend.



It also allows use of spaces and hyphens in table and column names, and reduces the need for underscores, which in my opinion is also a legibility plus. Of course YMMV.


"_id" = identifier. It is unique in the schema and refer to one entity anywhere it appears in the schema. Uniqueness is not the same thing as being an identifier. The number pi is unique, but it does not identify an entity. Never use "<table name>_id"; that is a name based on location and tell you this is probably not a real key at all. Just plain "id" is too vague and screws up your data dictionary when you have a zillion of them. Obviously, auto-incrementing values are not identifiers. I will get into that fallacy later.


Here's another place I have to disagree. From a mechanical standpoint identifiers have a number of advantages compared to the so-called natural key. Assuming you're able to use a 4 byte integer as a simple serial number (which is typical) you have a small, ever increasing unique value. And that's good for preventing table splitting and other performance issues.

The fact it's an arbitrary number that only indicates sequence is pretty much irrelevant to its use as a primary key.

Of course just about any table needs some kind of natural key (usually a name) to enforce uniqueness as well, but names make poor primary keys.

Which brings up the Customer/Customers naming quibble. We happen to use the singular form for tables, which means we use [Customer RID] as the primary key. (RID has a very precise meaning in our company, basically a 4 byte identifier as opposed to a natural key).

Since [Customer] is an entity [Customer RID] is naturally the primary key of that entity--but one that violates Mr Celko's naming convention advice.

However, as I said, these are quibbles and over all a very nice article![/quote]


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1353186
Posted Saturday, September 1, 2012 10:28 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 10:38 AM
Points: 1,945, Visits: 3,004
Our company's coding standard mandates that all tables and fields be quoted with square brackets. It provides a very unambiguous reference for the eye as you scan across the code.


The square brackets lock you into Microsoft and MS languages. You cannot have a common data dictionary over your own enterprise much less the industry. At least the double quote are ANSI/ISO Standard SQL syntax and will port. And everyone gets an extra space inside the brackets and screws up their code.

But it is worse than that: ISO now uses a subset of Latin-1 (alphas, digit and a few punctuation marks) for all IT standard. Unicode require that this subset appears in all languages, so they can be used. Quoted identifiers of either kind just screw up your access to ISO and any standard encoding schemes.

I worked for AIRMICS (Army Institute for Management, Information and Computer Sciences) doing research on code readability for a few years. Your tax dollars at work My remarks are based on that work.

Your eyes are trained to jump to certain things in languages and square brackets, camelCase, PascalCase, Hungarian notation and some other things mess up your reading. Your eyes will twitch to jump to a Bouma (word or token read as a pure visual unit), a capital letter (start of sentence or word) or to get over a punctuation mark (like a bracket). I have a whole chapters in SQL PROGRAMMING STYLE on this topic.

Combine that with auto formatting (We happen to use Red Gate's SQL Prompt but there are lots of others) and it makes for extremely legible code--which is a godsend.


It also allows use of spaces and hyphens in table and column names, and reduces the need for underscores, which in my opinion is also a legibility plus. Of course YMMV. [/quotes]

Embedded spacing is for presentation layer and not in data element names (ISO-11179 and many other standards). Hyphen? You mean COBOL minus signs? You just lost all the benefits of ISO Standards. The underscore is interesting. We have used lined paper so long that it does not cause a pause in reading like other punctuation marks inside a string; that is in part why the ISO-11179 standards use it.

"_id" = identifier. It is unique in the schema and refer to one entity anywhere it appears in the schema. .. Here's another place I have to disagree. From a mechanical standpoint identifiers have a number of advantages compared to the so-called natural key. Assuming you're able to use a 4 byte integer as a simple serial number (which is typical) you have a small, ever increasing unique value. And that's good for preventing table splitting and other performance issues.


If a natural key exists, then youhave to have to it and make it unique. First look for an industry standard for an identifier, then for the natural key. As a last resort,invent your own keys, but give them validation and verification. Data integrity is important.

The fact it's an arbitrary number that only indicates sequence is pretty much irrelevant to its use as a primary key.


How do you ensure that the local Khabalah Magic number is in synch with the real key(s)? No validation and verification? Redundancy? All the things we hate in RDBMS.

Of course just about any table needs some kind of natural key (usually a name) to enforce uniqueness as well, but names make poor primary keys.


Depends on the entity, of course. Biology has a very detailed naming system for things.

Which brings up the Customer/Customers naming quibble


ISO-11179 and INCITS-L8 rules, not quibbles.

[quote ] We happen to use the singular form for tables, which means we use "Customer RID" as the primary key. (RID has a very precise meaning in our company, basically a 4 byte identifier as opposed to a natural key).


if the RID is an industry standard, then it is fine to use it; I use VIN, ZIP, SSN, etc, but the role prefix needs an underscore, not a space.

Since "Customer" is an entity [Customer RID] is naturally the primary key of that entity--but one that violates Mr Celko's naming convention advice.


No, the name of the set (table) should be "Customers", "Clientele' or another collective/plural name. Now, you can use "customer_rid" to show that this data element has a role as a data element for a customer, as opposed to, say, "employee_rid" or something else. Think of the basic syntax : [<role>_]<attribute name>_<attribute property>


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1353187
Posted Saturday, September 1, 2012 10:41 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 10:38 AM
Points: 1,945, Visits: 3,004
I understand not using identities in a logical model, but in a table in a physical model, especially one with a multi-column natural key, why wouldn't I use an identity column as a primary key with the natural key constrained with a unique key?


Why add redundancy and overhead? And eh extra code to keep the "fake pointer chains", which is exactly what hey are, in synch?

Also, in tables where there is no natural key, say a customer table, what is wrong with using an IDENTITY property to create the unique identifier?


Bad example. I need a customer id that I can verify. Most of the websites I visit use my email address; the DUNS is the commercial standard; credit card numbers and other things do this, too.

An identifier is generated at the time of creating the customer record [sic: rows are not records] and it doesn't matter if it is consecutive, or greater than the number of the last customer.


But it does matter if nobody else in the enterprise or universe can verify it. What you want to do is throw away the VIN and identify your automobile by its parking space number in one parking garage. Hey, it works right now, for me! Who cares about anyone else or industry standards! Of course, if I move the car out of the garage or even within the garage then we are screwed. I have a great story about a taxi company motor pool that used license tags, taxi numbers, medallions but not VINs.

when they call in we can ask for that number and guaranty we are looking at the correct John Smith.


Now call into another office or company and see if they have the same magical id. Bet not



Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1353189
« Prev Topic | Next Topic »

Add to briefcase «««23456»»»

Permissions Expand / Collapse