Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««34567»»»

Design Expand / Collapse
Author
Message
Posted Friday, September 24, 2010 8:20 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, November 20, 2014 8:13 AM
Points: 2,917, Visits: 2,536
jeff.mason
It happens under the hood, and always has AFAIK. If there isn't a uniqueidentifier replication will add one to every table on the system and if replication is deleted, the column will be dropped as part of removing replication.
Robert Frasca
Actually, that's a "uniqueifier" not a uniqueidentifier. A uniqueifier is a 4-byte column added to make a primary key unique



Robert, I find your comment interesting, but am unable to prove it.

I just tried to create a publication (SS2005) on a table w/o a PK and recieved the error "This table cannot be published because it does not have a primary key column. Primary key columns are required for all tables in transactional publications."

I took a test table, created a nonunique PK on it, and it allowed me to create a publication.

Maybe, what you described is actually happening behind the scenes, I cannot prove that but none of my tools shows anything hidden nor do I see anything in any of the system tables.


Steve Jimmo
Sr DBA
“If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan
Post #992821
Posted Friday, September 24, 2010 8:24 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Wednesday, November 19, 2014 12:30 PM
Points: 554, Visits: 672
jeff.mason (9/24/2010)
Robert Frasca (9/24/2010)
jeff.mason (9/24/2010)
sjimmo (9/24/2010)
Quoting BOL (gasp) again (from the uniqueidentifier TSQL reference article):


I use replication heavily in our stores, to move high priority data back and forth with HQ. I personally have not had to use UNIQUEIDENTIFIER in any of the tables I have replicated, but do have to have a PK, which would be unique. If I use the GUI to create my publication, it will not allow me to select any table without a PK, and if I run a script without a PK, it gives me an error. Is this a new requirement of SS2008?


It happens under the hood, and always has AFAIK. If there isn't a uniqueidentifier replication will add one to every table on the system and if replication is deleted, the column will be dropped as part of removing replication.


Actually, that's a "uniqueifier" not a uniqueidentifier. A uniqueifier is a 4-byte column added to make a primary key unique.


That's not how BO seems to indicate it. From BO:

"A column named rowguid is added to each published table, unless the table already has a column of data type uniqueidentifier with the ROWGUIDCOL property set (in which case this column is used). The rowguid column is used to uniquely identify every row in every published table. If the table is dropped from the publication, the rowguid column is removed; if an existing column was used for tracking, the column is not removed."

Granted, it doesn't tell you the data type of rowguid but from that and the name I always assumed it was a uniqueidentifer as well.



Sorry, my bad. What you're speaking of is if no clustered index exists. It will add one. What I'm speaking of is if a clustered key exists but isn't unique.


"Beliefs" get in the way of learning.
Post #992824
Posted Friday, September 24, 2010 8:32 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, November 20, 2014 8:13 AM
Points: 2,917, Visits: 2,536
Robert Frasca (9/24/2010)
Sorry, my bad. What you're speaking of is if no clustered index exists. It will add one. What I'm speaking of is if a clustered key exists but isn't unique.


Then it would make sense, though where does the system store the information? As I said previously, I created a nonunique PK but wasn't able to find anything added though I KNOW that there is a requirement for a unique field in order for the replication to be able to perform it's updates and deletes on. This is just a nice to know tidbit.


Steve Jimmo
Sr DBA
“If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan
Post #992833
Posted Friday, September 24, 2010 8:34 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 10:52 AM
Points: 1,366, Visits: 1,729
Hugo Kornelis (9/24/2010)
tommyh (9/23/2010)
Have to agree with UMG Developer. The "Multiple identifier columns" shouldnt be part of the answer for 2 reasons.

1. Its not part of MS recommendations (atleast not on the page thats linked in the answer)

2. In the answer you have yourself added
While having multiple identifier columns is not, strictly speaking, against the rules, it makes the table design overly complex, wastes space and adds no value

So your contradicting yourself. Now i do agree that that might not be the best solution but that doesnt make it wrong.

The more I think about this, the more I disagree with the notion that multiple identifier columns is a design flaw.

For example, cars in a table might be idenitified by the VIN. That number is relevant for communications with the supplier and with the official agencies, but employees can't ever remember it, so the license plate number is added as a second identifier column. However, if the table is used in replication, or if it uses FILESTREAM columns, SQL Server requires that a surrogate key with the UNIQUEIDENTIFIER data type is also present in the table. And finally, if there are many references to the cars table, neither the VIN, nor the license number or the UID are a good and space-efficient method for the reference, so it might make sense to add another identifier column, with a numeric data type and the IDENTITY property. (This has the added advantage that inserts won't cause table splits - probaby not relevant in a cars table, but in casese where thousands of rows are added each second, this can be the difference between acceptable performance and utter slowness).

That are, let me count, no less than four identifier columns. Of course all backed by a PRIMARY KEY or UNIQUE constraint. And all relevant. I see no design flaw whatsoever, and I also don't see how this "makes the table design overly complex, wastes space and adds no value".


I agree with you, but I disagree with you (as usual ). The first two you mention I would not consider identifier columns for the database. They may be business logic identifiers, but I would consider using them as identifiers for database programming to be a serious lapse in judgment -- because they're user-entry and user-editable*, you're simply asking for trouble. Oh, and what about cars that don't have plates yet -- or you don't have the plate info? Might need a null for that column; there goes your UNIQUE constraint.

So that gives us 2 identifier columns, remarkably similar to the two in this QotD -- and that's where you may have a point. But I'd caution here as well. First, we need to know why that surrogate UID is there.

If it's for replication, then you want to be sure that 1) you are only replicating from one source, and the app owner isn't ever going to want to replicate any of the other tables linked via your IDENTITY column, because then you have the chance for collisions. As much as I know you loathe using UIDs as foreign keys, it's often necessary when replication is involved.

If it's for FILESTREAM data, consider whether the FILESTREAM data always has to be accessed with the rest of the data. If not, I'd say it makes more sense to separate the FILESTREAM data to a separate table with the UID as its identifier and a foreign key to the now only identifier in the cars table, the IDENTITY column.

* (What you don't think plate numbers change? As for VINs, are you going to tell Doris in data entry that she has to delete the entire record and start again just because she missed a 9? Good luck with that, buddy!)
Post #992838
Posted Friday, September 24, 2010 8:45 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Wednesday, November 19, 2014 12:30 PM
Points: 554, Visits: 672
sjimmo (9/24/2010)
jeff.mason
It happens under the hood, and always has AFAIK. If there isn't a uniqueidentifier replication will add one to every table on the system and if replication is deleted, the column will be dropped as part of removing replication.
Robert Frasca
Actually, that's a "uniqueifier" not a uniqueidentifier. A uniqueifier is a 4-byte column added to make a primary key unique



Robert, I find your comment interesting, but am unable to prove it.

I just tried to create a publication (SS2005) on a table w/o a PK and recieved the error "This table cannot be published because it does not have a primary key column. Primary key columns are required for all tables in transactional publications."

I took a test table, created a nonunique PK on it, and it allowed me to create a publication.

Maybe, what you described is actually happening behind the scenes, I cannot prove that but none of my tools shows anything hidden nor do I see anything in any of the system tables.



LOL. I think we're having two different discussions. The uniqueifier is created when a non-unique clustered index is created. This makes the clustered index unique. Did you actually define that clustered index as the primary key, i.e. with the primary key constraint? In other words, is this a chicken or egg scenario, i.e. since, by default, the primary key is a clustered index did it do all that automatically or did you have to create the clustered index on a non-unique set of values and then designate it as the primary key after the fact?

The section about "uniqueifiers" is in the "Clustered Index Design Guidelines" section of BOL. It isn't visible or accessible to users by the way.

I was thinking that replication would automatically provide a primary key if one didn't exist but I guess I'm mistaken. Since I always create a primary key on every table it hasn't ever really come up.


"Beliefs" get in the way of learning.
Post #992850
Posted Friday, September 24, 2010 8:45 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, August 4, 2014 9:44 AM
Points: 127, Visits: 149
I'm totally in agreement with Hugo on this one. But I did flag it for multiple identifiers, while lots of tables have multiple Unique Keys (Boyce-Codd normal form anyone?) it's probably bad form to have multiple surrogate keys. The example you give about Vehicles having multiple keys is a good one but in that case it has multiple DATA keys which are each legitimate references in different contexts for the object. Creating multiple GENERATED keys is probably unnecessarily complex.

But when it comes to NULLS there are just cases where it makes sense. I got that one wrong as I didn't consider the SalesPersonCell column to be a bad null as it shouldn't even be in the table at all. The FulfilmentDate on the other hand is a perfect case where it makes sense to have a single nullable column which is automatic and denotes the state of a record. If the FulfilmentDate is currently NULL the record has just been entered, the Order Entry system doesn't even allow the clerk to enter a value. When the order fulfilment process happens a trigger automatically puts the current date in that field.

Sure you can add an Order State column which shows you if the order has been filled or not but then you introduce a case where the Fulfilment date depends on something other than the primary key. I guess on the other hand if you have other data that the fulfilment process has to store that's an argument for moving the date out of this table completely storing it in that table and if you need to show the date on the orders screen you get it via a JOIN on the Orders PK. But I certainly wouldn't create a table just to store that one piece of data.
Post #992851
Posted Friday, September 24, 2010 8:55 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 8:00 AM
Points: 985, Visits: 1,839
On the replication and uniqueidentifer topic:

PK is required for transactional and merge. It must be declared as such or else you can't even create a publication with that table as an article. A rowguid is nice but not required. If you don't have one, replication makes one for you. This is so under the hood that it may not even show up on the screen (I don't have one that fits this requirement to check; our merge pubs all declare explicit uniqueidentifiers). So you can still create a merge/updating transactional publication without having a uniqueidentifier, but that doesn't mean that SQL is not creating and using one off in the background for you.

And yeah, I could easily see the confusion for this vs. unique clustered index. It's a similar concept, and that IS just a 4 byte value.
Post #992860
Posted Friday, September 24, 2010 8:58 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, August 4, 2014 9:44 AM
Points: 127, Visits: 149
Rune,

I've got some similar situations in a database that I'm working with and found that the alternative seems to be just as bad. We've created functions to get and set min and max dates and usually will default a date to the f_MinDate() just so we don't have NULLs in the database. But now you still have to check if the value is "MinDate" and if so leave it out of your comparison.

That usually has me writting code that looks like:
WHERE ( Table.Date = f_MinDate() OR Table.Date BETWEEN @StartDate AND @EndDate )

Not sure if that is much better? And it gets even uglier if you allow the start and end dates of the period in question to be allowed to be null. Don't really have an answer to the issue, seems like you're damned if you do and damned if you don't.


Rune Bivrin (9/24/2010)
jeff.mason (9/24/2010)
As far as NULLs go, I have to say NULL is a good thing for some DATE columns. If you have, say, a StartDate and an EndDate column, you know the StartDate when the row is created but you may not know the EndDate. If you insist on having no NULLs, then you have to make up some arbitrary date (usually 1/1/1900) to allow a row to be created. But then that row works on date logic for functions, et al. Simply letting a Date field be NULL in those cases where you lack knowledge makes life so much easier.

Absolutely. Unless, of course, you find yourself writing a lot of code like
WHERE getdate() BETWEEN ISNULL(StartDate, '1900-01-01') AND ISNULL(EndDate, '9999-12-31')

in which case the NULLs will quite possibly kill your performance. I have worked quite a bit with industrial insurance where there are periods on everything, and that gets old very quickly...

Unfortunately SQL Server lacks stuff like DATE.MinValue constants that would be a good compromise.

Not that I'm arguing that NULL is bad per se, just that it's good practice to question its merits in the design phase.


Post #992864
Posted Friday, September 24, 2010 8:59 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, November 20, 2014 8:13 AM
Points: 2,917, Visits: 2,536

Robert Frasca
The uniqueifier is created when a non-unique clustered index is created. This makes the clustered index unique. Did you actually define that clustered index as the primary key, i.e. with the primary key constraint?


I think that you are right about the 2 discussions

I took an existing table and set a field to be a PK, making sure that it was not unique.

I treid to create a publication before and after.

Just shows that there is ALWAYS something new to learn, and I think that we do daily


Steve Jimmo
Sr DBA
“If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan
Post #992867
Posted Friday, September 24, 2010 9:18 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Wednesday, November 19, 2014 12:30 PM
Points: 554, Visits: 672
Just fyi, I would avoid using scalar functions in WHERE clauses whether they are builtin like ISNULL or user defined like yours if possible. It creates the RBAR effect as every single row must be processed through the function as part of the filtering process. I just finished removing scalar function references from a stored proc and saw execution drop from 40 minutes to six minutes just with that change alone. The problem is magnified as the number of rows processed by the query grows. The proc had other issues as well but my client wanted a metric on just that one scenario. (I guess they didn't believe me...chuckle.) Note, don't confuse the ISNULL builtin function with the T-SQL "IS NULL" or "IS NOT NULL" operators.

MHilsher (9/24/2010)
Rune,

I've got some similar situations in a database that I'm working with and found that the alternative seems to be just as bad. We've created functions to get and set min and max dates and usually will default a date to the f_MinDate() just so we don't have NULLs in the database. But now you still have to check if the value is "MinDate" and if so leave it out of your comparison.

That usually has me writting code that looks like:
WHERE ( Table.Date = f_MinDate() OR Table.Date BETWEEN @StartDate AND @EndDate )

Not sure if that is much better? And it gets even uglier if you allow the start and end dates of the period in question to be allowed to be null. Don't really have an answer to the issue, seems like you're damned if you do and damned if you don't.


Rune Bivrin (9/24/2010)
jeff.mason (9/24/2010)
As far as NULLs go, I have to say NULL is a good thing for some DATE columns. If you have, say, a StartDate and an EndDate column, you know the StartDate when the row is created but you may not know the EndDate. If you insist on having no NULLs, then you have to make up some arbitrary date (usually 1/1/1900) to allow a row to be created. But then that row works on date logic for functions, et al. Simply letting a Date field be NULL in those cases where you lack knowledge makes life so much easier.

Absolutely. Unless, of course, you find yourself writing a lot of code like
WHERE getdate() BETWEEN ISNULL(StartDate, '1900-01-01') AND ISNULL(EndDate, '9999-12-31')

in which case the NULLs will quite possibly kill your performance. I have worked quite a bit with industrial insurance where there are periods on everything, and that gets old very quickly...

Unfortunately SQL Server lacks stuff like DATE.MinValue constants that would be a good compromise.

Not that I'm arguing that NULL is bad per se, just that it's good practice to question its merits in the design phase.




"Beliefs" get in the way of learning.
Post #992889
« Prev Topic | Next Topic »

Add to briefcase «««34567»»»

Permissions Expand / Collapse