Query Identify Natural Primary Key Columns

Question

Post reply

Query Identify Natural Primary Key Columns

Welsh Corgi

SSC Guru

Points: 116520
More actions
March 5, 2012 at 8:27 am

#254179

Someone told me that they had a Script contains and Sub Query with DISTINCT keyword to identify the number of columns that it took for a unique key.
Does anyone have anything like this?
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

spaghettidba SSC Guru Points: 105732 More actions · Answer 1

Not sure what you would like the script to do.

All columns that contain unique values?

All columns in UNIQUE indexes?

Can you clarify please?

-- Gianluca Sartori

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 2

Rather than perform several SELECT DISTINCT to identify what combination of columns it takes to define the natural candidate composite primary key on over a hunded tables I would like to do it in a single script.

Does that make sense?

I know someone that claims that he did it but I can't reach him.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

spaghettidba SSC Guru Points: 105732 More actions · Answer 3

Welsh Corgi (3/5/2012)
Rather than perform several SELECT DISTINCT to identify what combination of columns it takes to define the natural candidate composite primary key on over a hunded tables I would like to do it in a single script.
Does that make sense?
I know someone that claims that he did it but I can't reach him.

I guess I would do it using SELECT DISTINCT cross joining all table columns from sys.columns.

I'm afraid I can't help you.

-- Gianluca Sartori

Johan Bijnens SSC Guru Points: 135589 More actions · Answer 4

strange way to sample column combinations like that to determine a primary key candidate.

IMO you could guess for a unique key column composition, to make it primary key ( i.e. to build DRI on ) would IMHO need more data study. That's why this assignment should be done at design time.

Are all columns ( to be examined declared ) not null ? If no, that may fairly restrict your examination.

I think I would perform such investigation by just generating a select statement combined with a group-by based on the information_schema.columns view.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

spaghettidba SSC Guru Points: 105732 More actions · Answer 5

While I heartedly agree with every word Johan wrote, let's see if this can get you started:

DECLARE @results TABLE (

TABLE_NAME sysname,

TOTAL_COUNT int,

COLUMN_NAME sysname,

DISTINCT_COUNT int

);

INSERT @results

EXEC sp_MsForEachTable '

DECLARE @q nvarchar(max);

DECLARE @c nvarchar(max);

SET @q = (

SELECT '', COUNT(DISTINCT '' + QUOTENAME(name) + '') AS '' + QUOTENAME(name) AS [text()]

FROM sys.columns

WHERE object_id = OBJECT_ID(''?'')

FOR XML PATH('''')

);

SET @c = STUFF((

SELECT '', '' + QUOTENAME(name) AS [text()]

FROM sys.columns

WHERE object_id = OBJECT_ID(''?'')

FOR XML PATH('''')

),1,1,SPACE(0));

SET @q = ''SELECT TABLE_NAME, TOTAL_COUNT, COLUMN_NAME, DISTINCT_COUNT

FROM (

SELECT ''''?'''' AS TABLE_NAME, COUNT(*) AS TOTAL_COUNT '' + @q + ''

FROM ?

) AS src

UNPIVOT ( DISTINCT_COUNT FOR COLUMN_NAME IN (''+ @c +'')) u'';

PRINT @q;

EXECUTE(@q);

'

SELECT *

FROM @results

ORDER BY TABLE_NAME, COLUMN_NAME;

Columns with a DISTINCT COUNT < TOTAL_COUNT cannot be primary key candidates. This works for single column candidate keys and won't be useful for composite keys.

-- Gianluca Sartori

Johan Bijnens SSC Guru Points: 135589 More actions · Answer 6

based on information_schema.columns you could get started using ...

WITH cteRecurse

AS (

SELECT TABLE_CATALOG

, TABLE_SCHEMA

, TABLE_NAME

, CAST(QUOTENAME(COLUMN_NAME) AS NVARCHAR(MAX)) AS COLUMN_NAMEs

, ORDINAL_POSITION AS nColumns

FROM INFORMATION_SCHEMA.COLUMNS

WHERE ORDINAL_POSITION = 1

AND TABLE_CATALOG = 'YourDatabaseName'

AND IS_NULLABLE = 'no'

UNION ALL

SELECT C.TABLE_CATALOG

, C.TABLE_SCHEMA

, C.TABLE_NAME

, R.COLUMN_NAMEs + ',' + QUOTENAME(C.COLUMN_NAME)

, C.ORDINAL_POSITION

FROM INFORMATION_SCHEMA.COLUMNS C

INNER JOIN cteRecurse R

ON R.TABLE_CATALOG = C.TABLE_CATALOG

and R.TABLE_SCHEMA = C.TABLE_SCHEMA

and R.TABLE_NAME = C.TABLE_NAME

and R.nColumns + 1 = C.ORDINAL_POSITION

AND C.IS_NULLABLE = 'no'

)

Select 'Select ''' + TABLE_CATALOG + ''',''' + TABLE_SCHEMA + ''',''' + TABLE_NAME + ''',''' + Column_Names + ''', '

+ Column_Names + ', count(*) nRows from ' + QUOTENAME (TABLE_CATALOG) + '.' + QUOTENAME (TABLE_SCHEMA ) + '.' + QUOTENAME (TABLE_NAME)

+ ' group by ' + Column_Names + ';'

from cteRecurse

order by TABLE_CATALOG

, TABLE_SCHEMA

, TABLE_NAME

, COLUMN_NAMEs

, nColumns

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 7

Thanks for the information.

I did not mention that this data is from a legacy DB2 Database that is hosted outside of the company and soon to go away.

None of the table have PK's or FK's defined.

Thanks again.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Johan Bijnens SSC Guru Points: 135589 More actions · Answer 8

Welsh Corgi (3/5/2012)
Thanks for the information.
I did not mention that this data is from a legacy DB2 Database that is hosted outside of the company and soon to go away.
None of the table have PK's or FK's defined.
Thanks again.

They must have defined indexes for their objects, haven't they ?

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 9

Indexes but no Primary Keys.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 10

If so, may be legacy DB2 database has some unique idexes?

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

spaghettidba SSC Guru Points: 105732 More actions · Answer 11

Eugene Elutin (3/6/2012)
If so, may be legacy DB2 database has some unique idexes?

In my experience, DB2 on z/OS is often used without indexes and has "logical views" to implement the index functionality.

-- Gianluca Sartori

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 12

Gianluca Sartori (3/6/2012)
Eugene Elutin (3/6/2012)
If so, may be legacy DB2 database has some unique idexes?
In my experience, DB2 on z/OS is often used without indexes and has "logical views" to implement the index functionality.

In a previous post OP stated that there are indexes in the database...

Anyway, instead of trying to identify PK from data I would suggest to go to the dump aproach of doing some business analyses and identify what legacy columns should be taken as a natural keys. I would also suggest to use artificial PKs.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Lowell SSC Guru Points: 323738 More actions · Answer 13

Along Eugene's lines of code review, I've found that a very high percentage of table designs have the first column or the first few columns actually define a primary key candidate;

does db2 have AutoIncrement/Identity? that's a god candidate if available.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Johan Bijnens SSC Guru Points: 135589 More actions · Answer 14

Gianluca Sartori (3/6/2012)
Eugene Elutin (3/6/2012)
If so, may be legacy DB2 database has some unique idexes?
In my experience, DB2 on z/OS is often used without indexes and has "logical views" to implement the index functionality.

In that case you've seen a fast lane migration implementation to get rid of Z/OS IMS hierarchical database system.

I've seen that too in companies that just wanted to migrate way back in the early days and were stupid enough not to take advantage of the RDBMS powers.

And, yes, indeed, they are still struggling to migrate these objects to proper rdbms objects, pushing it forward time and again, and even maintaining the IMS structure as required.

Of course they blame DB2 for bad performance whenever they experience it.

Back to OP:

Just like SQLServer, but implemented slightly different, a DB2 table can have a clustering index. ( if not assigned, it will use the first index created on the object ) and it can also have (non)unique indexes.

AFAIK, but I'll have to double check, DRI with DB2 can only be declared based on PK indexes (where as with SQLServer you can also refer a Unique index)

Best is to check again with the DB2 DBA.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me