Query Identify Natural Primary Key Columns

  • Someone told me that they had a Script contains and Sub Query with DISTINCT keyword to identify the number of columns that it took for a unique key.

    Does anyone have anything like this?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Not sure what you would like the script to do.

    All columns that contain unique values?

    All columns in UNIQUE indexes?

    Can you clarify please?

    -- Gianluca Sartori

  • Rather than perform several SELECT DISTINCT to identify what combination of columns it takes to define the natural candidate composite primary key on over a hunded tables I would like to do it in a single script.

    Does that make sense?

    I know someone that claims that he did it but I can't reach him.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (3/5/2012)


    Rather than perform several SELECT DISTINCT to identify what combination of columns it takes to define the natural candidate composite primary key on over a hunded tables I would like to do it in a single script.

    Does that make sense?

    I know someone that claims that he did it but I can't reach him.

    I guess I would do it using SELECT DISTINCT cross joining all table columns from sys.columns.

    I'm afraid I can't help you.

    -- Gianluca Sartori

  • strange way to sample column combinations like that to determine a primary key candidate.

    IMO you could guess for a unique key column composition, to make it primary key ( i.e. to build DRI on ) would IMHO need more data study. That's why this assignment should be done at design time.

    Are all columns ( to be examined declared ) not null ? If no, that may fairly restrict your examination.

    I think I would perform such investigation by just generating a select statement combined with a group-by based on the information_schema.columns view.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • While I heartedly agree with every word Johan wrote, let's see if this can get you started:

    DECLARE @results TABLE (

    TABLE_NAME sysname,

    TOTAL_COUNT int,

    COLUMN_NAME sysname,

    DISTINCT_COUNT int

    );

    INSERT @results

    EXEC sp_MsForEachTable '

    DECLARE @q nvarchar(max);

    DECLARE @c nvarchar(max);

    SET @q = (

    SELECT '', COUNT(DISTINCT '' + QUOTENAME(name) + '') AS '' + QUOTENAME(name) AS [text()]

    FROM sys.columns

    WHERE object_id = OBJECT_ID(''?'')

    FOR XML PATH('''')

    );

    SET @c = STUFF((

    SELECT '', '' + QUOTENAME(name) AS [text()]

    FROM sys.columns

    WHERE object_id = OBJECT_ID(''?'')

    FOR XML PATH('''')

    ),1,1,SPACE(0));

    SET @q = ''SELECT TABLE_NAME, TOTAL_COUNT, COLUMN_NAME, DISTINCT_COUNT

    FROM (

    SELECT ''''?'''' AS TABLE_NAME, COUNT(*) AS TOTAL_COUNT '' + @q + ''

    FROM ?

    ) AS src

    UNPIVOT ( DISTINCT_COUNT FOR COLUMN_NAME IN (''+ @c +'')) u'';

    PRINT @q;

    EXECUTE(@q);

    '

    SELECT *

    FROM @results

    ORDER BY TABLE_NAME, COLUMN_NAME;

    Columns with a DISTINCT COUNT < TOTAL_COUNT cannot be primary key candidates. This works for single column candidate keys and won't be useful for composite keys.

    -- Gianluca Sartori

  • based on information_schema.columns you could get started using ...

    WITH cteRecurse

    AS (

    SELECT TABLE_CATALOG

    , TABLE_SCHEMA

    , TABLE_NAME

    , CAST(QUOTENAME(COLUMN_NAME) AS NVARCHAR(MAX)) AS COLUMN_NAMEs

    , ORDINAL_POSITION AS nColumns

    FROM INFORMATION_SCHEMA.COLUMNS

    WHERE ORDINAL_POSITION = 1

    AND TABLE_CATALOG = 'YourDatabaseName'

    AND IS_NULLABLE = 'no'

    UNION ALL

    SELECT C.TABLE_CATALOG

    , C.TABLE_SCHEMA

    , C.TABLE_NAME

    , R.COLUMN_NAMEs + ',' + QUOTENAME(C.COLUMN_NAME)

    , C.ORDINAL_POSITION

    FROM INFORMATION_SCHEMA.COLUMNS C

    INNER JOIN cteRecurse R

    ON R.TABLE_CATALOG = C.TABLE_CATALOG

    and R.TABLE_SCHEMA = C.TABLE_SCHEMA

    and R.TABLE_NAME = C.TABLE_NAME

    and R.nColumns + 1 = C.ORDINAL_POSITION

    AND C.IS_NULLABLE = 'no'

    )

    Select 'Select ''' + TABLE_CATALOG + ''',''' + TABLE_SCHEMA + ''',''' + TABLE_NAME + ''',''' + Column_Names + ''', '

    + Column_Names + ', count(*) nRows from ' + QUOTENAME (TABLE_CATALOG) + '.' + QUOTENAME (TABLE_SCHEMA ) + '.' + QUOTENAME (TABLE_NAME)

    + ' group by ' + Column_Names + ';'

    from cteRecurse

    order by TABLE_CATALOG

    , TABLE_SCHEMA

    , TABLE_NAME

    , COLUMN_NAMEs

    , nColumns

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Thanks for the information.

    I did not mention that this data is from a legacy DB2 Database that is hosted outside of the company and soon to go away.

    None of the table have PK's or FK's defined.

    Thanks again.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (3/5/2012)


    Thanks for the information.

    I did not mention that this data is from a legacy DB2 Database that is hosted outside of the company and soon to go away.

    None of the table have PK's or FK's defined.

    Thanks again.

    They must have defined indexes for their objects, haven't they ?

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Indexes but no Primary Keys.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • If so, may be legacy DB2 database has some unique idexes?

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (3/6/2012)


    If so, may be legacy DB2 database has some unique idexes?

    In my experience, DB2 on z/OS is often used without indexes and has "logical views" to implement the index functionality.

    -- Gianluca Sartori

  • Gianluca Sartori (3/6/2012)


    Eugene Elutin (3/6/2012)


    If so, may be legacy DB2 database has some unique idexes?

    In my experience, DB2 on z/OS is often used without indexes and has "logical views" to implement the index functionality.

    In a previous post OP stated that there are indexes in the database...

    Anyway, instead of trying to identify PK from data I would suggest to go to the dump aproach of doing some business analyses and identify what legacy columns should be taken as a natural keys. I would also suggest to use artificial PKs.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Along Eugene's lines of code review, I've found that a very high percentage of table designs have the first column or the first few columns actually define a primary key candidate;

    does db2 have AutoIncrement/Identity? that's a god candidate if available.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Gianluca Sartori (3/6/2012)


    Eugene Elutin (3/6/2012)


    If so, may be legacy DB2 database has some unique idexes?

    In my experience, DB2 on z/OS is often used without indexes and has "logical views" to implement the index functionality.

    In that case you've seen a fast lane migration implementation to get rid of Z/OS IMS hierarchical database system.

    I've seen that too in companies that just wanted to migrate way back in the early days and were stupid enough not to take advantage of the RDBMS powers.

    And, yes, indeed, they are still struggling to migrate these objects to proper rdbms objects, pushing it forward time and again, and even maintaining the IMS structure as required.

    Of course they blame DB2 for bad performance whenever they experience it.

    Back to OP:

    Just like SQLServer, but implemented slightly different, a DB2 table can have a clustering index. ( if not assigned, it will use the first index created on the object ) and it can also have (non)unique indexes.

    AFAIK, but I'll have to double check, DRI with DB2 can only be declared based on PK indexes (where as with SQLServer you can also refer a Unique index)

    Best is to check again with the DB2 DBA.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply