Find Common Link Between Two Tables in Database without Key Constraints

  • Does anyone know of a tool, script, procedure, or otherwise that will solve this problem?

    CREATE TABLE [Table 1] (

    Id varchar,

    TableTwoId varchar,

    OtherData varchar )

    CREATE TABLE [Table 2] (

    Id varchar,

    MoreData varchar )

    What is the link between these two tables?

    I have databases that have zero keys defined, no foreign key constraints, no unique value constraints. I cannot assume that an Identity column is the Id. The end game is be able to output that [Table 1].[TableTwoId] = [Table 2].[Id] but cannot assume that all linkage in the database will be as simple as saying "if the field name contains a table name + Id then it is the Id in that table."

    Currently have written a script that will cycle through all columns and start identifying keys in singular tables based on the distinctness of the values in each column (i.e. Column1 is 99% distinct so is the unique value). It will also combine columns if the linkage is the combination of one or more columns from Table 1 = the same number of columns in Table 2. This takes a long time, and it somewhat unreliable as IDENTITY columns may seem to relate to one another when they don't.

    Any help would be appreciated.

  • JimK 28184 (7/16/2015)


    Does anyone know of a tool, script, procedure, or otherwise that will solve this problem?

    CREATE TABLE [Table 1] (

    Id varchar,

    TableTwoId varchar,

    OtherData varchar )

    CREATE TABLE [Table 2] (

    Id varchar,

    MoreData varchar )

    What is the link between these two tables?

    I have databases that have zero keys defined, no foreign key constraints, no unique value constraints. I cannot assume that an Identity column is the Id. The end game is be able to output that [Table 1].[TableTwoId] = [Table 2].[Id] but cannot assume that all linkage in the database will be as simple as saying "if the field name contains a table name + Id then it is the Id in that table."

    Currently have written a script that will cycle through all columns and start identifying keys in singular tables based on the distinctness of the values in each column (i.e. Column1 is 99% distinct so is the unique value). It will also combine columns if the linkage is the combination of one or more columns from Table 1 = the same number of columns in Table 2. This takes a long time, and it somewhat unreliable as IDENTITY columns may seem to relate to one another when they don't.

    Any help would be appreciated.

    I don't think you're going to get much help on this one.

    It looks like you correctly identified the problem here. You're correct, you can't assume that if looks like a match, then it is a match.

    It might help if you could get you hands on queries that are used. Look at stored procedures if there are any. Look at the joins in those procedures. They may shed some light on how tables are connected.

    Find out who is using the Database(s). See if you can get some help from them, again by looking at the queries.

    Depending on the number of objects in the database, this might be easy to figure out.

    Good luck.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Thanks, Alvin. It is a recurring problem, so we try to get as much info from those who know up front, but a lot of times nobody can tell us the links. I had heard programs like Talend could identify keys between tables, but who I learned that from thought they used the column names and pre-defined constraints to make that judgement.

    Essentially, I'm getting tab delimited text files and making databases out of them, so it's the bare minimum of information.

  • JimK 28184 (7/16/2015)


    Thanks, Alvin. It is a recurring problem, so we try to get as much info from those who know up front, but a lot of times nobody can tell us the links. I had heard programs like Talend could identify keys between tables, but who I learned that from thought they used the column names and pre-defined constraints to make that judgement.

    Essentially, I'm getting tab delimited text files and making databases out of them, so it's the bare minimum of information.

    i had an app like this form a third party, where they obfuscated table names and eliminated foreign keys, thinking that that table structure and joins were proprietary information or something.

    i was going to say, if this was a database with zero foreign keys, I've done this by pulling all the execution plans in cache, and parsing the xml for any join criteria that came from the application itself from existing queries;

    the join critieria in the xml could help you with mapping,but if all you are receiving is a collection of flat files, i think you have to go to the source and ask for mapping information.

    does your source have any documentation related to the relationships of the flat files you receive?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • A lot of times no, we get as much info as we can. Sometimes we can get a schema or data dictionary which can be read and easily turned in to keys.

    Many times we might as well have found a set of floppies in a tomb and are trying to figure out what it all means.

  • well the est guess method based on column names might be a starting point, like you mentioned.

    how about this:

    if can you assume in your data sources that the first column, named *ID, and any other columns named *ID are potential Foreign keys.

    , than this code can generate some missing and implied foreign keys as a best guess.

    if you alter the tables and they fail, then it's because the data doesn't support the FK assumption;

    if it DOES, then you are closer to defining all FK's and relationships.

    WITH CurrFKS

    AS (SELECT Object_name(constid) AS constraintname,

    OBJECT_SCHEMA_NAME(rkeyid) AS refschema,

    Object_name(rkeyid) AS reftable,

    COL_NAME(rkeyid, rkey) AS refcolumn,

    OBJECT_SCHEMA_NAME(rkeyid) AS fkschema,

    Object_name(fkeyid) AS fktable,

    COL_NAME(fkeyid, fkey) AS fkcolumn,

    ' ALTER TABLE ' + Object_name(fkeyid)

    + ' ADD CONSTRAINT [PLACEHOLDER]'

    + ' FOREIGN KEY (' + COL_NAME(fkeyid, fkey)

    + ') REFERENCES ' + Object_name(rkeyid) + '('

    + COL_NAME(rkeyid, rkey) + ')' AS fksql

    FROM sysforeignkeys),

    MyCTE

    AS (SELECT Object_schema_name(colz.object_id) AS fkschemaname,

    Object_name(colz.object_id) AS fktablename,

    colz.NAME AS fkcolumnname,

    colz.column_id AS fkcolumn_id,

    x.*

    FROM sys.columns colz

    INNER JOIN (SELECT object_id,

    Object_schema_name(object_id) AS pkschemaname,

    Object_name(object_id) AS pktablename,

    NAME AS pkcolumnname,

    column_id AS pkcolumn_id

    FROM sys.columns

    WHERE RIGHT(NAME,2) = 'ID' -- based on idnetity? is_identity = 1

    AND LEFT(Object_name(object_id), 3) != 'sys') X

    ON colz.NAME = x.PKColumnName

    AND colz.object_id != x.object_id

    WHERE LEFT(Object_name(colz.object_id), 3) != 'sys'

    AND colz.is_identity = 0

    AND colz.object_id != x.object_id

    AND Object_schema_name(colz.object_id) = pkschemaname

    AND RIGHT(colz.NAME,2)= 'ID') SELECT ' ALTER TABLE ' + FKSchemaName + '.'

    + FKTableName + ' ADD CONSTRAINT ['

    + FKTableName + '_' + FKColumnName + ']'

    + ' FOREIGN KEY (' + FKColumnName

    + ') REFERENCES ' + pkschemaname + '.' + PKTableName + '('

    + PKColumnName + ')' AS fksql

    FROM MYCTE

    WHERE LEFT(PKTableName, 4) NOT IN( 'test', 'zed_', 'z_ME' )

    AND FKTableName NOT IN(SELECT NAME

    FROM sys.views)

    EXCEPT

    SELECT FKSQL

    FROM CurrFKS

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I will try to integrate that into what I have already to do a name check first. We do get some that would be like Id in one table and then OrderId in the other, that may match. A lot of times it'd be Id in one table and then Giraffe in the other (wild example, but really that the names mean nothing to one another).

    So the current logic is:

    If names are close, or contain any name like the other table name, then it may be a match. If not, a singular unique column in the table should be the Key, if not, check the next most distinct column in conjunction with the first. Continue down the line until a unique value is found (only adding in column if it makes the previous try MORE distinct). Then check if those fields match up to any fields in the other table by data. Report if there is 90% or more certainty that it is a match.

  • JimK 28184 (7/17/2015)


    I will try to integrate that into what I have already to do a name check first. We do get some that would be like Id in one table and then OrderId in the other, that may match. A lot of times it'd be Id in one table and then Giraffe in the other (wild example, but really that the names mean nothing to one another).

    So the current logic is:

    If names are close, or contain any name like the other table name, then it may be a match. If not, a singular unique column in the table should be the Key, if not, check the next most distinct column in conjunction with the first. Continue down the line until a unique value is found (only adding in column if it makes the previous try MORE distinct). Then check if those fields match up to any fields in the other table by data. Report if there is 90% or more certainty that it is a match.

    Assuming you were to find guesses, then you could "test" your guess by checking whether the "TableTwoID" column follows the requirements of a "foreign key":

    - that the data type is the same or compatible with the "ID" you think it matches up to

    - that the values in "TableTwoID" all exist in table two's ID column.

    You might need to make caveats around NULLs or some default value (e.g. 0).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply