July 16, 2015 at 2:26 pm
Does anyone know of a tool, script, procedure, or otherwise that will solve this problem?
CREATE TABLE [Table 1] (
Id varchar,
TableTwoId varchar,
OtherData varchar )
CREATE TABLE [Table 2] (
Id varchar,
MoreData varchar )
What is the link between these two tables?
I have databases that have zero keys defined, no foreign key constraints, no unique value constraints. I cannot assume that an Identity column is the Id. The end game is be able to output that [Table 1].[TableTwoId] = [Table 2].[Id] but cannot assume that all linkage in the database will be as simple as saying "if the field name contains a table name + Id then it is the Id in that table."
Currently have written a script that will cycle through all columns and start identifying keys in singular tables based on the distinctness of the values in each column (i.e. Column1 is 99% distinct so is the unique value). It will also combine columns if the linkage is the combination of one or more columns from Table 1 = the same number of columns in Table 2. This takes a long time, and it somewhat unreliable as IDENTITY columns may seem to relate to one another when they don't.
Any help would be appreciated.
July 16, 2015 at 2:45 pm
JimK 28184 (7/16/2015)
Does anyone know of a tool, script, procedure, or otherwise that will solve this problem?CREATE TABLE [Table 1] (
Id varchar,
TableTwoId varchar,
OtherData varchar )
CREATE TABLE [Table 2] (
Id varchar,
MoreData varchar )
What is the link between these two tables?
I have databases that have zero keys defined, no foreign key constraints, no unique value constraints. I cannot assume that an Identity column is the Id. The end game is be able to output that [Table 1].[TableTwoId] = [Table 2].[Id] but cannot assume that all linkage in the database will be as simple as saying "if the field name contains a table name + Id then it is the Id in that table."
Currently have written a script that will cycle through all columns and start identifying keys in singular tables based on the distinctness of the values in each column (i.e. Column1 is 99% distinct so is the unique value). It will also combine columns if the linkage is the combination of one or more columns from Table 1 = the same number of columns in Table 2. This takes a long time, and it somewhat unreliable as IDENTITY columns may seem to relate to one another when they don't.
Any help would be appreciated.
I don't think you're going to get much help on this one.
It looks like you correctly identified the problem here. You're correct, you can't assume that if looks like a match, then it is a match.
It might help if you could get you hands on queries that are used. Look at stored procedures if there are any. Look at the joins in those procedures. They may shed some light on how tables are connected.
Find out who is using the Database(s). See if you can get some help from them, again by looking at the queries.
Depending on the number of objects in the database, this might be easy to figure out.
Good luck.
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]
July 16, 2015 at 3:02 pm
Thanks, Alvin. It is a recurring problem, so we try to get as much info from those who know up front, but a lot of times nobody can tell us the links. I had heard programs like Talend could identify keys between tables, but who I learned that from thought they used the column names and pre-defined constraints to make that judgement.
Essentially, I'm getting tab delimited text files and making databases out of them, so it's the bare minimum of information.
July 17, 2015 at 5:56 am
JimK 28184 (7/16/2015)
Thanks, Alvin. It is a recurring problem, so we try to get as much info from those who know up front, but a lot of times nobody can tell us the links. I had heard programs like Talend could identify keys between tables, but who I learned that from thought they used the column names and pre-defined constraints to make that judgement.Essentially, I'm getting tab delimited text files and making databases out of them, so it's the bare minimum of information.
i had an app like this form a third party, where they obfuscated table names and eliminated foreign keys, thinking that that table structure and joins were proprietary information or something.
i was going to say, if this was a database with zero foreign keys, I've done this by pulling all the execution plans in cache, and parsing the xml for any join criteria that came from the application itself from existing queries;
the join critieria in the xml could help you with mapping,but if all you are receiving is a collection of flat files, i think you have to go to the source and ask for mapping information.
does your source have any documentation related to the relationships of the flat files you receive?
Lowell
July 17, 2015 at 6:59 am
A lot of times no, we get as much info as we can. Sometimes we can get a schema or data dictionary which can be read and easily turned in to keys.
Many times we might as well have found a set of floppies in a tomb and are trying to figure out what it all means.
July 17, 2015 at 8:36 am
well the est guess method based on column names might be a starting point, like you mentioned.
how about this:
if can you assume in your data sources that the first column, named *ID, and any other columns named *ID are potential Foreign keys.
, than this code can generate some missing and implied foreign keys as a best guess.
if you alter the tables and they fail, then it's because the data doesn't support the FK assumption;
if it DOES, then you are closer to defining all FK's and relationships.
WITH CurrFKS
AS (SELECT Object_name(constid) AS constraintname,
OBJECT_SCHEMA_NAME(rkeyid) AS refschema,
Object_name(rkeyid) AS reftable,
COL_NAME(rkeyid, rkey) AS refcolumn,
OBJECT_SCHEMA_NAME(rkeyid) AS fkschema,
Object_name(fkeyid) AS fktable,
COL_NAME(fkeyid, fkey) AS fkcolumn,
' ALTER TABLE ' + Object_name(fkeyid)
+ ' ADD CONSTRAINT [PLACEHOLDER]'
+ ' FOREIGN KEY (' + COL_NAME(fkeyid, fkey)
+ ') REFERENCES ' + Object_name(rkeyid) + '('
+ COL_NAME(rkeyid, rkey) + ')' AS fksql
FROM sysforeignkeys),
MyCTE
AS (SELECT Object_schema_name(colz.object_id) AS fkschemaname,
Object_name(colz.object_id) AS fktablename,
colz.NAME AS fkcolumnname,
colz.column_id AS fkcolumn_id,
x.*
FROM sys.columns colz
INNER JOIN (SELECT object_id,
Object_schema_name(object_id) AS pkschemaname,
Object_name(object_id) AS pktablename,
NAME AS pkcolumnname,
column_id AS pkcolumn_id
FROM sys.columns
WHERE RIGHT(NAME,2) = 'ID' -- based on idnetity? is_identity = 1
AND LEFT(Object_name(object_id), 3) != 'sys') X
ON colz.NAME = x.PKColumnName
AND colz.object_id != x.object_id
WHERE LEFT(Object_name(colz.object_id), 3) != 'sys'
AND colz.is_identity = 0
AND colz.object_id != x.object_id
AND Object_schema_name(colz.object_id) = pkschemaname
AND RIGHT(colz.NAME,2)= 'ID') SELECT ' ALTER TABLE ' + FKSchemaName + '.'
+ FKTableName + ' ADD CONSTRAINT ['
+ FKTableName + '_' + FKColumnName + ']'
+ ' FOREIGN KEY (' + FKColumnName
+ ') REFERENCES ' + pkschemaname + '.' + PKTableName + '('
+ PKColumnName + ')' AS fksql
FROM MYCTE
WHERE LEFT(PKTableName, 4) NOT IN( 'test', 'zed_', 'z_ME' )
AND FKTableName NOT IN(SELECT NAME
FROM sys.views)
EXCEPT
SELECT FKSQL
FROM CurrFKS
Lowell
July 17, 2015 at 9:29 am
I will try to integrate that into what I have already to do a name check first. We do get some that would be like Id in one table and then OrderId in the other, that may match. A lot of times it'd be Id in one table and then Giraffe in the other (wild example, but really that the names mean nothing to one another).
So the current logic is:
If names are close, or contain any name like the other table name, then it may be a match. If not, a singular unique column in the table should be the Key, if not, check the next most distinct column in conjunction with the first. Continue down the line until a unique value is found (only adding in column if it makes the previous try MORE distinct). Then check if those fields match up to any fields in the other table by data. Report if there is 90% or more certainty that it is a match.
July 17, 2015 at 11:09 am
JimK 28184 (7/17/2015)
I will try to integrate that into what I have already to do a name check first. We do get some that would be like Id in one table and then OrderId in the other, that may match. A lot of times it'd be Id in one table and then Giraffe in the other (wild example, but really that the names mean nothing to one another).So the current logic is:
If names are close, or contain any name like the other table name, then it may be a match. If not, a singular unique column in the table should be the Key, if not, check the next most distinct column in conjunction with the first. Continue down the line until a unique value is found (only adding in column if it makes the previous try MORE distinct). Then check if those fields match up to any fields in the other table by data. Report if there is 90% or more certainty that it is a match.
Assuming you were to find guesses, then you could "test" your guess by checking whether the "TableTwoID" column follows the requirements of a "foreign key":
- that the data type is the same or compatible with the "ID" you think it matches up to
- that the values in "TableTwoID" all exist in table two's ID column.
You might need to make caveats around NULLs or some default value (e.g. 0).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 8 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy