I'm asking for guidance on how to best discover PK-FK relationships

  • Last night I spent 45 minutes composing a complicated question here. When I finished I started the review, and then Chrome, crashed. Thanks, Chrome. This isn't the only time Chrome has instantly died, so I'm going to compose this post with lots of intermediate postings, to save it my work along the way.

    I've been in my current position for several years. One of the first projects I was put on was maintaining a really old ASP.NET VB.NET project started so long ago that I've no idea when it was started as it wasn't put into any version control system (VCS) prior to my taking it over. Anyway, the app had MANY non-standard software design issues, which I won't go into. But it also had lots of database non-standard issues as well. Looking at comments written by the developers back then, none of which are still here, it reveals that the original data store was XML files. I don't know how many intermittent steps they took, but it now resides in a SQL 2008 database. I'm guessing that either the original developers didn't know about primary keys and foreign keys, or because the original data store was XML which doesn't have PKs/FKs, none of the tables had any key of any sort. Years ago I added primary keys to the tables, based upon what looked like it made sense and often because that column was the only column that wasn't nullable.

    That's the way things stood for years, until recently when I finally was granted the right to rewrite this app, making it simpler than it is, so that it can be better maintained. And I'll be using entity framework (yes, I know, please don't throw stones at me), which works better if PKs and FKs are defined. But my problem is that often there's the PK I've identified doesn't have a corresponding FK named the same thing in another table. I'll illustrate by listing the table schema for three tables. First, is the main data table, named timetrack.

    CREATE TABLE [dbo].[timetrack] (
    [id] INT IDENTITY (1, 1) NOT NULL,
    [userid] CHAR (15) NULL,
    [taskid] INT NULL,
    [dt] DATETIME NULL,
    [hour] DECIMAL (10, 2) NULL,
    [jobid] INT NULL,
    [clinicid] INT NULL,
    [notes] NVARCHAR (100) NULL,
    CONSTRAINT [PK_New_timetrack] PRIMARY KEY CLUSTERED ([id] ASC) WITH (FILLFACTOR = 100)
    );

    A few things of note. The id column wasn't originally an IDENTITY column. I'm taking this from the test database, where I made some additional developments, such as adding IDENTITY to the id column. What the original developers did is have the ASP.NET web app increment the id column before saving a new record to timetrack, and just hope for the best when saving data that there will be no collisions.

    Next is an easy example, where a lookup table named r_task is used to link in tasks into timetrack. Here's what it looks like:

    CREATE TABLE [dbo].[r_task] (
    [taskid] INT NOT NULL,
    [desc] NVARCHAR (100) NULL,
    [classid] INT NULL,
    [active] BIT NULL,
    CONSTRAINT [PK_New_r_task] PRIMARY KEY CLUSTERED ([taskid] ASC) WITH (FILLFACTOR = 100)
    );

    It's primary key, taskid, has a corresponding column named the same thing in timetrack. So, this is easy-peasy. Now onto a more complicated issue which illustrates my problem. There's another table, which I think is related to the userid column in timetrack table, but this other table doesn't have a column named userid. Here is it's schema:

    CREATE TABLE [dbo].[PHD_USER] (
    [phd_user_id] VARCHAR (15) NOT NULL,
    [phd_user_name] NVARCHAR (40) NULL,
    [staff_id] INT NULL,
    [phd_user_status] VARCHAR (40) NULL,
    [phd_user_password] NVARCHAR (128) NULL,
    [phd_user_telephone_nr] CHAR (10) NULL,
    [phd_user_fax_nr] CHAR (10) NULL,
    [create_userid] VARCHAR (15) NULL,
    [create_datetime] DATETIME CONSTRAINT [DF_New_PHD_USER_create_datetime] DEFAULT (getdate()) NULL,
    [modify_userid] VARCHAR (15) NULL,
    [modify_datetime] DATETIME NULL,
    [password_last_changed_dt] DATETIME NULL,
    [phd_win_userid] VARCHAR (25) NULL,
    [phd_employee_id] CHAR (10) NULL,
    [PositionNumber] VARCHAR (8) NULL,
    CONSTRAINT [PK_New_PHD_USER] PRIMARY KEY CLUSTERED ([phd_user_id] ASC) WITH (FILLFACTOR = 100),
    CONSTRAINT [CK_PositionNum] CHECK ([PositionNumber] like '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]')
    );

    I think the phd_user_id column here, that is supposed to connect to userid in timetrack. And you'll notice another non-standard thing the original developers did. Some of the table names in this database are all in caps. The rest of all in lower case, because you know, why be consistent? I'm guessing that more than one person has worked in the database either at the same time or at a different time, and didn't bother to follow the convention set up by the first DBA. Anyway, phd_user_id is a VARCHAR(15) data type, whereas userid is a CHAR(15) data type. Well, at least they're the same length. BTW, I put in that constraint on the PositionNumber column, because that's what has to go into that column and it stopped people entering garbage. Also note that there's a column named staff_id in PHD_USER, but that it is nullable. Some data is there, some are null. I've no idea what the original developers/DBAs were doing; there's no documentation for this app.

    By now I'm sure you see my problem. Some columns in a lookup table have a column named the same as another column in a database. But not always. Perhaps if I had a tool to use, like TOAD, I could better figure out what column in one table goes to another table in another table, but I don't have that luxury. What strategy(ies) should I follow, when I try to introduce foreign keys into this database? And then how do I migrate the data out of the old database into a new database which has both PKs and FKs?

    • This topic was modified 3 weeks, 5 days ago by  Rod at work.
    • This topic was modified 3 weeks, 5 days ago by  Rod at work.
    • This topic was modified 3 weeks, 5 days ago by  Rod at work.
    • This topic was modified 3 weeks, 5 days ago by  Rod at work.
    • This topic was modified 3 weeks, 5 days ago by  Rod at work.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Check sys.foreign_keys ( did not read the full Q )

    Indeed, some tools can suggest logical relationships. Mostly based on column name and data type.

    However, that must still be double checked with your actual db model.

    If the relational model of your db does not contain the actual relationship, you are indeed paying big time with regards to performance. ( check "Yes, Foreign Keys Help Performance" )

    Such databases will also need more time for any dev/dba to understand. Due to the lack of relationship data. ( just read the applications does not cut it ! )

    Of course, order of insert is also mandatory when using foreign keys. The parent key value must exist in the parent table before it can be used in the child table.

    Keep in mind to always create your FK using the "with check check" clause !

    ( using "with check" will indeed create the foreign key constraint, but will not check it at creation time !! )

     

     

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Thank you, Johan, I was not aware of the "with check check" clause! At this point I was going to do exactly what you recommend not doing exclusively; try to deduce the relationships by following the spaghetti code.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Rod at work wrote:

    Last night I spent 45 minutes composing a complicated question here. When I finished I started the review, and then Chrome, crashed. Thanks, Chrome. This isn't the only time Chrome has instantly died, so I'm going to compose this post with lots of intermediate postings, to save it my work along the way.

    I learned my lesson when it comes to long posts long ago... I write the post in Word or Notepad and then post when I'm done writing.  If I need to add, I regularly do a copy'n'paste from the editor window back to Word or Notepad.

    Shifiting gears to a comment from your post above...

    Rod at work wrote:

    A few things of note. The id column wasn't originally an IDENTITY column. I'm taking this from the test database, where I made some additional developments, such as adding IDENTITY to the id column.

    I haven't liked IDENTITY columns for a long time now... especially since they had the fault of sorting the entire table in TempDB if you try to do a high speed, minimally logged copy of the table and have to use SET IDENTITY_INSERT (I haven't tested that issue since SQL Server 2016 but I've also not seen a report of it being fixed).  Instead, I recommend using a SEQUENCE, which became available in 2012, IIRC.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm not familiar with SEQUENCE, Jeff, I'll have to look into it. Thanks.

    Kindest Regards, Rod Connect with me on LinkedIn.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply