SQLServerCentral Article

A Simple Recursion Finds Joins

,

There are many production databases that in practice follow a set of rules such as these:

  • All tables must be created with an integer column named '%ID' to be used as the primary key for the table.
  • Wherever the name of such an '%ID' column is used in another table, it must have a logical foreign key relation to the same-named primary key column, but do not create the physical foreign keys in the database.
  • The tables used by the central applications must be in third normal form, but It's OK to create de-normalized tables if needed.

Recently, I was tasked with removing all data for a particular enterprise from all versions of a database.  Some backup copies of the database could have tables that no longer exist in the current version.  I discovered that the above rules were implicit in this database, so I wrote a simple recursion to find join paths to the "enterprise ID" in whatever tables are found in a version of the database.

This is a simplified model of tables in the database:

IF OBJECT_ID('ksl.enterprise') IS NOT NULL
    DROP TABLE ksl.enterprise;
IF OBJECT_ID('ksl.company') IS NOT NULL
    DROP TABLE ksl.company;
IF OBJECT_ID('ksl.division') IS NOT NULL
    DROP TABLE ksl.division;
IF OBJECT_ID('ksl.department') IS NOT NULL
    DROP TABLE ksl.department;
IF OBJECT_ID('ksl.oneOff') IS NOT NULL
    DROP TABLE ksl.oneOff;
IF OBJECT_ID('ksl.oneOffChild') IS NOT NULL
    DROP TABLE ksl.oneOffChild;
IF OBJECT_ID('ksl.denormalizedCollection') IS NOT NULL
    DROP TABLE ksl.denormalizedCollection;
CREATE TABLE ksl.enterprise
    (
      entID INT PRIMARY KEY
    , attributes VARCHAR(20)
    );
CREATE TABLE ksl.company
    (
      companyID INT PRIMARY KEY
    , entID INT NOT NULL
    , attributes VARCHAR(20)
    );
CREATE TABLE ksl.division
    (
      divID INT PRIMARY KEY
    , companyID INT NOT NULL
    , attributes VARCHAR(20)
    );
CREATE TABLE ksl.department
    (
      deptID INT PRIMARY KEY
    , divID INT NOT NULL
    , attributes VARCHAR(20)
    );
CREATE TABLE ksl.oneOff
    (
      oneOffID INT PRIMARY KEY
    , companyID INT NOT NULL
    , attributes VARCHAR(20)
    );
CREATE TABLE ksl.oneOffChild
    (
      oneOffChildID INT PRIMARY KEY
    , oneOffID INT NOT NULL
    , attributes VARCHAR(20)
    );
CREATE TABLE ksl.denormalizedCollection
    (
      denormCollId INT PRIMARY KEY
    , divID INT NOT NULL
    , entID INT NOT NULL
    , attributes VARCHAR(20)
    );

The "mother of all tables" is enterprise, which has entID as its primary key.  The enterprise has many companies, represented in table company, and each company has many divisions, and each division has many departments.  The task is to delete all data related to a particular entID value, the value that that identifies the "gone for good" enterprise.  This is a database that has grown to hundreds of tables, their various structures represented by tables such as: oneOff, oneOffChild, and denormalizedCollection.

Before we proceed to find all join paths to enterprise.entID, we look at the SQL Server version and collation, so we won't be surprised by the SQL Server system catalog or case sensitivity.  For my task, I used this code and received these results:

select @@version,serverproperty('collation')

Note that I created the model of the database in a non-dbo schema.  The recursion to find joins requires a table that I create in the non-dbo schema, ksl, to keep it from being considered as another of the hundreds of tables that have to be examined for data related to the "gone for good" entID.    Here is T-SQL used to manage schemas:

/*
create schema ksl authorization dbo
select schema_id from sys.schemas where name='ksl'
schema_id
---------
6
*/

The setup for the simple recursion is a "select into" from the system catalog to create a table in the ksl schema.  The query selects the set of tablename.columnname candidates for finding joins between tables, that is, all columns named '%ID' in the relevant schema, with their table names.  It is the naming convention that is followed for key columns that makes the join path to entID useful.  The query creates an integer column named "level" and assigns the value 99 to all records selected for that column.  This default value for the set of tablename.columnname is intended to mean "there is no join path from this column to an entID column."

The opposite of "there is no join path" is "this is a table with the entID column."  The value for "top level of the hierarchy" is 0 (zero).  The name of the ID column at the top of the enterprise hierarchy is storeded in the variable @topID. An update statement updates level=99 to level=0 for every table.column that is a column named entID or is a column in a table that has a column named entID.  (The first condition is a subset of the second condition, but is stated explicitly to emphasize that identifying all join paths from all tables to entID is the goal.)

The recursion to find joins is simple, performed by a "while loop" that increments the value that is written to the column named level.   At start, the value of the @level variable is 0 (zero), representing the subset of table.column values that are tables that include the entID column.  From all the table.column values that are level=99, mark those that join to the ones that are level=0 as level=1.  Increment the @level value by 1; from all the table.column values that are level=99, mark those that join to the ones that are level=1 as level=2.  Keep looping, traversing the join path,  until no joins are found. Here is the code:

  IF OBJECT_ID('ksl.findjoins') IS NOT NULL
      DROP TABLE ksl.findjoins
  DECLARE @topID SYSNAME
  SET @topID = 'entId'
  SELECT o.name AS tblname
        , c.name AS colname
        , 99 AS level
      INTO ksl.findjoins
      FROM sys.objects o
          JOIN sys.columns c
          ON o.object_id = c.object_id
             AND c.name LIKE '%id'
             AND o.schema_id = 6
  UPDATE ksl.findjoins
      SET level = 0
      WHERE colname = @topID
          OR tblname IN ( SELECT tblname
                              FROM ksl.findjoins
                              WHERE colname = @topID )
  DECLARE @level TINYINT
  SET @level = 0
  DECLARE @rowcount INT
  WHILE 1 = 1
      BEGIN
          UPDATE ksl.findjoins
              SET level = @level + 1
              WHERE ( colname IN ( SELECT colname
                                      FROM ksl.findjoins
                                      WHERE level = @level )
                      OR tblname IN ( SELECT tblname
                                          FROM ksl.findjoins
                                          WHERE level = @level )
                    )
                  AND level = 99
          SET @rowcount = @@rowcount
          IF @rowcount = 0
              BREAK
          SET @level = @level + 1
      END

This table, ksl.findjoins, holds the all data we need to write all the necessary delete statements -- "delete @tablename where @fkColumn in (select @pkColumn  from @refTablename where entID = 'goneForGood')".  

  SELECT *
      FROM ksl.findjoins
      ORDER BY level
        , colname
        , tblname

This gives us:

...

Any columns named '%ID' that have no join path to entID will remain level = 99.  In this simplified database there are only seven tables , all of which do have a join to entID,  the join found by looping to level <= 4.

The join paths are made evident with a view on table, findjoins, augmented by the SQL Server catalog to identify columns that are the primary key columns.  The minimum level value for each table in findjoins  is the shortest join path to entID.  There is one row in the view for each table, with the "min level" and the column to use to join and the indicator 'PK' if that column is a primary key column.

IF OBJECT_ID('ksl.vFindJoinsMinLevel', 'V') IS NOT NULL
    DROP VIEW ksl.vFindJoinsMinLevel
go
CREATE VIEW ksl.vFindJoinsMinLevel
AS
    SELECT t.tblname
          , t.colname
          , x.minlevel
          , CASE WHEN sc.name IS NULL THEN ''
                 ELSE 'PK'
            END AS isPK
        FROM ksl.findjoins t
            JOIN ( SELECT tblname
                      , MIN(level) AS minlevel
                    FROM ksl.findJoins
                    GROUP BY tblname
                 ) x
            ON t.tblname = x.tblname
               AND t.level = x.minlevel
            JOIN sys.objects so
            ON t.tblname = so.name
               AND so.schema_id = 6
               AND so.type = 'U'
            LEFT JOIN sys.indexes si
            ON si.object_id = so.object_id
            LEFT JOIN sys.index_columns sic
            ON si.object_id = sic.object_id
               AND si.index_id = sic.index_id
            LEFT JOIN sys.columns sc
            ON sc.column_id = sic.column_id
               AND sc.object_id = so.object_id
               AND sc.name = t.colname
               AND si.is_primary_key = 1
This query on the view gives us the shortest join paths:
SELECT *
    FROM ksl.vFindJoinsMinLevel
    ORDER BY 3
      , 2
      , 1

This query displays a join path to entID for a single table that is marked by the recursion as "level  = 3":

DECLARE @level3Tablename SYSNAME
SET @level3Tablename = 'oneOffChild'
SELECT l0.tblname AS l0tblname
      , l0.colname AS l0colname
      , l1.tblname AS l1tblname
      , l1.colname AS l1colname
      , l2.tblname AS l2tblname
      , l2.colname AS l2colname
      , l3.tblname AS l3tblname
      , l3.colname AS l3colname
    FROM ksl.findjoins l0
        JOIN ksl.findjoins l1
        ON l0.colname = l1.colname
        JOIN ksl.findjoins l2
        ON l1.tblname = l2.tblname
        JOIN ksl.findjoins l3
        ON l2.colname = l3.colname
    WHERE l3.tblname = @level3Tablename
        AND l3.level = 3
        AND l2.level = 2
        AND l1.level = 1
        AND l0.level = 0
    ORDER BY 8
      , 6
      , 4
      , 1
This results in:
In conclusion, a simple recursion on table and column names from sys.objects joined to sys.columns, with the system catalog data saved to a user table and queried through a view, reports that table oneOffChild joins to table oneOff on column oneOffID in each table, and table oneOff joins to table company on column companyID in each table.  To select or delete all records in oneOffChild that belong to the 'goneForGood' enterprise, the above query result tells us to write
SELECT *
    FROM oneOffChild
    WHERE oneOffId IN (
        SELECT oneOffId
            FROM oneOff
            WHERE companyID IN ( SELECT companyID
                                    FROM company
                                    WHERE entID = 12 --'goneForGood'
   ) )

or

DELETE ooc
    FROM oneOffChild ooc
        JOIN oneOff oo
        ON ooc.oneOffId = oo.oneOffId
        JOIN company co
        ON oo.companyID = co.companyID
    WHERE co.entID = 12 --'goneForGood'

Rate

3 (8)

You rated this post out of 5. Change rating

Share

Share

Rate

3 (8)

You rated this post out of 5. Change rating