Finding hierarchy errors

  • Hi all,

    I've read some posts by Jeff Moden and Gsquared on Hierarchy structures (very helpfull, thank you) an delving into TallyTables on the way. Very enlightning.

    What I can't find, is a way to check for user input errors on these structures.

    Consider the following data structure:

    create table dbo.eqmas(eq_nmr VarChar(20), eq_parent VarChar(20), eq_desc VarChar(80))

    INSERT INTO dbo.eqmas

    SELECT 'OB01', Null, 'Object 1'

    UNION ALL

    SELECT 'OB02', 'OB1', 'Object 2'

    UNION ALL

    SELECT 'OB03', 'OB1', 'Object 1'

    UNION ALL

    SELECT 'OB04', 'OB2', 'Object 4'

    UNION ALL

    SELECT 'OB05', 'OB2', 'Object 5'

    UNION ALL

    SELECT 'OB06', 'OB10', 'Object 6'

    UNION ALL

    SELECT 'OB07', 'OB8', 'Object 8'

    UNION ALL

    SELECT 'OB08', 'OB9', 'Object 7'

    UNION ALL

    SELECT 'OB09', 'OB7', 'Object 7'

    As you can see, all is hunkydor up to OB6, this has a non existing parent. This one is easily find with a left outer self join.

    OB7 is child of OB8, is child of OB9 is child of OB7. This is a circular reference. This can be one to many deep. One level (self recurring) is easily identified with a eq_nmr = eq_parent select.

    Currently I'm traversing the whole tree with a cursor for each record in the table. As the table grows, so goes the performance because of the RBAR (brrrr :crazy: ).

    When I use the following code on a broken production set, I get 'Maximum recursion level reached'

    ;with HierarchyCTE (Lvl, eq_nmr, eq_parent, eq_desc) as

    (select 1, eq_nmr, eq_parent, eq_desc

    from dbo.eqmas

    union all

    select Lvl + 1, h2.eq_nmr, h2.eq_parent, h2.eq_desc

    from dbo.eqmas h2

    inner join HierarchyCTE

    on h2.eq_parent = HierarchyCTE.eq_nmr)

    select Lvl, eq_nmr, eq_parent, eq_desc

    from HierarchyCTE

    order by eq_nmr

    Can you suggest a better, faster, more efficient way to identify these errors, or is the cursor way the least bad way to do this.

    I'm on SQL2005 Enterprise

    Thanks,

    Cees Cappelle

    p.s.

    Here's my current errorchecking code. Feel free to bash.

    IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = object_id(N'mcmain.mcmsp_eqmascheck') AND OBJECTPROPERTY(id, N'IsProcedure') = 1)

    DROP PROCEDURE mcmain.mcmsp_eqmascheck

    GO

    CREATE PROCEDURE mcmain.mcmsp_eqmascheck

    (@plannummer VarChar(20) = '', @VersionVarChar(5) = '' Output, @debug Bit = 0 ) --WITH RECOMPILE

    /*31-03-2005 CPE

    1 mogelijke variabele @plannummer

    Als deze niet wordt meegegeven, komt er geen fout, maar wordt de variabele leeg (NotNull)

    -- Controle op fouten in eqmas

    -- 1. Check op onbestaande parents

    -- 2. Check op foute structuur (cycle)

    18-03-2013 bhr-ceca v2.0 Uitgebreid met opzoeken systrace oorzaken

    */

    AS

    EXEC mcmain.mcmsp_translog @transnummer = @plannummer, @Omschrijving = 'Initialize Gestart mcmsp_eqmascheck', @mcm_user = '', @Tabel = 'eqmas', @Aktie = '', @Keyval = ''

    -- Initialiseren

    SET NOCOUNT ON-- Snelere verwerking zonder (23 row(s) affected)

    -- Aanmaken interne variabelen

    DECLARE @StorProc VarChar(254)

    SET @StorProc= 'mcmsp_eqmascheck'

    SET @Version= '2.0'

    DECLARE @mcm_userVarchar(20)

    DECLARE@mcm_craftVarChar(20)

    DECLARE@mcm_groupVarChar(20)

    DECLARE@mcm_nameVarChar(45)

    DECLARE@mcm_siteVarChar(20)

    DECLARE@StrRowIdInt

    DECLARE@SysError TABLE(strObject VarChar(20), strUser VarChar(20), strAction VarChar(20), strDate DateTime, strSource VarChar(20), strRowId Int)

    SET @mcm_user= ''

    SET @mcm_craft= ''

    SET @mcm_group= ''

    SET @mcm_name= ''

    SET @mcm_site= ''

    EXEC mcmain.mcmsp_translog @transnummer = @plannummer, @Omschrijving = 'Aanmaken eqmaserror tabel', @mcm_user = @mcm_user, @Tabel = 'eqmaserror', @Aktie = 'CREATE', @Keyval = '', @StorProc = @StorProc

    If @Debug = 0

    BEGIN

    IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = object_id(N'[mcmain].[eqmaserror]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)

    DROP TABLE [mcmain].[eqmaserror]

    CREATE TABLE [mcmain].[eqmaserror] (

    [Object]VarChar(20),--[char] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [ObjectOmschrijving]VarChar(45),-- [char] (45) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [OnderdeelVan]VarChar(20),--[char] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [Fout]VarChar(45),-- COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [FoutCode]VarChar(10),-- COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [ControleDatum]DateTime ,

    [Sitecode]VarChar(20),

    [Usercode]VarChar(32),

    [UserAction]VarChar(45),

    [UserDate]DateTime,

    [UserSource]VarChar(45),

    [UserSourceRowId]INT,

    [CreateUsercode]VarChar(32),

    [CreateUserDate]DateTime,

    [er_rowid] [int] IDENTITY (1, 1) NOT NULL

    ) ON [PRIMARY]

    END

    EXEC mcmain.mcmsp_translog @transnummer = @plannummer, @Omschrijving = 'Opzoeken onbekende parents', @mcm_user = @mcm_user, @Tabel = 'eqmaserror', @Aktie = 'INSERT', @Keyval = '', @StorProc = @StorProc

    IF @Debug = 0

    BEGIN

    INSERT INTO mcmain.eqmaserror(Object, ObjectOmschrijving, OnderdeelVan, Fout, ControleDatum, FoutCode, Sitecode)

    SELECT eqmas.eq_nmr, eqmas.eq_desc,eqmas.eq_parent, 'Onbekende parent', GetDate(), '1', eqmas.eq_site

    FROM mcmain.eqmas eqmas

    LEFT OUTER JOIN mcmain.eqmas eqparent ON eqparent.eq_nmr = eqmas.eq_parent

    WHERE eqparent.eq_nmr IS NULL and eqmas.eq_parent > ''

    END

    EXEC mcmain.mcmsp_translog @transnummer = @plannummer, @Omschrijving = 'Opzoeken cyclic parents (A->B->C->A)', @mcm_user = @mcm_user, @Tabel = 'eqmaserror', @Aktie = 'INSERT', @Keyval = '', @StorProc = @StorProc

    IF @Debug = 0

    BEGIN

    -- DECLARE @Eqstruct Table(eqparent VarChar(20), eqchild VarChar(20)) -- Memory table om parent child relatie te berekenen

    DECLARE DataSET Cursor FOR SELECT eq_nmr, eq_site FROM mcmain.eqmas

    DECLARE @EqparentVarChar(20)

    ,@EqchildVarChar(20) -- wijzigt niet

    ,@EqSiteVarChar(20) -- wijzigt niet

    ,@EqnummerVarChar(20)

    ,@EqDescVarChar(45)

    ,@FoutcodeVarChar(10)

    ,@EqlevelInt-- Current level eqmas record

    ,@EqMaxLevel Int-- Max level = Aantal Records

    SET @Eqlevel = 0

    SET @EqMaxLevel = (Select COUNT(eq_nmr) from mcmain.eqmas)

    OPEN Dataset

    FETCH NEXT FROM DataSET INTO @Eqnummer, @EqSite

    SET @Eqchild = @Eqnummer

    SELECT@EqParent = eq_parent, @EqDesc = eq_desc FROM mcmain.eqmas

    WHERE eq_nmr = @EqNummer

    --SET @Eqparent = (Select eq_parent from mcmain.eqmas where eq_nmr = @Eqnummer)

    --SET @EqDesc = (Select eq_desc from mcmain.eqmas where eq_nmr = @Eqnummer)

    WHILE (@@FETCH_STATUS = 0)

    BEGIN

    SET @Eqlevel = 0

    While Rtrim(@Eqparent) > ''

    Begin

    -- INSERT INTO @eqstruct(eqparent, eqchild)

    -- Values(@eqparent, @eqchild)

    SET @EqNummer = @Eqparent

    SELECT @Eqparent = eq_parent FROM mcmain.eqmas WHERE eq_nmr = @Eqnummer

    SET @Eqlevel = @EqLevel + 1

    IF @EqLevel >= @EqMaxLevel

    BEGIN

    INSERT INTO mcmain.eqmaserror(Object, ObjectOmschrijving, OnderdeelVan, Fout, ControleDatum, FoutCode, Sitecode)

    Values(@Eqchild,@EqDesc,@EqParent,'Cyclic Parent gevonden', GetDate(), '2', @EqSite)

    -- PRINT 'Fout '+@EqParent+' '+@Eqchild+' '+@Eqnummer

    SET @EqParent = ''

    END

    End

    FETCH NEXT FROM DataSET INTO @Eqnummer, @EqSite

    SELECT @EqDesc = eq_desc FROM mcmain.eqmas WHERE eq_nmr = @Eqnummer

    SET @Eqchild = @Eqnummer

    SET @Eqparent = @Eqnummer

    END

    Close Dataset

    Deallocate Dataset

    END

    -- Als er geen fouten gevonden zijn, kan de eqmaserror tabel weer weg.

    If @Debug = 0

    BEGIN

    IF NOT EXISTS (SELECT * FROM mcmain.eqmaserror)

    BEGIN

    EXEC mcmain.mcmsp_translog @transnummer = @plannummer, @Omschrijving = 'Verwijderen lege eqmaserror', @mcm_user = @mcm_user, @Tabel = 'eqmaserror', @Aktie = 'INSERT', @Keyval = '', @StorProc = @StorProc

    DROP TABLE [mcmain].[eqmaserror]

    END

    ELSE

    BEGIN -- Opzoeken geschatte redenen (18-03-2013)

    DECLARE DataSET Cursor FOR SELECT [OnderdeelVan], [SiteCode], [FoutCode] FROM mcmain.eqmaserror --WHERE [FoutCode] = '1'

    OPEN Dataset

    FETCH NEXT FROM Dataset INTO @EqParent, @EqSite, @FoutCode

    WHILE (@@FETCH_STATUS = 0)

    BEGIN

    IF@Foutcode = '1'

    BEGIN -- Opzoeken gedelete records

    INSERT INTO @SysError(strObject, strUser, strAction, strDate, strSource, strRowId)

    SELECT TOP 1 str_keyval, str_usr, str_type, str_date, 'Systrace', str_rowid

    FROM mcmain.systrace

    WHERE str_keyval = @EqParent AND str_type = 'UDL'

    ORDER BY str_date

    --SELECT @@RowCount

    IF @@RowCount = 0

    BEGIN

    INSERT INTO @SysError(strObject, strUser, strAction, strDate, strSource, strRowId)

    SELECT TOP 1 str_keyval, str_usr, str_type, str_date, 'Systrace', str_rowid

    FROM McSystem.mcmain.systrace

    WHERE str_keyval = @EqParent AND str_site = @EqSite AND str_type = 'UDL'

    ORDER BY str_date

    END

    END

    IF@Foutcode = '2'

    BEGIN

    INSERT INTO @SysError(strObject, strUser, strAction, strDate, strSource, strRowId)

    SELECT TOP 1 str_keyval, str_usr, str_type, str_date, 'Systrace', str_rowid

    FROM mcmain.systrace WHERE str_keyval = @EqParent and str_site = @EqSite

    and str_desc like '%UPDATE%' and str_desc like '%eq_parent%'

    order by str_date

    END

    IF @@RowCount = 0

    BEGIN

    INSERT INTO @SysError(strObject, strUser, strAction, strDate, strSource, strRowId)

    SELECT TOP 1 str_keyval, str_usr, str_type, str_date, 'McSystem', str_rowid

    FROM McSystem.mcmain.systrace WHERE str_keyval = @EqParent and str_site = @EqSite

    and str_desc like '%UPDATE%' and str_desc like '%eq_parent%'

    order by str_date

    --select @EqParent, @EqSite, @@RowCount

    END

    UPDATE mcmain.eqmaserror

    SETUserCode = strUser

    ,UserAction=strAction

    ,UserDate=strDate

    ,UserSource=strSource

    ,UserSourceRowId=strRowId

    FROM mcmain.eqmaserror

    INNER JOIN @SysError ON strObject = [OnderdeelVan] AND FoutCode = @FoutCode

    FETCH NEXT FROM Dataset INTO @EqParent, @EqSite, @FoutCode

    DELETE @SysError

    END

    Close Dataset

    Deallocate Dataset

    --SELECT * FROM @SysError

    UPDATE mcmain.eqmaserror

    SETUserCode=eq_chuser

    --,UserAction=strAction

    ,UserDate=eq_chdate

    ,UserSource='eqmas'

    ,UserSourceRowId=eq_rowid

    FROM mcmain.eqmaserror

    INNER JOIN mcmain.eqmas ON eq_nmr = [OnderdeelVan] AND eq_chdate > '' and Userdate is Null

    UPDATE mcmain.eqmaserror

    SETCreateUserCode=eq_chuser

    ,CreateUserDate=eq_chdate

    FROM mcmain.eqmaserror

    INNER JOIN mcmain.eqmas ON eq_nmr = Object

    EXEC mcmain.mcmsp_translog @transnummer = @plannummer, @Omschrijving = 'Tonen foute records', @mcm_user = @mcm_user, @Tabel = 'eqmaserror', @Aktie = 'SELECT', @Keyval = '', @StorProc = @StorProc

    SELECT * FROM [mcmain].[eqmaserror]

    END

    END

    -- Finaliseren

    EXEC mcmain.mcmsp_translog @transnummer = @plannummer, @Omschrijving = 'Finaliseren', @mcm_user = @mcm_user, @Tabel = '', @Aktie = '', @Keyval = '', @StorProc = @StorProc

    EXEC mcmain.mcmsp_translog @transnummer = @plannummer, @Omschrijving = 'Eqmas controle gereed', @mcm_user = @mcm_user, @Tabel = '', @Aktie = '', @Keyval = '', @StorProc = @StorProc

    GO

  • Once you have all of the problems resolved with the "non-existing parent" problem, you should add a self-referencing FK that contrains the values of parentIDs to only those values available as child IDs.

    On the "circular reference problem", I'm getting ready for work and can't demo the code just now. If someone doesn't get to it (would be done during the recurrsive CTE to build the sort path) before I do later tonight, I'll see if I can explain it with some code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'll look into the FK thingie. Would be nice if you could point me in the right direction to modify my sp to CTE instead of cursors.

  • Cees Cappelle-442904 (2/19/2014)


    I'll look into the FK thingie. Would be nice if you could point me in the right direction to modify my sp to CTE instead of cursors.

    Apologies for the delays. I've had a pretty full dance card for the last couple of days but I have been working on some code for you. It's actually simple code but I test the hell out of these things before I publish them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... Here's your table with some modified data to make a couple of points.

    -- DROP TABLE dbo.eqmas

    go

    CREATE TABLE dbo.eqmas

    (eq_nmr VARCHAR(20), eq_parent VARCHAR(20), eq_desc VARCHAR(80))

    ;

    INSERT INTO dbo.eqmas

    SELECT 'OB01', Null , 'Object 1' UNION ALL

    SELECT 'OB02', 'OB01', 'Object 2' UNION ALL --Circular

    SELECT 'OB03', 'OB01', 'Object 3' UNION ALL

    SELECT 'OB04', 'OB02', 'Object 4' UNION ALL

    SELECT 'OB05', 'OB02', 'Object 5' UNION ALL --Circular

    SELECT 'OB11', 'OB05', 'Object 11' UNION ALL --Circular

    SELECT 'OB02', 'OB11', 'Object 2' UNION ALL --Circular -- This is the cause

    SELECT 'OB06', 'OB10', 'Object 6' UNION ALL --Stranger (is not a child)

    SELECT 'OB07', 'OB08', 'Object 7' UNION ALL --Orphan (Island)

    SELECT 'OB08', 'OB09', 'Object 8' UNION ALL --Orphan (Island)

    SELECT 'OB09', 'OB07', 'Object 9' --Orphan (Island)

    ;

    I suppose someone could do this all in one query but I wanted to keep the hierarcy "builder" separate from the code that does the troubleshooting. If any rows show up in the trouble shooting section of the code, then you need to make a fix in the data.

    /*=============================================================================

    This will find the "orphans" including the "tail" of any circular references.

    Note that an orphaned group of nodes that contains a circular reference

    usually won't cause a "runaway".

    =============================================================================*/

    --===== This part builds the hierarchy with as much data as it can stopping

    -- short of "runaway" circular references.

    IF OBJECT_ID('tempdb..#Found','U') IS NOT NULL

    DROP TABLE #Found

    ;

    WITH

    cteDirectReports AS

    (

    SELECT eq_nmr, eq_parent, Lvl = 1,

    HierarchicalPath = CAST('\'+eq_nmr+'\' AS VARCHAR(MAX))

    FROM dbo.eqmas

    WHERE eq_parent IS NULL

    UNION ALL

    SELECT t.eq_nmr, t.eq_parent, Lvl = c.Lvl + 1,

    HierarchicalPath = CAST(c.HierarchicalPath+t.eq_nmr+'\' AS VARCHAR(MAX))

    FROM dbo.eqmas t

    INNER JOIN cteDirectReports c ON t.eq_parent = c.eq_nmr

    WHERE c.HierarchicalPath NOT LIKE '%\'+ t.eq_nmr+'\%'

    )

    SELECT NodeNumber = ROW_NUMBER()OVER(ORDER BY HierarchicalPath)

    ,eq_nmr, eq_parent, Lvl, HierarchicalPath

    INTO #Found

    FROM cteDirectReports

    ORDER BY HierarchicalPath

    ;

    --===== Display the sorted Adjacency List of what "fit" in the hierarchy

    SELECT *

    FROM #Found

    ORDER BY NodeNumber

    ;

    --===== Find all the problems with the Adjacency List

    WITH

    cteOrphans AS

    ( --=== This "subtracts" the nodes that were found above from all the nodes

    -- available to return "orphans".

    SELECT eq_nmr, eq_parent FROM dbo.eqmas

    EXCEPT

    SELECT eq_nmr, eq_parent FROM #Found

    )

    SELECT o.*, f.*

    ,Problem = CASE

    WHEN f.HierarchicalPath LIKE '%\'+o.eq_nmr+'\%' THEN 'Circular Cause'

    WHEN s.eq_nmr IS NULL THEN 'Stranger'

    WHEN f.eq_nmr IS NULL THEN 'Orphan'

    ELSE '???????'

    END

    FROM cteOrphans o

    LEFT JOIN #Found f ON o.eq_parent = f.eq_nmr

    LEFT JOIN dbo.eqmas s ON o.eq_parent = s.eq_nmr

    ORDER BY o.eq_nmr

    ;

    Here's the output of the #Found table.

    NodeNumber eq_nmr eq_parent Lvl HierarchicalPath

    ---------- ------ --------- --- ---------------------

    1 OB01 NULL 1 \OB012 OB02 OB01 2 \OB01\OB023 OB04 OB02 3 \OB01\OB02\OB044 OB05 OB02 3 \OB01\OB02\OB055 OB11 OB05 4 \OB01\OB02\OB05\OB116 OB03 OB01 2 \OB01\OB03

    (6 row(s) affected)

    And, here's the output of the "problem finder" code. I'll let you rename the duplicate column names as you see fit.

    eq_nmr eq_parent NodeNumber eq_nmr eq_parent Lvl HierarchicalPath Problem

    ------ --------- ---------- ------ --------- ---- --------------------- --------------

    OB02 OB11 5 OB11 OB05 4 \OB01\OB02\OB05\OB11\ Circular Cause

    OB06 OB10 NULL NULL NULL NULL NULL Stranger

    OB07 OB08 NULL NULL NULL NULL NULL Orphan

    OB08 OB09 NULL NULL NULL NULL NULL Orphan

    OB09 OB07 NULL NULL NULL NULL NULL Orphan

    (5 row(s) affected)

    To be honest, I've not tested it for performance but it should do pretty well especially if you have a clustered index in the eq_parent column. Try it with the indexes you have, first.

    I'll be back on the FK thing in a couple of minutes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's a classic example of an "Employee" organizational chart table. It has the self referencing FK that I'm talking about where the "ManagerID" must be found as an "EmployeeID" except when the "ManagerID" is NULL. Such a thing will prevent "strangers" and the PK on the "EmployeeID" column will prevent most circular references not to mention "cross branching" where an employee has more than 1 position.

    --===== Create the test table with a clustered PK and an FK to itself to make

    -- sure that a ManagerID is also an EmployeeID.

    CREATE TABLE dbo.Employee

    (

    EmployeeID INT NOT NULL,

    ManagerID INT NULL,

    EmployeeName VARCHAR(10) NOT NULL,

    CONSTRAINT PK_Employee

    PRIMARY KEY CLUSTERED (EmployeeID),

    CONSTRAINT FK_Employee_Employee

    FOREIGN KEY (ManagerID)

    REFERENCES dbo.Employee (EmployeeID)

    ON UPDATE NO ACTION

    ON DELETE NO ACTION

    )

    ;

    Of course, none of that will work until you iron out the bugs in the data. Once thats, done, add the PK and the FK I mentioned and a whole lot of problems will be prevented. There are other reasonable checks that you can add like making sure that the child column isn't equal to the parent column for any given row.

    Of course, those won't help with "orphans" so you'll still need to do something like I did in the "Problem checking code" in the post above this one.

    Shifting gears a bit, the Adjacency List (parent/child) type of hierarchy is comparatively easy for humans to fix because each node is aware of one and only one other node. The "Hierarchical Path" hierarchy that I also formed (and is similar to the HierarchyID datatype, which I never use) has it's uses but is a bit difficult to use for anything practical. Nested Sets are really fast and sortable on the "Left Bower" but, like Hierarchical Paths, are a real PITA to maintain because each node is aware of many other nodes.

    With that thought in mind, you can take advantage of the advantages of each. Please see the following article for how that can be done.

    http://www.sqlservercentral.com/articles/Hierarchy/94040/

    On the other hand, there's normally a pretty finite limit as to what people actually want to get out of hierarchies. The following article explains how to solve for all of those in a kind of new, pre-aggregated hierarchical structure. MLM'ers love this method.

    http://www.sqlservercentral.com/articles/T-SQL/94570/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, wonderful post, very helpful.

    I'll go study your example code and see what comes up.

    Thanks for the reference to hierarchy studies, I'll put it in my reading/todo list.

    Tried your code and it works like a charm.

    My SP takes 2.5 secs on a small production db, your code 0.085 with 1 instead of two roundtrips.

    On my largest one (61616 eqmas records) my full sp 48.2 sec , 2 roundtrips

    Your code 7 roundtrips, 33.2 sec

    I'll integrate your code in my sp

    Cees Cappelle

  • Sorry to be a bother, but seem to have found an issue with your code.

    You state:

    WHEN f.HierarchicalPath LIKE '%\'+o.eq_nmr+'\%' THEN 'Circular Cause'

    but f references #found with a

    WHERE c.HierarchicalPath NOT LIKE '%\'+ t.eq_nmr+'\%'

    Can't get it to work like I expect. I've added a few more lines to the data:

    INSERT INTO dbo.eqmas

    SELECT 'OB01', Null , 'Object 1' UNION ALL

    SELECT 'OB02', 'OB01', 'Object 2' UNION ALL --Circular

    SELECT 'OB03', 'OB01', 'Object 3' UNION ALL

    SELECT 'OB04', 'OB02', 'Object 4' UNION ALL

    SELECT 'OB05', 'OB02', 'Object 5' UNION ALL --Circular

    SELECT 'OB11', 'OB05', 'Object 11' UNION ALL --Circular

    SELECT 'OB02', 'OB11', 'Object 2' UNION ALL --Circular -- This is the cause

    SELECT 'OB06', 'OB10', 'Object 6' UNION ALL --Stranger (is not a child)

    SELECT 'OB07', 'OB08', 'Object 7' UNION ALL --Orphan (Island)

    SELECT 'OB08', 'OB09', 'Object 8' UNION ALL --Orphan (Island)

    SELECT 'OB09', 'OB07', 'Object 9' UNION ALL --Orphan (Island)

    SELECT 'TSTOB222 ','TSTOB21 ','' UNION ALL

    SELECT 'TSTOB221 ','TSTOB22 ','' UNION ALL

    SELECT 'TSTOB2 ','TSTOB222' ,''

    ;

    This gives me:

    eq_nmreq_parentNodeNumbereq_nmreq_parentLvlHierarchicalPathProblem

    OB02OB115OB11OB054\OB01\OB02\OB05\OB11\Circular Cause

    OB06OB10NULLNULLNULLNULLNULLStranger

    OB07OB08NULLNULLNULLNULLNULLOrphan

    OB08OB09NULLNULLNULLNULLNULLOrphan

    OB09OB07NULLNULLNULLNULLNULLOrphan

    TSTOB2 TSTOB222NULLNULLNULLNULLNULLOrphan

    TSTOB221 TSTOB22 NULLNULLNULLNULLNULLStranger

    TSTOB222 TSTOB21 NULLNULLNULLNULLNULLStranger

    OB2 gives me a Circular, but is goes from OB2-OB1, that's a normal Child-Parent.

    TSTOB2 gives me an Orphan, but it's a circular.

    Can you help?

  • I have to get ready for work but let me throw a couple of things your way.

    The code found all of the problems and now is a matter of interpretation for the human. To explain...

    The "group" of 3 rows that begins with "TS" isn't a circular reference because 2 of the 3 rows contain "strangers" in the parent column. That is, they contain IDs in the parent column that are not contained in the child column. The 3rd one is truly an "orphan" because there's no path back to the root. So,

    The "group" of 3 rows that contain OB07, OB08, and OB09 are circular but the code can't get to that group to make that determination because of the larger problem... there's no path back to the root from this group, which is the definition of an "orphan". So, the group actually has 2 problems, the most important of which is that they're "orphans".

    OB2 gives me a Circular, but is goes from OB2-OB1, that's a normal Child-Parent.

    To be semantically correct, there is no OB2 or OB1. There's an OB02 and an OB01. Hierarchies are confusing enough without leaving characters out of the names. 😉 I suspect that typing problem is also the cause of why you thought the "TS" group was circular instead of 2 of them being "strangers".

    That, not withstanding, I had comment-marked that row as "circular" because it's part of a circular path and not necessarily a part of the "loop". Ignore that comment. The BIG problem here is the OB02/OB11 row and the code found it. It not only causes a circular reference but it also causes "cross branching" which is why the number of rows associated with it "exploded". The problem with "cross branching" will also cause explosive duplication of rows in the #Found table, as well.

    The next step in trying to repair the data would be to determine what the correct parent of all those rows are, fix them, and do another run until no rows appear in the "problems listing". Of course, that would mean getting more information from the people that "designed" this hierarchy or removing the bad rows as "not resolvable".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you, Jeff. I've been away a few days, sorry for the late reply.

    Yes, the typo's are my bad. I'll study some more on your explanations.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply