Conditional joining of Table A nested ID/heirarchyid to table A or B?

  • It's been a long time since I've dealt with complex database design, and in this case, I'd like to ask the opinions of others who are more current than I am.

    I need to design a good structure to store and enforce the rules, if practical on two types of information.

    Constraint: CLR is disallowed.

    Constraint: SQL Server 2012 SP1 is the DB being targeted.

    Physical object

    No physical object relates to any other physical row.

    Physical objects have at most one descendent virtual object row.

    Physical objects have "Physical" type data, which is unrelated to the data virtual objects have (mostly non-overlapping)

    Virtual object

    All virtual objects have a parent of EITHER a physical object, OR a virtual object, never both.

    No virtual object is its own parent.

    All virtual objects relate either directly or indirectly (through ancestor virtual objects) to a physical object.

    Virtual objects have zero or more descendent virtual object row(s).

    Virtual objects have "Virtual" type data, which is unrelated to the data physical objects have (mostly non-overlapping)

    I.e. example objects:

    Phys1

    Phys2

    Phys3

    Virt1 - parent Phys1

    Virt2 - parent Phys2

    Virt3 - parent Virt2

    Virt4 - parent Virt3

    Virt5 - parent Virt3

    My initial idea is one I really don't like, as I don't like conditional field choice (use field A except when you use field B), but I hate fields that join to more than one table even more, which resulted in:

    DB init

    CREATE DATABASE [PhysVirtTest]

    CONTAINMENT = NONE

    ON PRIMARY

    ( NAME = N'PhysVirtTest', FILENAME = N'YourPath\PhysVirtTest.mdf' , SIZE = 8MB , FILEGROWTH = 4MB )

    LOG ON

    ( NAME = N'PhysVirtTest_log', FILENAME = N'YourPath\PhysVirtTest_log.ldf' , SIZE = 4MB , FILEGROWTH = 4MB )

    GO

    ALTER DATABASE [PhysVirtTest] SET COMPATIBILITY_LEVEL = 110

    GO

    ALTER DATABASE [PhysVirtTest] SET ANSI_NULL_DEFAULT OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET ANSI_NULLS OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET ANSI_PADDING OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET ANSI_WARNINGS OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET ARITHABORT OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET AUTO_CLOSE OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET AUTO_CREATE_STATISTICS ON

    GO

    ALTER DATABASE [PhysVirtTest] SET AUTO_SHRINK OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET AUTO_UPDATE_STATISTICS ON

    GO

    ALTER DATABASE [PhysVirtTest] SET CURSOR_CLOSE_ON_COMMIT OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET CURSOR_DEFAULT GLOBAL

    GO

    ALTER DATABASE [PhysVirtTest] SET CONCAT_NULL_YIELDS_NULL OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET NUMERIC_ROUNDABORT OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET QUOTED_IDENTIFIER OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET RECURSIVE_TRIGGERS OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET DISABLE_BROKER

    GO

    ALTER DATABASE [PhysVirtTest] SET AUTO_UPDATE_STATISTICS_ASYNC OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET DATE_CORRELATION_OPTIMIZATION OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET PARAMETERIZATION SIMPLE

    GO

    ALTER DATABASE [PhysVirtTest] SET READ_COMMITTED_SNAPSHOT OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET READ_WRITE

    GO

    ALTER DATABASE [PhysVirtTest] SET RECOVERY SIMPLE

    GO

    ALTER DATABASE [PhysVirtTest] SET MULTI_USER

    GO

    ALTER DATABASE [PhysVirtTest] SET PAGE_VERIFY CHECKSUM

    GO

    ALTER DATABASE [PhysVirtTest] SET TARGET_RECOVERY_TIME = 0 SECONDS

    GO

    USE [PhysVirtTest]

    GO

    IF NOT EXISTS (SELECT name FROM sys.filegroups WHERE is_default=1 AND name = N'PRIMARY') ALTER DATABASE [PhysVirtTest] MODIFY FILEGROUP [PRIMARY] DEFAULT

    GO

    Old-fashioned way

    -- The first try, without using HierarchyID - output still needs to be consolidated with FOR XML or other tricks

    USE [PhysVirtTest];

    DROP TABLE dbo.VirtEx1;

    DROP TABLE dbo.PhysEx1;

    CREATE TABLE dbo.PhysEx1

    ( ID INT IDENTITY(1,1) NOT NULL

    ,Data VARCHAR(32)

    ,CONSTRAINT PKP1 PRIMARY KEY CLUSTERED (ID)

    );

    CREATE TABLE dbo.VirtEx1

    ( ID INT IDENTITY(1,1) NOT NULL

    ,VirtParentID INT NULL CONSTRAINT FKVV1 FOREIGN KEY REFERENCES PhysVirtTest.dbo.VirtEx1(ID)

    ,PhysParentID INT NULL CONSTRAINT FKVP1 FOREIGN KEY REFERENCES PhysVirtTest.dbo.PhysEx1(ID)

    ,Data VARCHAR(32)

    ,CONSTRAINT CKOnePar1 CHECK ((VirtParentID IS NOT NULL AND PhysParentID IS NULL) OR (VirtParentID IS NULL AND PhysParentID IS NOT NULL))

    ,CONSTRAINT CKVDup1 CHECK (VirtParentID <> ID)

    ,CONSTRAINT PKV1 PRIMARY KEY CLUSTERED (ID)

    );

    CREATE UNIQUE NONCLUSTERED INDEX UNIVP1 ON VirtEx1

    (PhysParentID)

    WHERE PhysParentID IS NOT NULL;

    -- Insert Good Data

    INSERT INTO PhysEx1 (Data) VALUES ('Phys1'), ('Phys2'), ('Phys3');

    INSERT INTO VirtEx1 (VirtParentID, PhysParentID, Data) VALUES(NULL,1,'Virt1'), (NULL,2,'Virt2'), (2,NULL,'Virt3'),(3,NULL,'Virt4'),(3,NULL,'Virt5')

    -- Basic data display

    SELECT * FROM PhysEx1

    SELECT * FROM VirtEx1

    -- Virtual object parental display

    -- THIS TECHNIQUE ALLOWS STRICTLY LIMITED NESTING (but doesn't loop)

    -- If you don't have ALL NULLs in the right-most ancestor column,

    -- you may be missing some ancestors!

    SELECT Vdesc.*

    ,COALESCE(Vpar4.Data, Ppar4.DATA, Vpar3.Data, Ppar3.DATA, Vpar2.Data, Ppar2.DATA, Vpar1.Data, Ppar1.DATA) AS PhysHost

    ,COALESCE(Vpar1.Data, Ppar1.DATA) AS Parent

    ,COALESCE(Vpar2.Data, Ppar2.DATA) AS GrandParent

    ,COALESCE(Vpar3.Data, Ppar3.DATA) AS GreatGrandParent

    ,COALESCE(Vpar4.Data, Ppar4.DATA) AS GreatGreatGrandParent

    FROM VirtEx1 Vdesc

    LEFT OUTER JOIN VirtEx1 Vpar1

    ON Vpar1.ID = Vdesc.VirtParentID

    LEFT OUTER JOIN PhysEx1 Ppar1

    ON Ppar1.ID = Vdesc.PhysParentID

    LEFT OUTER JOIN VirtEx1 Vpar2

    ON Vpar2.ID = Vpar1.VirtParentID

    LEFT OUTER JOIN PhysEx1 Ppar2

    ON Ppar2.ID = Vpar1.PhysParentID

    LEFT OUTER JOIN VirtEx1 Vpar3

    ON Vpar3.ID = Vpar2.VirtParentID

    LEFT OUTER JOIN PhysEx1 Ppar3

    ON Ppar3.ID = Vpar2.PhysParentID

    LEFT OUTER JOIN VirtEx1 Vpar4

    ON Vpar4.ID = Vpar3.VirtParentID

    LEFT OUTER JOIN PhysEx1 Ppar4

    ON Ppar4.ID = Vpar3.PhysParentID

    -- Test Bad Data prevention

    INSERT INTO VirtEx1 (VirtParentID, PhysParentID, Data) VALUES(NULL,1,'DuplicatePhysParent')

    INSERT INTO VirtEx1 (VirtParentID, PhysParentID, Data) VALUES(NULL,NULL,'NoParent')

    INSERT INTO VirtEx1 (VirtParentID, PhysParentID, Data) VALUES(1,1,'TooManyParent')

    INSERT INTO VirtEx1 (VirtParentID, PhysParentID, Data) VALUES(111,NULL,'VirtParentNotExist')

    INSERT INTO VirtEx1 (VirtParentID, PhysParentID, Data) VALUES(NULL,111,'PhysParentNotExist')

    Possible hierarchyid based, but this is my very first foray into hierarchyid, and trying to get the Data values for ancestors isn't as simple as I'd hoped.

    -- The second try, using HierarchyID - output is still not coded at all

    USE [PhysVirtTest];

    DROP TABLE dbo.VirtEx2;

    DROP TABLE dbo.PhysEx2;

    CREATE TABLE dbo.PhysEx2

    ( ID INT IDENTITY(1,1) NOT NULL

    ,Data VARCHAR(32)

    ,CONSTRAINT PKP2 PRIMARY KEY CLUSTERED (ID)

    );

    CREATE TABLE dbo.VirtEx2

    ( ID HIERARCHYID NOT NULL

    ,PhysParentID INT NULL CONSTRAINT FKVP2 FOREIGN KEY REFERENCES PhysVirtTest.dbo.PhysEx2(ID)

    ,Data VARCHAR(32)

    --,CONSTRAINT CKOnePar2 CHECK ((VirtParentID IS NOT NULL AND PhysParentID IS NULL) OR (VirtParentID IS NULL AND PhysParentID IS NOT NULL))

    ,CONSTRAINT PKV2 PRIMARY KEY CLUSTERED (ID)

    );

    CREATE UNIQUE NONCLUSTERED INDEX UNIVP2 ON VirtEx2

    (PhysParentID)

    WHERE PhysParentID IS NOT NULL;

    -- Insert Good Data

    INSERT INTO PhysEx2 (Data) VALUES ('Phys1'), ('Phys2'), ('Phys3');

    -- Hierarchyid guideance from http://stackoverflow.com/questions/1352778/sql-2008-hierarchyid-with-multiple-root-nodes

    INSERT INTO VirtEx2 (ID, PhysParentID, Data) VALUES(hierarchyid::GetRoot().GetDescendant(NULL,NULL),1,'Virt1')

    INSERT INTO VirtEx2 (ID, PhysParentID, Data) VALUES(hierarchyid::GetRoot().GetDescendant((select MAX(ID) from VirtEx2 where ID.GetAncestor(1) = hierarchyid::GetRoot()),NULL),2,'Virt2')

    DECLARE @VirtParent hierarchyid

    DECLARE @VirtNew hierarchyid

    SELECT @VirtParent = ID FROM VirtEx2 WHERE Data = 'Virt2';

    INSERT INTO VirtEx2 (ID, PhysParentID, Data) VALUES(@VirtParent.GetDescendant(NULL,NULL),NULL,'Virt3')

    SELECT @VirtParent = ID FROM VirtEx2 WHERE Data = 'Virt3';

    SELECT @VirtNew = MAX(ID) FROM VirtEx2 WHERE ID.GetAncestor(1) = @VirtParent;

    -- Method that works only for the first node: INSERT INTO VirtEx2 (ID, PhysParentID, Data) VALUES(@VirtParent.GetDescendant(NULL,NULL),NULL,'Virt4')

    INSERT INTO VirtEx2 (ID, PhysParentID, Data) VALUES(@VirtParent.GetDescendant(@VirtNew,NULL),NULL,'Virt4')

    SELECT @VirtNew = MAX(ID) FROM VirtEx2 WHERE ID.GetAncestor(1) = @VirtParent;

    INSERT INTO VirtEx2 (ID, PhysParentID, Data) VALUES(@VirtParent.GetDescendant(@VirtNew,NULL),NULL,'Virt5')

    -- Basic data display

    SELECT * FROM PhysEx2

    SELECT ID.ToString() AS IDString, ID.GetLevel() AS IDLevel, * FROM VirtEx2

    -- ??? Advanced data display - parents and grandparents.

    -- Test Bad Data prevention

    --'DuplicatePhysParent'

    --'NoParent'

    --'TooManyParent'

    --'VirtParentNotExist'

    --'PhysParentNotExist'

    Thank you all for your time in reading and/or replying.

  • I haven't done any database design that really matters for a few years, but I would do something very different from your idea.

    Lets start with fundamental basics.

    I want the design to be fairly proof against new releases of the RDBMS software, in this case SQL Server; and I want not to have it preclude the use of recently added (eight years ago?) features by using bizarre configuration. So I would change this block of code to have ON in every case where tyou have OFF.ALTER DATABASE [PhysVirtTest] SET ANSI_NULL_DEFAULT OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET ANSI_NULLS OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET ANSI_PADDING OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET ANSI_WARNINGS OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET ARITHABORT OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET CURSOR_CLOSE_ON_COMMIT OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET CONCAT_NULL_YIELDS_NULL OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET NUMERIC_ROUNDABORT OFF

    GO

    ALTER DATABASE [PhysVirtTest] SET QUOTED_IDENTIFIER OFF

    GOI'm not sure that I haven't included a single ine that could sensibly have OFF, but certainly the vast majority should have ON.

    Then why are there two tables (virtual objects and physical objects) instead of one? It doesn't help at all, and it does hinder! Also, what is the attitude to NULL? It seems to me that avoiding NULL is the only imaginable reason for having two tables instead of one, but your schema has NULLs anyway. Depending on how NULL-averse you are you might have more than one table, but the extra tables over and above the basic one table would be link tables, not base object tables.

    So I would end up with something like CREATE TABLE dbo.PhysorVirtEx

    ( ID INT IDENTITY(1,1) NOT NULL Primary key Clustered

    ,ParentID INT NULL REFERENCES PhysorVirtEx(ID),

    ,Data VARCHAR(32)

    ,IsVirtual BIT NOT NULL,

    ,CONSTRAINT CKOnePar1 CHECK (ParentID is not NULL or Isvirtual = 0)

    ,CONSTRAINT CKVDup1 CHECK (VirtParentID <> ID)

    );As you can see, I'm not so null-averse as to be silly about it.

    Tom

  • Physical objects have at most one descendent virtual object row.

    Virtual objects have zero or more descendent virtual object row(s).

    Which might be more common, someone asking to which physical object does this virtual object relate?? or which virtual objects refer to this physical object?.

    One more question, if a physical object has 10 virtual descendants which would be more common, a separation of 10 levels between the physical object and the leaf-level virtual object (i.e. each virtual object has only one descendant) or a separation of far less that 10, maybe 2 (e.g. physical object has one virtual child and that virtual object has 9 virtual object children).

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • L' Eomot Inversé (1/11/2013)


    I haven't done any database design that really matters for a few years, but I would do something very different from your idea.

    Lets start with fundamental basics.

    I want the design to be fairly proof against new releases of the RDBMS software, in this case SQL Server; and I want not to have it preclude the use of recently added (eight years ago?) features by using bizarre configuration. So I would change this block of code to have ON in every case where tyou have OFF.ALTER DATABASE [PhysVirtTest] SET ANSI_NULL_DEFAULT OFF

    ...I'm not sure that I haven't included a single ine that could sensibly have OFF, but certainly the vast majority should have ON.

    Then why are there two tables (virtual objects and physical objects) instead of one? It doesn't help at all, and it does hinder! Also, what is the attitude to NULL? It seems to me that avoiding NULL is the only imaginable reason for having two tables instead of one, but your schema has NULLs anyway. Depending on how NULL-averse you are you might have more than one table, but the extra tables over and above the basic one table would be link tables, not base object tables.

    So I would end up with something like CREATE TABLE dbo.PhysorVirtEx

    ( ID INT IDENTITY(1,1) NOT NULL Primary key Clustered

    ,ParentID INT NULL REFERENCES PhysorVirtEx(ID),

    ,Data VARCHAR(32)

    ,IsVirtual BIT NOT NULL,

    ,CONSTRAINT CKOnePar1 CHECK (ParentID is not NULL or Isvirtual = 0)

    ,CONSTRAINT CKVDup1 CHECK (VirtParentID <> ID)

    );As you can see, I'm not so null-averse as to be silly about it.

    First, thank you very much for responding.

    The ANSI settings were 100% the defaults SQL 2012 gave because I didn't care about the sample database, but I wanted one for your ease in cleaning up after helping me.

    I'll edit the first post to add this information about why two tables:

    Physical and Virtual objects have mostly very different data, which in a real table will be sets of columns - the "Data" field here is merely a placeholder for simplicity. This is why I have two tables, so that non-key fields are facts about the key.

    I apologize for not including that information originally.

    opc.three (1/13/2013)


    Physical objects have at most one descendent virtual object row.

    Virtual objects have zero or more descendent virtual object row(s).

    Which might be more common, someone asking to which physical object does this virtual object relate?? or which virtual objects refer to this physical object?.

    One more question, if a physical object has 10 virtual descendants which would be more common, a separation of 10 levels between the physical object and the leaf-level virtual object (i.e. each virtual object has only one descendant) or a separation of far less that 10, maybe 2 (e.g. physical object has one virtual child and that virtual object has 9 virtual object children).

    Thank you for responding as well!

    I expect that going from physical to virtual, and from virtual to physical, will be more or less equally common. Both will convey valuable and necessary data for certain tasks, and all of those tasks will be required on a day to day basis.

    Most often, a physical object will have from 0 to 6 direct virtual descendants at a leaf level (i.e. the first level down). Rarely, a physical object will have a completely unknown tree structure going 2-N levels down, where N is in actuality unbounded, but almost certainly less than 5. These cases cannot be easily predicted in advance - perhaps they're wide and deep, perhaps narrow and deep. I'd guess they'd be narrow and deep more often, but I cannot guarantee that.

    Alas, the rare/exception cases are always a pain to deal with.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply