HierarchyID/Indexing - Could use some help

  • Hi experts of the internet! I've got a weird issue, and I'm hoping you guys can help me out. I'm sure I'm just missing something fundamental here, but I'm at my wits end.

    Take the following scenario...

    CREATE TABLE #ParentItems (

    ItemID UNIQUEIDENTIFIER,

    HID HIERARCHYID,

    TreeID UNIQUEIDENTIFIER,

    HIDLevel AS HID.GetLevel()

    )

    CREATE TABLE #ChildItems (

    ItemID UNIQUEIDENTIFIER,

    HID HIERARCHYID,

    TreeID UNIQUEIDENTIFIER,

    HIDLevel AS HID.GetLevel()

    )

    CREATE UNIQUE NONCLUSTERED INDEX PInd ON #ParentItems(TreeID, HIDLevel, HID)

    CREATE UNIQUE NONCLUSTERED INDEX CInd ON #ChildItems(TreeID, HIDLevel, HID)

    So two tables, the parent table has a bunch of IDs/HIDs which I know are ancestors of items in the child tables. There are multiple different tree structures in each, hence the TreeId (all HIDs for the same tree, share the same treeID).

    Assume #ParentItems and #ChildItems have 20,000+ rows in each.

    The following query is reasonable fast (for what it's doing...) it takes around a second to complete.

    SELECT DISTINCT LIParent.ItemID as ParentID, LIChild.ItemID as ChildID FROM #ParentItems LIParent

    INNER JOIN #ChildItems LIChild ON LIChild.HID.IsDescendantOf(LIParent.HID)=1

    However, this ignores the TreeID relationship as well. Now the second I add "WHERE LIParent.TreeID = LIChild.TreeID" to the query (or add it to the join) the whole query jumps up to around 30 second execution time.

    This seems like a very simple change, and I believe the index is in place... but it slows down massively.

    Anyone have any ideas? Thanks in advance!

  • Please. Post. Your. Execution. Plan.

    First off, is this a scalar function in your JOIN? It's looks like one but want to be certain before jumping to any conclustions: IsDescendantOf(LIParent.HID) - this may not be a wise choice when running against larger data sets due to potential performance issues, but since you only have 20K rows...

    Second, your latter example (which you state stakes longer to run) JOINs on your indexed keys but is also is a uniqueidentifier. This datatype is considerably larger than your typical 4-byte integers (16 bytes) meaning indexes that are built using uniqueidentifier keys might be relatively slower than indexes using an int key.

    Can you please confirm?

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Sorry for not posting the exec plan. Here you go https://www.dropbox.com/s/mmjthlfryvsr4l4/Execution%20plan.xml

    Yes, the TreeID is a GUID/UNIQUEIDENTIFIER. The IsDescendentOf(..) query is built into the MS HIERARCHYID datatype in SQL 2008 R2.

    I did some work on this last night. The following query is fast, just because I'm splitting out the two joins. This is a made up use case though, I don't think I can actually solve my problem permanently this way.

    SELECT DISTINCT LIParent.ItemID as ParentID, LIChild.ItemID as ChildID, LIParent.TreeID as ParentTreeID, LIChild.TreeID as ChildTreeID INTO #Interim FROM #Items LIParent

    INNER JOIN #ChildItems LIChild ON LIChild.HID.IsDescendantOf(LIParent.HID)=1

    INSERT INTO #FinalOutput SELECT ParentID, ChildID FROM #Interim WHERE ChildTreeID = ParentTreeID

    So..

    #1 - Query with no TreeID = fast.

    #2 - Query with TreeID = slow

    #3 - Query with no TreeID into Temp table, then Filter TreeID later = fast (a few ms slower than #1).

    Would it help if I posted a database with mock data and the structures/indexes in place?

  • Yes, it would help a lot (thanks)

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • https://www.dropbox.com/s/jd4mgt64zk80nsc/TestScenario.zip here's a backup file.

    SELECT LIParent.ItemID as ParentID, LIChild.ItemID as ChildID FROM TItems LIParent

    INNER JOIN TItems LIChild ON LIChild.HID.IsDescendantOf(LIParent.HID)=1

    WHERE LIParent.NodeType = 0 AND LIChild.NodeType = 1

    AND LIParent.TreeID = LIChild.TreeID

    You'll see that it takes ~20s to run. If you remove the last line, it's under a second. Thanks in advance for your help!

  • Sorry, but please attach .TXT files for data and table definitions, I'm not going to load your DB onto my system 😎

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Nevermind about the flat files, I loaded it up on a VM...

    The reason its taking a longer time to run with the last line included (AND LIParent.TreeID = LIChild.TreeID) is because these are the uniqueidentifier columns. With just the LIParent.NodeType = 0 AND LIChild.NodeType = 1, the comparison is between smallint datatypes, whereas TreeID is enforcing the JOIN back onto itself using the uniqueidentifier datatype (which due to the size) it will take long to do a row by row comparison.

    I don't know a way around this, but a quick google search found potential alternatives so unless someone else has suggestions, I recommend you check there - very sorry.

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Without the data, the scenario is worthless. The backup only has one table and one index so you can run/see the query. It's specially created for this exact scenario. It's only a 4 meg file :).

    CREATE TABLE [dbo].[TItems](

    [ItemID] [uniqueidentifier] NULL,

    [HID] [hierarchyid] NULL,

    [TreeID] [uniqueidentifier] NULL,

    [NodeType] [smallint] NULL,

    [HIDLevel] AS ([HID].[GetLevel]())

    ) ON [PRIMARY]

    GO

    CREATE UNIQUE NONCLUSTERED INDEX [IdxN] ON [dbo].[TItems]

    (

    [NodeType] ASC,

    [ItemID] ASC,

    [TreeID] ASC,

    [HID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    That's the table.

  • You're 100% correct. How did you know?

    I've stared at these execution plans for a few days, they're so simple and I didn't see it. How can you tell?

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply