To not RBAR

  • I have a table of taxonomic entries. It is for paleontology, where the entire taxonomic structure is often not known. Someone may find a clearly new type of brachiopod (phylum) and name in honor of his favorite musician Jamesus (genus) hetfieldii (species), but be unable to determine any of the intermediate taxonomic levels (class, order, family), and often the various auxiliary levels (sub-order, super-family) don't even exist. I needed a method to allow any taxonomic level to 'bind upwards' to any level above it. The table structure I created looks like this:

    CREATE TABLE [dbo].[Taxonomy](

    [TaxonAutoID] [int] IDENTITY(1,1) NOT NULL,

    [TaxonName] [varchar](100) NOT NULL,

    [TaxonLevelID] [int] NOT NULL,

    [NextHigherTaxonAutoID] [int] NULL,

    [AltTexts] [varchar](4000) NULL,

    CONSTRAINT [PK_Taxonomy] PRIMARY KEY CLUSTERED …

    where each entry has the identity column of the next higher taxonomic unit (whatever level that may be) as one of its properties. One of the things I need to do with this table is retrieve the entire string of entries, starting at some arbitrary point and proceeding upwards until I reach the top level (kingdom).

    Using RBAR, it's simple. A loop, using the NextHigherTaxonAutoID field of each entry for the search key of the next iteration, until I hit the top level. The absolute maximum possible number of levels is 21, and no entry will have anywhere near that, so it's probably not a huge performance issue, but some queries will ask for many records at once, and this has to be assembled for each record.

    Is there a way to do this assembly using a set-based query? It seems to me that there ought to be a way, using FOR XML, Cross Apply or some such, but my skills in the more esoteric regions of T-SQL are still somewhat lacking.

  • You could use a recursive cte to navigate through your table. This is not the fastest methodology but would be faster than a cursor in most cases. Maybe you could also look into either the HierarchyID datatype (http://msdn.microsoft.com/en-us/library/bb677290.aspx) or even the Nested Sets model.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (3/11/2013)


    You could use a recursive cte to navigate through your table. This is not the fastest methodology but would be faster than a cursor in most cases. Maybe you could also look into either the HierarchyID datatype (http://msdn.microsoft.com/en-us/library/bb677290.aspx) or even the Nested Sets model.

    Thanks, Sean. I considered the Nested Sets model, but I've never built one before, and it looke kind of complicated for experimenting on a functional system (I'm converting from the old model, where each taxonomic unit had its own table, with foreign keys from each table in the main records). The HeirarchyID looks interesting, but I've never even heard of it before. That's going to take some study.

    I guess I'll try the recursive CTE. Since it's a native SQL Server construct, it might be the best bet in performance terms anyway.

    It still seems to me that the XML stuff ought to have some tools for this sort of query (GetAllHigherNodesUpToRoot seems like a fairly natural XML request), but I don't even know how to design the XML storage for such a scheme. More research...

    Appreciate the tips.

  • Assuming Twain is the son of Mark, who is the son of Chase, who is the son of Chevy.. would something like this work for you as a starting point? Just a self referencing table structure that might help.

    create table #taxonomy (id int identity(1,1), parentid int, taxonomyname varchar(10))

    insert into #taxonomy (parentid, taxonomyname)

    values

    (null, 'chevy'),

    (1, 'chase'),

    (2, 'mark'),

    (3, 'twain')

    select tt.taxonomyname + ' - ' + t.taxonomyname + ' : ' from #taxonomy t

    left join #taxonomy tt

    on t.parentid = tt.id

    where tt.taxonomyname is not null

    order by t.id

    for xml path('')

    drop table #taxonomy

  • Yes, Erin, that's the kind of thing I'm looking for. I've used the for xml path('') syntax before, but I don't really understand what it's doing very well. I just copied something off the net that looked similar to my need and fiddled with it until I got the right results. Those were all one-off projects, but this is going into a production system, so I'm going to have to wrap my head around it properly this time.

    Your code produces "chevy - chase : chase - mark : mark - twain : ". I need "twain; mark; chase; chevy", so I'll need to mess with it a bit, but that's the ticket - a single query retrieving everything upstream.

    Many thanks.

  • If you change the select statement to this:

    select t.taxonomyname + ' ; ' from #taxonomy t

    left join #taxonomy tt

    on t.parentid = tt.id

    order by t.id desc

    for xml path('')

    you'll get the desired out put and you can tweak from there.

  • Thanks, Erin, that does indeed produce exactly what I need (except for the trailing separator, of course). I've been experimenting with both the self-join and FOR XML stuff today, and I think I'm getting a fairly good grip on it. One of my first attempts was a bonehead-simple example:

    select 1 union select 2 union select 3 for xml path ('')That got me started in the right direction. I don't understand why SQL doesn't include a catenate function as part of it's native vocabulary, something like (aircode):

    Select ttt.taxonomyname

    from (select tt.taxonomyname + ' - ' + t.taxonomyname + ' : ' from #taxonomy t

    left join #taxonomy tt

    on t.parentid = tt.id

    where tt.taxonomyname is not null

    order by t.id) ttt

    concatenate ttt.taxonomyname

    There's so much call for it on the net and so many workarounds have been built, I'd expect the SQL standards crew would have addressed it by now. Maybe someday...

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply