• bryan 83518 (1/15/2014)


    Trying to work out how to query xml that looks like this I will loop through all the artist but only want the Main Artist not the Secondary

    <Artists>

    <ArtistText>Genie Zhuo</ArtistText>

    <Artist Type="Main">

    <FullName>Genie Zhuo</FullName>

    </Artist>

    <Artist Type="Secondary">

    <FullName>The Second Man</FullName>

    </Artist>

    </Artists>

    <Artists>

    <ArtistText>The Man</ArtistText>

    <Artist Type="Main">

    <FullName>The Man</FullName>

    </Artist>

    <Artist Type="Secondary">

    <FullName>The Second Man</FullName>

    </Artist>

    </Artists>

    This query works but if I change the [1] to 2 or 4 I get the secondary artist

    SELECT

    XMLContents.value('(/Product/Artists/Artist/FullName)[1]', 'nvarchar(max)') as RecordType

    FROM [XMLFile]

    where XMLFileId = 1

    Richard showed how you could pull the data from a SQL variable. If you had your data in a table you could query the XML like this:

    SELECT a.value('(FullName)[1]', 'nvarchar(max)') as RecordType

    FROM #sampleXML sx

    CROSS APPLY x.nodes('Artists/Artist[@Type="Main"]') y(a)

    A couple performance notes:

    1st if you are trying to return text (e.g. <xxx>this is text</xxx>) then you want to specify a text node (which I will demonstrate in a moment.)

    2nd, it does not appear that you would need nvarchar(max). (max) data types can be killers, for this you should consider something smaller (in my example I used nvarchar(1000).

    Here's a test harness:

    IF OBJECT_ID('tempdb..#sampleXML') IS NOT NULL DROP TABLE #sampleXML;

    CREATE TABLE #sampleXML (xid int primary key, x xml not null)

    DECLARE @x varchar(4000)='

    <Artists>

    <ArtistText>#5</ArtistText>

    <Artist Type="Main">

    <FullName>Artist #1</FullName>

    </Artist>

    <Artist Type="Secondary">

    <FullName>Artist #2</FullName>

    </Artist>

    </Artists>

    <Artists>

    <ArtistText>#6</ArtistText>

    <Artist Type="Main">

    <FullName>Artist #3</FullName>

    </Artist>

    <Artist Type="Secondary">

    <FullName>Artist #4</FullName>

    </Artist>

    </Artists>';

    DECLARE @rows int=1000;

    WITH tally AS

    (

    SELECT TOP (@rows)

    ROW_NUMBER() OVER (ORDER BY (SELECT ($))) AS n

    FROM sys.all_columns a CROSS JOIN sys.all_columns b

    )

    INSERT #sampleXML

    SELECT n,

    CONVERT(xml,

    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@x,

    '#1','#'+CONVERT(varchar(1000),floor(900000*rand(convert(varbinary, newid())))+100000)),

    '#2','#'+CONVERT(varchar(1000),floor(900000*rand(convert(varbinary, newid())))+100000)),

    '#3','#'+CONVERT(varchar(1000),floor(900000*rand(convert(varbinary, newid())))+100000)),

    '#4','#'+CONVERT(varchar(1000),floor(900000*rand(convert(varbinary, newid())))+100000)),

    '#5','random text:'+CONVERT(varchar(1000),floor(900000*rand(convert(varbinary, newid())))+100000)),

    '#6','random text:'+CONVERT(varchar(1000),floor(900000*rand(convert(varbinary, newid())))+100000)))

    FROM tally

    Here's three versions of the same query using the performance improvements I mentioned along with the results:

    -- not using text() node

    SELECT a.value('(FullName)[1]', 'nvarchar(max)') as RecordType

    FROM #sampleXML sx

    CROSS APPLY x.nodes('Artists/Artist[@Type="Main"]') y(a)

    -- using text() node

    SELECT a.value('(FullName/text())[1]', 'nvarchar(max)') as RecordType

    FROM #sampleXML sx

    CROSS APPLY x.nodes('Artists/Artist[@Type="Main"]') y(a)

    -- changing nvarchar(max) to nvarchar(1000)

    SELECT a.value('(FullName/text())[1]', 'nvarchar(1000)') as RecordType

    FROM #sampleXML sx

    CROSS APPLY x.nodes('Artists/Artist[@Type="Main"]') y(a)

    Results:

    1,000 rows:

    -- without text() node, nvarchar(max):

    SQL Server Execution Times:

    CPU time = 359 ms, elapsed time = 758 ms.

    -- with text() node, nvarchar(max):

    SQL Server Execution Times:

    CPU time = 374 ms, elapsed time = 601 ms.

    --change nvarchar(max) to nvarchar(1000):

    SQL Server Execution Times:

    CPU time = 203 ms, elapsed time = 532 ms.

    10,000 rows:

    -- without text() node, nvarchar(max):

    SQL Server Execution Times:

    CPU time = 3260 ms, elapsed time = 7420 ms.

    -- with text() node, nvarchar(max):

    SQL Server Execution Times:

    CPU time = 2403 ms, elapsed time = 5113 ms.

    -- with text() node, nvarchar(1000):

    SQL Server Execution Times:

    CPU time = 1419 ms, elapsed time = 3209 ms.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001