Easy way to search for a string any where within XML column?

  • By search for a string, do you mean search for a word within the XML, or search for a substring within the XML?

    If words, then Full Text Indexing can work on XML columns in tables. http://msdn.microsoft.com/en-us/library/bb522491.aspx

    If substrings, then that's more complex.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • HI, i have been working on a task that sounds similar to what you suggest, i will add my code in here on what i did when i get to work tomorrow however, in summery i used a substring function and run it through a while loop to start at the beginning of a string look through and find the first occurance of what i was lookingng for and set a length based of another parameter. after finding the first one it then continues to parse the XML string and find all the other values. My result set ends with a table containing rows of data 1 for each occorancece.

    ***The first step is always the hardest *******

  • I'm not sure if I understood correctly, but it seems like you're looking for the "exist" method to query the xml data.

    Here's a short sample:

    DECLARE @tbl TABLE

    (

    id INT,

    col XML

    )

    INSERT INTO @tbl

    VALUES

    (1,'<name><name1>5</name1></name>'),(2,'<name>6</name>')

    DECLARE @id int

    SELECT @id = 6

    SELECT *

    FROM @tbl t1

    WHERE t1.col.exist('//*[text()=sql:variable("@id")]')=1



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • And here's the query to search for a substring:

    SELECT *

    FROM @tbl t1

    WHERE col.exist('//*/text()[contains(.,sql:variable("@id"))]') = 1

    Depending on the size of the xml data this may take a while since -at least to my knowledge- "contains" won't benefit from an xml index that might exists.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • as promised here is what i did

    declare @start int,

    @end int,

    @st int

    set @start =1

    set @st=0

    while @start !=0

    begin

    set @start=(select cast (CHARINDEX('<baseshape ',xml_data,@st) as varchar)

    from #xml_table)

    set @end=(select cast (CHARINDEX('</BaseShape>',xml_data,@start)-@start+12 as varchar)

    from #xml_table)

    insert into #xml select SUBSTRING (xml_data,@start,@end)

    from #xml_table

    set @st=@start+10

    end

    ***The first step is always the hardest *******

  • To:

    GSquared

    AccidentalDBA:)

    LutzM

    Thanks for all of your replies. I plan on testing them out for better understanding. Ultimately, I'll need to grow my knowledge of XML (which currently stands at near-zero). At the very least, I've come across plenty of articles on the fundamentals of XML. But for the immediate moment, your suggestions should get me to the next point for now.

    Thanks again,

    Pete

  • You're welcome.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I think Text Finding may meet your needs. Using this wonderful tool is a good and easy way to search text in all types of files. It is definitely worth a try and it doesnt hurt when you can try their free version too. They provide outstanding support!!

  • I don't understand this.

    Using the above example, I changed it to the following:

    [Code]

    DECLARE @tbl TABLE

    (

    id INT,

    col XML

    )

    INSERT INTO @tbl

    VALUES

    (1,'<name><name1>aht</name1></name>'),(2,'<name>6</name>')

    DECLARE @id VARCHAR(10)

    SELECT @id = 'aht'

    SELECT *

    FROM @tbl t1

    WHERE t1.col.exist('//*[text()=sql:variable("@id")]')=1

    [/code]

    Which works, but when I do the following, it doesnt.

    [Code]

    use MSDB

    GO

    --create a temp table to hold xml for SSIS packages

    Create Table #temp (name sysname, ssispackagexml XML)

    insert into #temp (name, ssispackagexml)

    SELECT [name] AS SSISPackageName

    , CONVERT(XML, CONVERT(VARBINARY(MAX), packagedata)) AS SSISPackageXML

    FROM msdb.dbo.sysssispackages

    WHERE name = 'AHT'

    ;WITH cte AS (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) Num ,name

    ,ssispackagexml

    FROM

    #temp

    )

    SELECT * INTO #t2 FROM cte WHERE num = 1

    DECLARE @id VARCHAR(10)

    SELECT @id = 'aht'

    SELECT *

    FROM #t2

    WHERE #t2.ssispackagexml.exist('//*[text()=sql:variable("@id")]')=1

    [/code]

    Now I realize, you can't test this, because I didnt include example data, however it works when I use then <name>value</name> syntax, but anything I pull out of MSDB.dbo.ssispackagexml, doesnt work.

    Any ideas?

  • If your objective is to check whether the XML column contains text without caring where in the XML the text is, why not convert the XML column to a varchar and use LIKE? The below example populates a table @t with some random XML, then uses a plain SELECT to find the XML rows containing 'a':

    declare @t table (xmlCol xml);

    insert into @t values ('<thing>Apple</thing>');

    insert into @t values ('<thing>Banana<thing2 type="fruit"/></thing>');

    select * from @t where Convert(varchar(max),xmlCol) like '%a%';

    This should be OK unless you have a massive amount of data.

  • Does this help ?

    DECLARE @tbl TABLE

    (

    id INT,

    col XML

    )

    INSERT INTO @tbl

    VALUES

    (1,'<name><name1>aht</name1></name>'),(2,'<name>6</name>')

    select *

    from @tbl

    where col.value('(/name)[1]','varchar(10)')='aht'

    ----------------------------------------------------

  • Hi

    thanks for both your replies.

    @MMartin1 I figured out why that doesnt work, its because my tablename is not a tag/component, but part of a string that is located somewhere in the XML.

    Im not concerned where it is located, just if it exist in the xml string.

    @william Rayer

    Problem with casting as varchar, is as far as I can see, it truncates the XML because varchar(max) is not large enough to contain the data.

    This is an example directly from the XML.

    [Code]

    <SQLTask:SqlTaskData xmlns:SQLTask="www.microsoft.com/sqlserver/dts/tasks/sqltask" SQLTask:Connection="{00-00-00-00}" SQLTask:TimeOut="0" SQLTask:IsStoredProc="False" SQLTask:BypassPrepare="True" SQLTask:SqlStmtSourceType="DirectInput" SQLTask:SqlStatementSource="SELECT CONVERT(varchar(25),max([Dato]),120) AS MAXDATO FROM [DB1].[dbo].[AHT] WHERE Org <> ''NSC'' and source = ''OS1''" SQLTask:CodePage="1252" SQLTask:ResultType="ResultSetType_SingleRow"></SQLTask>

    [/Code]

    Using the above, i cant seem to find the AHT table. because its part of the SQLstatement source.

    This is just one example, there are many others, where tablenames are located in SSIS variables, or OLE DB connections or ADO.NET connectionstring etc.

  • Locating a string pattern within a string is straight forward, regardless of whether the string is an XML or not, here is a quick example of a search in an XML string.

    😎

    USE TEEST;

    GO

    SET NOCOUNT ON;

    ---

    DECLARE @PLAN_XML XML = N'<BatchSequence>

    <Batch>

    <Statements>

    <SearchString SSID="1">Search text</SearchString>

    <SearchString SSID="2">Search text</SearchString>

    <SearchString SSID="3">Search text</SearchString>

    </Statements>

    </Batch>

    </BatchSequence>';

    DECLARE @SEARCH_STRING NVARCHAR(MAX) = N'Search text';

    SELECT

    (

    DATALENGTH(CONVERT(NVARCHAR(MAX),@PLAN_XML,0))

    - DATALENGTH(REPLACE(CONVERT(NVARCHAR(MAX),@PLAN_XML,0),@SEARCH_STRING,N''))

    ) / DATALENGTH(@SEARCH_STRING) AS INSTANCE_COUNT

    ,CHARINDEX(@SEARCH_STRING,CONVERT(NVARCHAR(MAX),@PLAN_XML,0),1) AS FIRST_POS_CHARINDEX

    ,PATINDEX(NCHAR(37) + @SEARCH_STRING + NCHAR(37),CONVERT(NVARCHAR(MAX),@PLAN_XML,0)) AS FIRST_POS_PATINDEX

    ;

    Output

    INSTANCE_COUNT FIRST_POS_CHARINDEX FIRST_POS_PATINDEX

    -------------------- -------------------- --------------------

    3 58 58

  • Thank you!

    it works, right out of the box.

    I don't know why I didn't think of this.

    I guess I got caught up in the fact that it was XML.

  • This is an interesting 5-year old thread...

    Something interesting to note is that the default output for the SQLXML value method is all child text. With that in mind, this query below will strip out all XML tags.

    DECLARE @PLAN_XML XML = N'<BatchSequence>

    <Batch>

    <Statements>

    <SearchString SSID="1">Search text1</SearchString>

    <SearchString SSID="2">Search text2</SearchString>

    <SearchString SSID="3">Search text3</SearchString>

    </Statements>

    </Batch>

    </BatchSequence>';

    -- this will remove all tags

    SELECT @PLAN_XML.value('.', 'varchar(max)');

    Search text1Search text2Search text3

    Anyhow, if the objective was simply to determine if @SEARCH_STRING existed in the text, here's a fun way to return a Boolean value:

    DECLARE @PLAN_XML XML = N'<BatchSequence>

    <Batch>

    <Statements>

    <SearchString SSID="1">Search text</SearchString>

    <SearchString SSID="2">Search text</SearchString>

    <SearchString SSID="3">Search text</SearchString>

    </Statements>

    </Batch>

    </BatchSequence>';

    DECLARE @SEARCH_STRING NVARCHAR(MAX) = N'Search text';

    SELECT SIGN(CHARINDEX(@SEARCH_STRING, CAST(@PLAN_XML AS varchar(max))))

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply