Easy way to search for a string any where within XML column?

  • Is there a really easy way to search for a string in an XML column? All the examples I come across seem to require shredding the XML to some extent, such as specifying whether looking within an attribute vs an element, needing to specify nodes , etc..

    For example, if I need to determine if a given string exists anywhere within an XML column, I'd like the query to return all the relevant rows -- and it's ok if the XML field is still returned as XML.

    Thanks in advance,

    --Pete

  • By search for a string, do you mean search for a word within the XML, or search for a substring within the XML?

    If words, then Full Text Indexing can work on XML columns in tables. http://msdn.microsoft.com/en-us/library/bb522491.aspx

    If substrings, then that's more complex.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • HI, i have been working on a task that sounds similar to what you suggest, i will add my code in here on what i did when i get to work tomorrow however, in summery i used a substring function and run it through a while loop to start at the beginning of a string look through and find the first occurance of what i was lookingng for and set a length based of another parameter. after finding the first one it then continues to parse the XML string and find all the other values. My result set ends with a table containing rows of data 1 for each occorancece.

    ***The first step is always the hardest *******

  • I'm not sure if I understood correctly, but it seems like you're looking for the "exist" method to query the xml data.

    Here's a short sample:

    DECLARE @tbl TABLE

    (

    id INT,

    col XML

    )

    INSERT INTO @tbl

    VALUES

    (1,'<name><name1>5</name1></name>'),(2,'<name>6</name>')

    DECLARE @id int

    SELECT @id = 6

    SELECT *

    FROM @tbl t1

    WHERE t1.col.exist('//*[text()=sql:variable("@id")]')=1



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • And here's the query to search for a substring:

    SELECT *

    FROM @tbl t1

    WHERE col.exist('//*/text()[contains(.,sql:variable("@id"))]') = 1

    Depending on the size of the xml data this may take a while since -at least to my knowledge- "contains" won't benefit from an xml index that might exists.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • as promised here is what i did

    declare @start int,

    @end int,

    @st int

    set @start =1

    set @st=0

    while @start !=0

    begin

    set @start=(select cast (CHARINDEX('<baseshape ',xml_data,@st) as varchar)

    from #xml_table)

    set @end=(select cast (CHARINDEX('</BaseShape>',xml_data,@start)-@start+12 as varchar)

    from #xml_table)

    insert into #xml select SUBSTRING (xml_data,@start,@end)

    from #xml_table

    set @st=@start+10

    end

    ***The first step is always the hardest *******

  • To:

    GSquared

    AccidentalDBA:)

    LutzM

    Thanks for all of your replies. I plan on testing them out for better understanding. Ultimately, I'll need to grow my knowledge of XML (which currently stands at near-zero). At the very least, I've come across plenty of articles on the fundamentals of XML. But for the immediate moment, your suggestions should get me to the next point for now.

    Thanks again,

    Pete

  • You're welcome.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I think Text Finding may meet your needs. Using this wonderful tool is a good and easy way to search text in all types of files. It is definitely worth a try and it doesnt hurt when you can try their free version too. They provide outstanding support!!

  • I don't understand this.

    Using the above example, I changed it to the following:

    [Code]

    DECLARE @tbl TABLE

    (

    id INT,

    col XML

    )

    INSERT INTO @tbl

    VALUES

    (1,'<name><name1>aht</name1></name>'),(2,'<name>6</name>')

    DECLARE @id VARCHAR(10)

    SELECT @id = 'aht'

    SELECT *

    FROM @tbl t1

    WHERE t1.col.exist('//*[text()=sql:variable("@id")]')=1

    [/code]

    Which works, but when I do the following, it doesnt.

    [Code]

    use MSDB

    GO

    --create a temp table to hold xml for SSIS packages

    Create Table #temp (name sysname, ssispackagexml XML)

    insert into #temp (name, ssispackagexml)

    SELECT [name] AS SSISPackageName

    , CONVERT(XML, CONVERT(VARBINARY(MAX), packagedata)) AS SSISPackageXML

    FROM msdb.dbo.sysssispackages

    WHERE name = 'AHT'

    ;WITH cte AS (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) Num ,name

    ,ssispackagexml

    FROM

    #temp

    )

    SELECT * INTO #t2 FROM cte WHERE num = 1

    DECLARE @id VARCHAR(10)

    SELECT @id = 'aht'

    SELECT *

    FROM #t2

    WHERE #t2.ssispackagexml.exist('//*[text()=sql:variable("@id")]')=1

    [/code]

    Now I realize, you can't test this, because I didnt include example data, however it works when I use then <name>value</name> syntax, but anything I pull out of MSDB.dbo.ssispackagexml, doesnt work.

    Any ideas?

  • If your objective is to check whether the XML column contains text without caring where in the XML the text is, why not convert the XML column to a varchar and use LIKE? The below example populates a table @t with some random XML, then uses a plain SELECT to find the XML rows containing 'a':

    declare @t table (xmlCol xml);

    insert into @t values ('<thing>Apple</thing>');

    insert into @t values ('<thing>Banana<thing2 type="fruit"/></thing>');

    select * from @t where Convert(varchar(max),xmlCol) like '%a%';

    This should be OK unless you have a massive amount of data.

  • Does this help ?

    DECLARE @tbl TABLE

    (

    id INT,

    col XML

    )

    INSERT INTO @tbl

    VALUES

    (1,'<name><name1>aht</name1></name>'),(2,'<name>6</name>')

    select *

    from @tbl

    where col.value('(/name)[1]','varchar(10)')='aht'

    ----------------------------------------------------

  • Hi

    thanks for both your replies.

    @mmartin1 I figured out why that doesnt work, its because my tablename is not a tag/component, but part of a string that is located somewhere in the XML.

    Im not concerned where it is located, just if it exist in the xml string.

    @william Rayer

    Problem with casting as varchar, is as far as I can see, it truncates the XML because varchar(max) is not large enough to contain the data.

    This is an example directly from the XML.

    [Code]

    <SQLTask:SqlTaskData xmlns:SQLTask="www.microsoft.com/sqlserver/dts/tasks/sqltask" SQLTask:Connection="{00-00-00-00}" SQLTask:TimeOut="0" SQLTask:IsStoredProc="False" SQLTask:BypassPrepare="True" SQLTask:SqlStmtSourceType="DirectInput" SQLTask:SqlStatementSource="SELECT CONVERT(varchar(25),max([Dato]),120) AS MAXDATO FROM [DB1].[dbo].[AHT] WHERE Org <> ''NSC'' and source = ''OS1''" SQLTask:CodePage="1252" SQLTask:ResultType="ResultSetType_SingleRow"></SQLTask>

    [/Code]

    Using the above, i cant seem to find the AHT table. because its part of the SQLstatement source.

    This is just one example, there are many others, where tablenames are located in SSIS variables, or OLE DB connections or ADO.NET connectionstring etc.

  • Locating a string pattern within a string is straight forward, regardless of whether the string is an XML or not, here is a quick example of a search in an XML string.

    😎

    USE TEEST;

    GO

    SET NOCOUNT ON;

    ---

    DECLARE @PLAN_XML XML = N'<BatchSequence>

    <Batch>

    <Statements>

    <SearchString SSID="1">Search text</SearchString>

    <SearchString SSID="2">Search text</SearchString>

    <SearchString SSID="3">Search text</SearchString>

    </Statements>

    </Batch>

    </BatchSequence>';

    DECLARE @SEARCH_STRING NVARCHAR(MAX) = N'Search text';

    SELECT

    (

    DATALENGTH(CONVERT(NVARCHAR(MAX),@PLAN_XML,0))

    - DATALENGTH(REPLACE(CONVERT(NVARCHAR(MAX),@PLAN_XML,0),@SEARCH_STRING,N''))

    ) / DATALENGTH(@SEARCH_STRING) AS INSTANCE_COUNT

    ,CHARINDEX(@SEARCH_STRING,CONVERT(NVARCHAR(MAX),@PLAN_XML,0),1) AS FIRST_POS_CHARINDEX

    ,PATINDEX(NCHAR(37) + @SEARCH_STRING + NCHAR(37),CONVERT(NVARCHAR(MAX),@PLAN_XML,0)) AS FIRST_POS_PATINDEX

    ;

    Output

    INSTANCE_COUNT FIRST_POS_CHARINDEX FIRST_POS_PATINDEX

    -------------------- -------------------- --------------------

    3 58 58

  • Thank you!

    it works, right out of the box.

    I don't know why I didn't think of this.

    I guess I got caught up in the fact that it was XML.

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply