Easy way to search for a string any where within XML column?

  • This is an interesting 5-year old thread...

    Something interesting to note is that the default output for the SQLXML value method is all child text. With that in mind, this query below will strip out all XML tags.

    DECLARE @PLAN_XML XML = N'<BatchSequence>

    <Batch>

    <Statements>

    <SearchString SSID="1">Search text1</SearchString>

    <SearchString SSID="2">Search text2</SearchString>

    <SearchString SSID="3">Search text3</SearchString>

    </Statements>

    </Batch>

    </BatchSequence>';

    -- this will remove all tags

    SELECT @PLAN_XML.value('.', 'varchar(max)');

    Search text1Search text2Search text3

    Anyhow, if the objective was simply to determine if @SEARCH_STRING existed in the text, here's a fun way to return a Boolean value:

    DECLARE @PLAN_XML XML = N'<BatchSequence>

    <Batch>

    <Statements>

    <SearchString SSID="1">Search text</SearchString>

    <SearchString SSID="2">Search text</SearchString>

    <SearchString SSID="3">Search text</SearchString>

    </Statements>

    </Batch>

    </BatchSequence>';

    DECLARE @SEARCH_STRING NVARCHAR(MAX) = N'Search text';

    SELECT SIGN(CHARINDEX(@SEARCH_STRING, CAST(@PLAN_XML AS varchar(max))))

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Tommy 70494 (11/2/2016)


    Thank you!

    it works, right out of the box.

    I don't know why I didn't think of this.

    I guess I got caught up in the fact that it was XML.

    The varchar(max) is able to hold the entire contents of the SSIS xml?

    ----------------------------------------------------

  • Edit2: OOps my understanding of it was incorrect, indeed it convers XML to nvarchar, and that is not large enough to contain the entire XML string.

    SO, back to the drawing board.

    Any ideas?

    EDIT: the version above, searches the entire XML, so it works as intended actually. I just tested 4 of our larger packages, and the entire XML is in the variable that is searched.

    [Strike]

    varchar(max) can't for the test packages i've used.

    But I was hoping to get around that by doings something to the XML.

    So in short, the logic works, but atleast for my test package, which is rather small it can only contain about 40% of the package definition.

    Even without the XML.

    :/

    Any other ideas?

  • Why not just properly shred the XML and store the results in proper tables? Makes lookups real easy. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This may not sound elegant, but a two step approach is to query out the entire xml string with the Results To File setting selected. Save the results to a .txt file from there you can open with your favorite editor to do a search.

    If you are going to do that with SSIS, I would think that you would be better off just opening the solution with Visual Studio. From there navigate to View > Code. In the resulting window do your search.

    ----------------------------------------------------

  • Jeff Moden (11/3/2016)


    Why not just properly shred the XML and store the results in proper tables? Makes lookups real easy. 😉

    Id love too, can you point me in the right direction?

    I've tried a few things, im still running a 2008r2, this is part of migrating to 2016, but i have just under 5000 packages in production, and id really like to clean up before migrating.

    Hence the whole idea of figuring out which packages use which tables, so i can clean all non-used tables, stores procedures and views.

    MMartin1 (11/3/2016)


    This may not sound elegant, but a two step approach is to query out the entire xml string with the Results To File setting selected. Save the results to a .txt file from there you can open with your favorite editor to do a search.

    If you are going to do that with SSIS, I would think that you would be better off just opening the solution with Visual Studio. From there navigate to View > Code. In the resulting window do your search.

    While a good solution, its not practical with my current load of packages, i have just under 5000 packages in production, and id really like an automated solution to also be able to run every once in a while, to ensure I dont have orphaned views, tables etc.

    I plan on modifying it to also run on views, so any table that is not used in a view, package, sp or is queried on a regular basis, is put on a watch list and retired after a period of time.

    But so far, I've come up short with how to shred the XML correctly, so i can search the package definition of my list of tables. Of which i have about 4500, with another 6000 views on top.

  • Jeff Moden (11/3/2016)


    Why not just properly shred the XML and store the results in proper tables? Makes lookups real easy. 😉

    I guess the main problem with shredding the XML, as far as I can figure out, im no expert on this XML or shredding, is that the string I am looking for, doesnt come up as a Tag, but as inline text in, what I guess is as tag.

    Also, I don't have an exhaustive list of tags to look for.

    This is one example where it is a script task.

    Obviously, theres lots more code her,e but this is the essential part.

    [Code]

    <DTS:ObjectData>

    <SQLTask:SqlTaskData xmlns:SQLTask="www.microsoft.com/sqlserver/dts/tasks/sqltask" SQLTask:Connection="{00-00-00-00}" SQLTask:TimeOut="0" SQLTask:IsStoredProc="False" SQLTask:BypassPrepare="True" SQLTask:SqlStmtSourceType="DirectInput" SQLTask:SqlStatementSource="TRUNCATE TABLE IMPORT.xxx.xxx_FORECAST_DATA;" SQLTask:CodePage="1252" SQLTask:ResultType="ResultSetType_None" />

    </DTS:ObjectData>

    [/code]

    So, here im lookig at finding the xxx_FORECAST_DATA, where I have the name from

    SELECT TABLE_NAME,TABLE_TYPE FROM information_schema.tables

    So, the general idea, is to search through each XML in

    [Code]

    SELECT name, CONVERT(XML, CONVERT(VARBINARY(MAX), packagedata)) AS ssispackagexml FROM msdb.dbo.sysssispackages WHERE name

    [/code]

    Using the above table, and then for each hit from the tablelist, output tablename, so I can see it is used in the package.

  • I didn't realize that you were trying to shred the XML of packages... certainly not so many of them. I'm no Ninja at XML either and apologize for my ineptitude in this area.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Tommy 70494 (11/3/2016)


    But so far, I've come up short with how to shred the XML correctly, so i can search the package definition of my list of tables. Of which i have about 4500, with another 6000 views on top.

    Shredding the SSIS package files is relatively straight forward and probably a better option than a string search in this case, especially if the package XML has already been loaded into a table.

    😎

    Which part of the shredding are you struggling with?

    Are the packages similar in structure?

    Are the packages simple or complex?

    What are the LastModifiedProductVersion and ExecutableType property values? (2008 is 11.xxx and SSIS.Package.2 respectfully)

    Edit: Typo

  • Here is some SQL I've used to search SSIS packages to see if a field or table is used in the package. This seems to work for the size of packages we have, maybe extremely large packages wouldn't work. Hope this helps.

    SELECT [name] AS SSISPackageName,

    CONVERT(XML, CONVERT(VARBINARY(MAX), packagedata)) AS SSISPackageXML,

    CONVERT(VARCHAR(MAX), CONVERT(VARBINARY(MAX), packagedata)) AS SSISPackageVarchar

    INTO #SSISObjectSearch

    FROM msdb.dbo.sysssispackages

    SELECT *

    FROM #SSISObjectSearch

    WHERE SSISPackageVarchar LIKE '%search_string_here%'

    ORDER BY SSISPackageName

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • At my prior job we had the SSIS packages stored on network drives. I was able to use the tool Ultra Edit to search across these folders for specific strings. If you could move the .dtsx files to a folder(s) then you could use something to search that way. I liked Ultra Edit since I could limit it to look at a specific type of file, like all *.dtsx or *.sql.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply