Using a Variable for an IN Predicate

  • @tfifield

    I apologize for not replying to your question. I was in the middle of rolling out our production SAP implementation at corporate when I posted and, well, let's just say I was a bit distracted.

    I haven't benchmarked the performance, but I'll defend my laziness by saying I put the code I write into one of two buckets: "I run this occasionally to get DBA-related info or for troubleshooting" and "This is executed hundreds or thousands of times a day by end users".

    This code falls into the first bucket. I've used this XML parsing stunt quite a bit (I truly wish I could remember where I first saw it so I could thank the genius who originally wrote it), and for small sets of parsed data it works famously.
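    For anyone who hasn't seen the stunt, it usually looks something like this (a minimal sketch; the variable and alias names here are my own):

    ```sql
    -- Turn '1,5,7,12' into rows by wrapping each item in an <i> element,
    -- casting the result to XML, and shredding it with nodes()/value().
    DECLARE @List VARCHAR(8000) = '1,5,7,12';
    DECLARE @Xml XML = CAST('<i>' + REPLACE(@List, ',', '</i><i>') + '</i>' AS XML);

    SELECT i.value('.', 'INT') AS ItemValue
    FROM @Xml.nodes('/i') AS T(i);
    ```

    One caveat: if the input can contain XML-special characters (&, <, >), they need to be escaped before the CAST.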

  • Irozenberg 1347 (3/1/2013)


    Elementary sub-query or EXISTS clause would radically improve performance.

    If you like fancy coding you could pass XML as a parameter and use XQUERY.

    You should read the article about the splitter. It will blow the doors off an XML parser. You can find the article by following the link in my signature about splitting strings. The article is great but the ensuing discussion (although incredibly lengthy) is even better.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Irozenberg 1347 (3/1/2013)


    Elementary sub-query or EXISTS clause would radically improve performance.

    If you like fancy coding you could pass XML as a parameter and use XQUERY.

    Since you made a claim of performance, let's see your "elementary sub-query or EXISTS" clause.

    I do agree that passing (ugh!) XML would also be fast but it has no respect for the "pipe".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sean Lange (3/1/2013)


    Irozenberg 1347 (3/1/2013)


    Elementary sub-query or EXISTS clause would radically improve performance.

    If you like fancy coding you could pass XML as a parameter and use XQUERY.

    You should read the article about the splitter. It will blow the doors off an XML parser. You can find the article by following the link in my signature about splitting strings. The article is great but the ensuing discussion (although incredibly lengthy) is even better.

    Be careful now. He's not talking about passing a CSV and using an XML parser to split it. He's talking about passing actual XML and parsing it, which is actually very fast.

    --Jeff Moden


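    The "actual XML" version of the parameter reads roughly like this (a sketch; the procedure, element, and table names are illustrative, not from the article):

    ```sql
    -- Hypothetical procedure taking a real XML parameter and shredding it
    -- with nodes()/value() to drive an IN-style filter.
    CREATE PROCEDURE dbo.GetOrdersByCustomerXml
        @CustomerIds XML
    AS
    SELECT o.OrderID, o.CustomerID
    FROM dbo.Orders AS o
    WHERE o.CustomerID IN
    (
        SELECT n.value('.', 'INT')
        FROM @CustomerIds.nodes('/ids/id') AS T(n)
    );
    GO

    -- The caller passes real XML rather than a CSV:
    EXEC dbo.GetOrdersByCustomerXml @CustomerIds = '<ids><id>3</id><id>9</id></ids>';
    ```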

  • Jeff Moden (3/1/2013)


    Be careful now. He's not talking about passing a CSV and using an XML parser to split it. He's talking about passing actual XML and parsing it, which is actually very fast.

    Ahh I guess I misread that. :blush:

    It is fast on the SQL side of things, but I have never really been convinced that applications gain much benefit, given all the extra data that has to be passed.


  • Sean Lange (3/1/2013)


    It is fast on the SQL side of things, but I have never really been convinced that applications gain much benefit, given all the extra data that has to be passed.

    +1000 to that!

    Shredding an XML parameter takes a bit fewer resources than parsing a CSV with DelimitedSplit8K and is even a bit faster. An XML parameter also allows you to easily break the 8k barrier. Heh... of course, with XML bloat, you almost have to. 😀

    The other thing that people fail to consider is all the other resources involved, like the "pipe". It's much more expensive on the pipe to pass XML than it is to pass CSV or TSV. It's also more expensive on the I/O of the server, and being I/O bound is much more difficult to fix than optimizing some code.

    --Jeff Moden



  • A standardized XML type containing the set (think something installed in your XML schema collection) would be far easier to use and debug, plus SQL Server is a stud at using XML. Unless you're regularly running huge inputs for your IN clause, you probably don't need to micro-optimize.

  • Passing XML as the parameter is actually another good use-case for a similar technique, and I've done this where I was dealing with an object-level interface. In that way, the parameter can be passed an entire object worth of data, or for that matter, a set of objects, which can have parent-child relationships encoded within the XML (I used this to pass sets of invoices around between the data layer in the middle tier and the database).

    However, that's not where I have used the technique of comma-delimited strings as parameters; that's more about working with Reporting Services. I've scratched my head a bit, but I can't think of another case where I've passed sets of values as a delimited string. I have seen it used to express lists of values in a "parameter" table, but I think that's just silliness and not a valid reason to use this technique (yes, we are allowed to have tables with relationships!).

    I will admit that once I am in T-SQL, if I have to pass sets around as parameters, I will do it either within a table or as a table-variable. If I only ever lived in SQL Server (or Oracle), I wouldn't need to be passing sets as delimited strings.

    I wonder what TRON would have looked like if it were based on an RDBMS rather than a game?

    My other thought was that SQL as a language is based around working with sets of data. It shouldn't matter too much how that data is encoded; whether it's XML, a delimited string, a file, a spreadsheet or a table within the database, I still want to be able to apply the same capabilities of SQL to it. From that point of view, taking a comma-delimited string and expressing it as a table is perfectly valid.
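    Since SQL Server 2008, the table-variable idea also extends across the parameter boundary as a table-valued parameter; a sketch (the type, procedure, and table names are mine):

    ```sql
    -- A keyed table type, passed READONLY into the procedure and joined directly.
    CREATE TYPE dbo.IntList AS TABLE (Id INT NOT NULL PRIMARY KEY);
    GO

    CREATE PROCEDURE dbo.GetOrdersByCustomerList
        @CustomerIds dbo.IntList READONLY
    AS
    SELECT o.OrderID, o.CustomerID
    FROM dbo.Orders AS o
    INNER JOIN @CustomerIds AS c
            ON c.Id = o.CustomerID;
    GO

    -- From T-SQL the caller just fills a variable of that type:
    DECLARE @Ids dbo.IntList;
    INSERT INTO @Ids (Id) VALUES (3), (9);
    EXEC dbo.GetOrdersByCustomerList @CustomerIds = @Ids;
    ```

    (SSRS can't build one of these for a multi-value parameter, which is why the delimited string survives in that context.)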

  • jimmyzimms (3/1/2013)


    A standardized XML type containing the set (think something installed in your XML schema collection) would be far easier to use and debug, plus SQL Server is a stud at using XML.

    Reporting Services will only automatically generate a comma-delimited string for multi-value parameters; it won't generate XML. So I would have to write a parser in Reporting Services to express the parameters in XML before passing them to T-SQL where I then have to break the XML down again.

    Neither of those is actually difficult, but I don't see the value in spending the extra time, every time I call a stored procedure from SQL Server Reporting Services, to ensure that any multi-value parameters are encoded in XML.

    In this case, it would make the interface between Reporting Services and SQL Server harder to use or debug.

  • Bruce W Cassidy (3/1/2013)


    Reporting Services will only automatically generate a comma-delimited string for multi-value parameters; it won't generate XML. So I would have to write a parser in Reporting Services to express the parameters in XML before passing them to T-SQL where I then have to break the XML down again.

    Neither of those is actually difficult, but I don't see the value in spending the extra time, every time I call a stored procedure from SQL Server Reporting Services, to ensure that any multi-value parameters are encoded in XML.

    In this case, it would make the interface between reporting services and SQL Server harder to use or debug.

    Wow, that sucks (about SSRS). The point is still valid, though, and I'd actually suggest that in general you try to keep the XML array as the "native" input parameter and have an SSRS wrapper function that deals with the delimited-string coercion. This is a fairly common implementation of an anti-corruption layer.

    That's a bit of good trivia there, Bruce. Thanks!

  • I agree with the UDF approach. We use two UDFs, one for splitting character data and one for splitting integers. Comes in very handy, and it's only done once.
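    The integer flavor of such a UDF might be sketched as an inline table-valued function (the name is hypothetical; it reuses the XML trick from earlier in the thread):

    ```sql
    -- Hypothetical inline TVF: parse a comma-delimited list into INT rows.
    CREATE FUNCTION dbo.SplitInts (@List VARCHAR(8000))
    RETURNS TABLE
    AS
    RETURN
        SELECT ItemValue = i.value('.', 'INT')
        FROM (SELECT CAST('<i>' + REPLACE(@List, ',', '</i><i>') + '</i>' AS XML)) AS X(Doc)
        CROSS APPLY X.Doc.nodes('/i') AS T(i);
    ```

    Being an inline TVF, it can be used anywhere a table can: `SELECT ItemValue FROM dbo.SplitInts('1,5,7');`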

  • Instead of the CTE, can't you just put the CTE's SELECT statement inside the WHERE clause?

  • jimmyzimms (3/1/2013)


    I'd actually suggest that in general you try to keep the XML array as the "native" input parameter and have an SSRS wrapper function that deals with the delimited-string coercion. This is a fairly common implementation of an anti-corruption layer.

    The delimited strings are assembled directly by SSRS; I don't see the value in an "anti-corruption layer", sorry. If SSRS is not building parameters correctly, that means a significant part of the .NET layer must be corrupt, and no XML parsing is going to fix that (especially since the most likely way to build the XML is to use the .NET functionality that turns the array of parameters into XML).

    Instead, I plug the stored procedure directly into SSRS as a data source and handle any parameters within the stored procedure (which I would also have to do if they were XML). Whereas if I were passing parameters as XML, I would be adding work and reducing the abstraction (you have to go through preparation steps prior to calling the stored procedure, which means you have to know about parameter formats), and gaining... what, exactly?

  • rd8202 (3/1/2013)


    Instead of the CTE, can't you just put the CTE's SELECT statement inside the WHERE clause?

    You can do it that way; it's really just a syntactic change. There's a slight difference in the query plan (as the article shows; it does cover both approaches), but SQL Server ends up doing much the same work. You're exchanging a filter (the IN clause) for a join (the CTE), and both are pretty optimal operations.

    Where there's a difference is when you decide to use a temporary table. I'll do that for two reasons: either I'm expecting more than a handful of values, or I'm reusing the list of values in subsequent queries. For the first, I can create the temporary table with a primary key so SQL Server can use the index for the join, and that seems to work pretty well.
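    The temporary-table variant, sketched (the table and splitter names are mine; any splitter function returning the parsed values would do):

    ```sql
    -- Parse once into a keyed temp table, then join; the primary key
    -- gives the optimizer an index to use for the join.
    DECLARE @List VARCHAR(8000) = '3,9,42';

    CREATE TABLE #CustomerIds (Id INT NOT NULL PRIMARY KEY);

    INSERT INTO #CustomerIds (Id)
    SELECT DISTINCT ItemValue
    FROM dbo.SplitInts(@List);  -- hypothetical splitter function

    SELECT o.OrderID, o.CustomerID
    FROM dbo.Orders AS o
    INNER JOIN #CustomerIds AS i
            ON i.Id = o.CustomerID;
    ```

    The DISTINCT matters: the primary key will reject duplicate values in the input list.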

  • Sean Lange (3/1/2013)



    It is fast on the SQL side of things, but I have never really been convinced that applications gain much benefit, given all the extra data that has to be passed.

    Thank you Jeff!

    IMHO, CSV data in a database is a pure hack: some developers cut corners to speed up the first cut, then leave before it causes HUGE maintenance problems, especially with reporting.

    Last year I had to deal with a data migration project where a salesforce.com legacy system had been implemented by storing comma-separated values inside table cells. I used a CSV split function based on XML to transform a column of comma-separated strings into a new table with an ID and atomic items. A pure job-creation scheme!

Viewing 15 posts - 31 through 45 (of 47 total)
