growing IN clauses problem

• I have a third-party C++ application running on SQL Server 2000 SP4 on a Windows 2003 server, and it makes heavy use of IN clauses. With data growth and varying user input, some of these clauses get very long, over 100 elements. There is apparently a limitation in SQL Server and the developer knew that, so all statements are constructed to use at most 99 IN clause elements; if there are more than 99, a second statement is issued.

Well, that's fine, but despite the good indexing, when I start seeing these huge SELECT statements, performance goes way down, especially if multiple users hit the database at the same time with the same queries.

How can I troubleshoot this situation? I have good indexes, statistics... A typical query takes hundreds of milliseconds...

    Thanks a lot, mj

  • Hi

You can use a derived table to get rid of the long IN clause.

Use the function at http://www.sqlservercentral.com/scripts/viewscript.asp?scriptid=850 as a derived table: pass your IN clause elements to the function and join it to the main query.

Here is some sample code:

    SELECT EmpName
    FROM Emp e
    INNER JOIN dbo.Split('100,101,102,103,104,105', ',') d
        ON e.EmpID = d.Value

    Hope this works for you...
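In case the linked script is unavailable, here is a sketch of what such a Split function could look like on SQL Server 2000 (this is a hypothetical reconstruction, not the actual linked script; it assumes the same shape, a table-valued function returning a Value column):

```sql
-- Hypothetical sketch of a comma-list splitter for SQL Server 2000.
-- Assumes the same interface as the linked script: dbo.Split(list, delimiter)
-- returning a table with one Value column.
CREATE FUNCTION dbo.Split
(
    @List      varchar(8000),
    @Delimiter char(1)
)
RETURNS @Result TABLE (Value varchar(8000))
AS
BEGIN
    DECLARE @Pos int

    SET @Pos = CHARINDEX(@Delimiter, @List)
    WHILE @Pos > 0
    BEGIN
        -- Take everything before the next delimiter as one element
        INSERT INTO @Result (Value) VALUES (LEFT(@List, @Pos - 1))
        SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List))
        SET @Pos = CHARINDEX(@Delimiter, @List)
    END

    -- Last (or only) element
    IF LEN(@List) > 0
        INSERT INTO @Result (Value) VALUES (@List)

    RETURN
END
```

With a function like this in place, the join shown above replaces the long IN clause entirely.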

  • I was going to suggest the same as Nagabhushanam, but I got there second...  Anyway, the suggestion given is without doubt the best way to go.

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

• Insert your data from the IN clause into a temp table (make sure the temp table has an index),

then join the temp table to the permanent table, and your performance will rock.
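A minimal sketch of that approach (the Emp/EmpID names are made up for illustration, matching the earlier example):

```sql
-- Load the former IN-list values into an indexed temp table.
-- The primary key provides the index the optimizer can seek on.
CREATE TABLE #InList (Value int NOT NULL PRIMARY KEY)

INSERT INTO #InList (Value) VALUES (100)
INSERT INTO #InList (Value) VALUES (101)
INSERT INTO #InList (Value) VALUES (102)
-- ...one INSERT per element (SQL Server 2000 has no multi-row VALUES)

-- Then join instead of using IN:
SELECT e.EmpName
FROM Emp e
INNER JOIN #InList i
    ON e.EmpID = i.Value

DROP TABLE #InList
```

The optimizer can then pick a merge or nested loops join against the indexed temp table instead of evaluating a long chain of OR comparisons.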

• With 40,000 IN clause elements, the approach above returns results literally within a few seconds...

  • Thanks a lot for the great ideas.

I need to talk to the application developers, as it's a 3rd-party app and I can only suggest these great ideas. Most likely they will not jump on them... So I'll again be left with the performance problems...

    Thanks very much,

    mj

• The problem of a large IN list is not confined to SQL Server. If I were doing this work in DB2 or Oracle, I would not want to see an IN list into the thousands. The best approach on any platform with an IN list of this size is to put the IN data into some form of table and use a join. Your application has some bad design if it expects to use very large IN lists, but as this can only be fixed by a vendor-supplied patch, you are probably stuck with bad performance for now.

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Thanks a lot for the wonderful comments.

I had a huge fight with the developers of the app, and now they are willing to look at and change their code, but they want me to prove that calling the split function or using a temp table is a much better solution and really runs faster.

I was thinking of comparing logical/physical reads and CPU usage for both approaches, and of course execution time for the two different versions of the selects.

Please give me an idea: will it be enough just to make a proc that executes the same statement in a loop multiple times?

What would be the best way to test/prove that replacing the huge IN lists is better?

    Thanks a lot for the help, mj
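One way to collect the reads/CPU/time numbers you mention is to run both versions in Query Analyzer with STATISTICS IO and TIME on and compare the output in the Messages pane. A sketch, reusing the illustrative Emp/Split names from earlier (swap in your real queries):

```sql
SET STATISTICS IO ON
SET STATISTICS TIME ON

-- Version 1: the long IN list, as the application sends it today
SELECT e.EmpName
FROM Emp e
WHERE e.EmpID IN (100, 101, 102 /* , ... up to 99 elements */)

-- Version 2: the Split-function join
SELECT e.EmpName
FROM Emp e
INNER JOIN dbo.Split('100,101,102', ',') d
    ON e.EmpID = d.Value

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
```

Run each version several times so the plan and data pages are cached equally for both, and compare the reported logical reads, CPU time, and elapsed time.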

• It's hard to fight raw data; your plan of comparing reads and CPU usage should be enough. If that doesn't make sense to the developers, then they are clearly stating that they do not know how to write efficient queries for SQL Server. (If you never learned how to measure efficiency, then you probably haven't learned how to actually achieve it.)

    It is worth noting, and adding to your case, that

WHERE MyTable.MyField IN (value1, value2, value3, value4, ...)

...is just shorthand. It gets expanded before execution into:

    WHERE MyTable.MyField = value1 OR MyTable.MyField = value2 OR MyTable.MyField = value3 OR MyTable.MyField = value4 OR MyTable.MyField = ..........

Those tests are run against every row in the table. SQL Server will short-circuit the OR comparisons (meaning that if it made a match on the 3rd item in the list, it will stop testing instead of testing the other 96 items in the list), but it's messy even if most of the values are caught by the first 20% of the list. Non-matching values will still be tested against every item in the IN clause.

    If they are using an IN clause with many values to filter a large table, the amount of overhead can be brutal compared to simply lining two ordered lists up next to each other and plucking out the matches.

If you want more ammo, ask them how they would write such a feature in their application (pulling some items from a very large ordered list by matching them to items in a separate ordered list). Would they write it to iterate the large list one by one, looping through and testing each value in the smaller list for each value in the large list? Or would they try to build an efficient search, starting at the top of both lists and proceeding through them together, looking at both sides (like a merge join), or setting up a b-tree for the smaller list so they can quickly locate a match or no-match in the smaller list for each value in the larger one (like a nested loops join)?

    If they prefer to iterate like mad, then they should stick to the huge IN lists.  You're not going to get through to them.  If they want an efficient process, then they need to build an indexable table and join it to the source table.

(The obligatory jerk-2005-comment-in-the-2000-group note: just between you and me, one of the improvements in SQL Server 2005 is that it will flatten large IN lists and will at times turn them into worktables, as opposed to SQL Server 2000, which always turns a large IN list into a large, tall tree in the plan.)

    -Eddie

    Eddie Wuerch
    MCM: SQL

• Thank you so much! I'm very happy to hear that SQL Server 2005 might create an internal table from the elements in the IN clauses. While researching this problem I found out that DB2, for example, already does that. It would be great to have this in SQL Server!

I would think that when I showed the developers a better way to design their application, they would be happy to find a way to improve their app. I'm not the only one of their customers who complains about the IN clause performance. But... let's hope for the best.

I'm designing a procedure now that calls all the select statements, one version with the IN clauses as-is and the other with the split function. I'll measure the time, reads, and CPU, and post my findings; they may help somebody else stuck like me with inefficient code.

    Thanks a lot for the help.

    mj

