Looking for ways to speed up complex query with frequent changes to underlying data

    I've been asked to optimize a query used in a CRM system.

    The query generates a list of contacts for telesales agents to browse, drawn from an underlying master contacts list of approx 750k records. Contacts are assigned to Sources, Sources are assigned to Campaigns. The basic query gets all Contacts assigned to a Campaign. As Contacts can be assigned to multiple Sources, and a Campaign may also have multiple Sources, it's common to have many duplicate contacts for a single campaign.

    There are also several different types of exclusion list which remove contacts based on email address, telephone number, number of previous contacts etc. These are matched against the generated list using WHERE xxx NOT IN clauses.
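    In outline, the query looks something like this (the names here are approximations; the gist below has the real simplified version):

    -- DISTINCT because a contact can reach a campaign via several sources
    SELECT DISTINCT c.contactid, c.primaryemail, c.telephone
    FROM contacts c
    JOIN contact_sources cs ON cs.contactid = c.contactid
    JOIN sources s ON s.sourceid = cs.sourceid
    WHERE s.campaignid = @campaignid
      AND c.primaryemail NOT IN (SELECT e.email FROM excluded_emails e)
      AND c.telephone NOT IN (SELECT t.telephone FROM excluded_telephones t);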

    A simplified version of the query is here: https://gist.github.com/andybellenie/6498720

    There are also search filters on a lot of the columns, but I've not included those in the gist.

    The requirements are:

    1. Exclusion lists are updated every minute, so the results need to be accurate

    2. Each request needs to fetch both an accurate total count and the first 100 records, with pagination

    3. Many agents access the query

    4. There are approx 10 filter fields which apply to columns not included in the gist.

    So, what I'm thinking is to avoid getting the DB to do all of this work on each request, but instead create a separate table to store a de-normalised result set, and simply update that every time one of the underlying tables or exclusion lists changes. I can write triggers to handle all of the updating directly in the db and keep it well away from the application itself.
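    As a rough sketch of what I mean (all of the names here are invented and this is untested):

    -- de-normalised snapshot of each campaign's contact list
    CREATE TABLE campaign_contacts (
        campaignid INT NOT NULL,
        contactid  INT NOT NULL,
        PRIMARY KEY (campaignid, contactid)
    );
    GO

    -- example: when an email is added to an exclusion list,
    -- remove the matching contacts from the snapshot
    CREATE TRIGGER trg_excluded_emails_insert
    ON excluded_emails
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        DELETE cc
        FROM campaign_contacts cc
        JOIN contacts c ON c.contactid = cc.contactid
        JOIN inserted i ON i.email = c.primaryemail;
    END;
    GO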

    I've not used this approach before, so I wanted to get some thoughts from you guys before getting too buried in it.

    Thanks!

    I glanced at your post on GitHub. There really isn't enough information to provide much in the way of assistance, but there are a couple of pieces of your query that may be part of the performance issue.

    Since your subquery with all the UNIONs is simply generating a list of IDs, it will be a bit quicker to use UNION ALL. It doesn't really matter if you have duplicates there, and UNION ALL doesn't have to do the work of filtering them out.
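    In other words, something along these lines (table names are illustrative, not from your gist):

    -- UNION sorts/hashes the combined set to remove duplicates;
    -- UNION ALL just concatenates the results, which is cheaper
    SELECT leadid FROM source_a WHERE campaignid = @campaignid
    UNION ALL
    SELECT leadid FROM source_b WHERE campaignid = @campaignid;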

    DATEADD(MINUTE, 30, createdAt) > GETUTCDATE()

    This portion is non-SARGable: the expression has to be calculated for every single row, so an index on createdAt can't be used. Instead you should flip this around a little bit. Your logic is really saying the same as

    createdAt > DATEADD(MINUTE, -30, GETUTCDATE())

    I don't think you will be happy if you decide to go with the trigger route. The triggers will mean slower inserts, updates and deletes throughout the entire system. In addition, triggers for this type of thing can get very unwieldy pretty quickly, and maintenance is a lot more challenging.

    We can probably help you boost the performance of your query, but we need to have a lot more knowledge of your system and your tables. Please take a look at this article from Gail about what to post for us to help with performance problems.

    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

    Also, if the data is that volatile you might be looking at statistics issues on the indexes. How are you maintaining those?
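    If the statistics are going stale between maintenance windows, a manual refresh may be worth testing (dbo.leads here is just an example table name):

    -- refresh the statistics on a single hot table
    UPDATE STATISTICS dbo.leads WITH FULLSCAN;

    -- or sweep every table in the database
    EXEC sp_updatestats;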

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

    Just a thought: if this query is in a stored procedure, I like to see whether using temp tables for tables referenced multiple times in a query can help speed things up. So I might try that with the leads table, which is used 4 times in your exclusions, each time with campaignid = @campaignid. I might start with a temp table based on leads like this:

    -- pull the campaign's leads once, then reuse #leads in each exclusion check
    SELECT
        leadid,
        campaignid,
        primaryemail,
        countrycode
    INTO #leads
    FROM leads
    WHERE campaignId = @campaignid;

    It would definitely be helpful to see a query plan. I wouldn't be surprised to see some bookmark/key lookups that might be avoided by creating covering indexes as well.
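    For example, if the plan shows key lookups against leads, an index along these lines might remove them (the key and included columns are a guess without seeing the plan):

    CREATE NONCLUSTERED INDEX IX_leads_campaignid
    ON dbo.leads (campaignid)
    INCLUDE (primaryemail, countrycode);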

    Thank you for your helpful suggestions. They have helped me optimize my exclusion lists and improve performance.

    However, I think I may have been heading down the wrong path. My main issue appears to be dealing with the sheer number of records when ordering is applied.

    I've updated my gist with the full query including application code: https://gist.github.com/andybellenie/6498720

    As you can see, I am currently limiting to TOP 1000 records, then applying ordering to those. With this approach I am seeing <100ms, which is fine. If I limit to TOP 1 record then it's basically instant.

    However, if I add ordering then it can jump up to 2500ms+, even with TOP 1, on a recordset of 60k records, which is far from my worst case. Looking at client statistics, almost all of the time is taken up waiting on the server for the first row.

    So, I need to get a total record count (which I guess I can do with a separate query) but also add ordered paging, as sketched below. Any suggestions?
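    For reference, the shape I'm aiming for is roughly this (OFFSET/FETCH needs SQL Server 2012+, and the names here are illustrative):

    DECLARE @page INT = 1;

    SELECT c.contactid,
           c.lastname,
           COUNT(*) OVER () AS total_records  -- total count returned with every page
    FROM contacts c
    ORDER BY c.lastname, c.contactid
    OFFSET (@page - 1) * 100 ROWS
    FETCH NEXT 100 ROWS ONLY;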

  • andybellenie (9/10/2013)


    ...almost all of the time is taken up waiting on the server for the first row...

    Sorting is a blocking operator - the server has to read and sort the entire result set before it can return even the first row, which is why TOP 1 doesn't help once an ORDER BY is involved.
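    If an index already delivers the rows in the order you're asking for, the sort disappears from the plan and TOP can stream immediately. For illustration only - I'm guessing at the sort columns, since we haven't seen your real ORDER BY:

    CREATE NONCLUSTERED INDEX IX_contacts_lastname
    ON dbo.contacts (lastname, contactid);

    -- with a matching index this is an ordered scan with no sort operator,
    -- so the first row comes back immediately
    SELECT TOP (1) contactid, lastname
    FROM dbo.contacts
    ORDER BY lastname, contactid;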

    Can you post the actual execution plan for the query? Save it in SSMS as a .sqlplan file and attach it to your post.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Here we go.

    I've not included the exclusion lists or any filters for now, as they don't seem to be having a significant impact at this time, and I've also left out the left joins and their columns. If this is unhelpful, please let me know.

    Thanks again.

  • andybellenie (9/10/2013)


    Here we go.

    I've not included the exclusion lists or any filters for now, as they don't seem to be having a significant impact at this time, and I've also left out the left joins and their columns. If this is unhelpful, please let me know.

    Thanks again.

    An execution plan for the real query would be the most helpful - the filters could dramatically change the shape of the query. With the plan you provided, I'd be looking at how to get Contacts moved in the plan to try to get rid of the scan and merge join, and the same for Companies.
