Need help, better approach, or ideas for improving performance of query.

  • Goal: I need to find a more efficient way to perform the query provided below.

    Requirements: A new table is being added Customer_Email_Property. This table needs to have in it an email entry for each cutomer per property. If there are 20 properties than each email address will be in the table 20x. This will run as part of an SSIS package approx. every 2 minutes. Also, if an EmailStatus is updated to 5 in the CUSTOMERS table then the EmailStatus in the new table will be updated to 5 for all rows for that email.

    Info: There are 2 tables we need to look at to check for email addresses. Table1 – CUSTOMERS. This table has several columns, but we only are concerned about Email, EmailStatus. Table2 – CUSTOMERS_EMAIL. Again for the purposes of this solution we only need Email, EmailStatus. There is a 3rd table, PROPERTY, here we can see all the PropertyIDs. The email and email status will be put into this table(if not already existing) for each Property. It could be just a matter of purely the size of the new table, but I want to make sure there is no better option before I concede. For example’s sake, on the dev server there are ~700,000 emails x 26 Properties coming out to about 18.5mil rows. I have tried playing around with merge, but I do not believe in this case it is the right solution. I’ll show why under IO Merge analysis.

    Other things I’ve considered: I wondered, though have yet to try and test, if a better solution would be to reverse the table structure of the new table. Meaning only 1 email address in the table per unique email and instead a column for each Property to contain the EmailStatus of that property(because in the end this is what is prompting this new requirement). This would need to be a significant performance gain however, because the forces that be are already expecting the one table structure, so It would need to be significant to propose an entirely different structure for this table. Just looking at the underlined part of the query, the problem lies there. I am looking into alternatives for EXISTS

    What cannot change:

    1.The emails must be checked against both the CUSTOMERS and CUSTOMERS_EMAIL table. It is possible for one table to have an email address that the other does not. I realize this is not optimal, but it is not in my hands right now.

    2.Must be built into the SSIS package

    1 – Note: For the initial transfer of data, this query did not work because it ran out of memory trying to do the entire insert in one go. I got around this by doing it a property at a time within a loop.

    INSERT INTO dbo.D_Customer_Email_Property

    ( PropertyID, Email, EmailStatus)

    SELECT x.propertycode, Email, EmailStatus FROM(

    SELECT d.email, EmailStatus

    FROM CUSTOMERS d

    WHERE NOT EXISTS (

    SELECT ep.email FROM Customer_Email_Property ep WHERE ep.email = d.email) AND d.EmailStatus NOT IN (2,4)

    UNION

    SELECT e.email, EmailStatus

    FROM CUSTOMERS_EMAIL e

    WHERE e.EmailStatus NOT IN (2,4)

    AND NOT EXISTS (

    SELECT DISTINCT(ep.email)

    FROM Customer_Email_Property ep

    WHERE ep.Email = e.Email) )iq

    CROSS APPLY (SELECT Propertycode FROM PROPERTY) x

    Io/time stats:

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server parse and compile time:

    CPU time = 219 ms, elapsed time = 227 ms.

    (0 row(s) affected)

    Table 'CUSTOMERS'. Scan count 7, logical reads 3387, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'CUSTOMER_EMAIL_PROPERTY'. Scan count 10, logical reads 215614[/u], physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table ‘CUSTOMERS_EMAIL'. Scan count 5, logical reads 3275, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 25039 ms, elapsed time = 7655 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Update part of the solution:

    Note: This query performs fine and does not need to be looked at. I just have it here for completeness

    UPDATE ep

    SET ep.emailstatus = 5

    from Customer_Email_Property ep

    INNER JOIN CUSTOMERS d

    ON d.Email = ep.Email

    WHERE d.emailstatus = 5 AND ep.emailstatus <> 5

    IO Merge analysis:

    When I just query the part of the TSQL in orange here are the IO results:

    Table 'CUSTOMERS'. Scan count 7, logical reads 3387, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table ‘CUSTOMER_EMAIL_PROPERTY'. Scan count 5, logical reads 107567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Vs.

    MERGE Customer_Email_Property as ep

    using CUSTOMERS AS d

    ON (d.Email = ep.email)

    WHEN MATCHED AND d.EmailStatus = 5

    THEN UPDATE SET ep.emailstatus = 5

    WHEN NOT MATCHED BY TARGET AND d.emailstatus NOT IN (2,4)

    THEN INSERT (Email,EmailStatus) VALUES(d.email,d.emailstatus);

    IO stats:

    Table 'CUSTOMERS'. Scan count 5, logical reads 3420, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table ‘CUSTOMER_EMAIL_PROPERTY'. Scan count 5, logical reads 145941, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Viewing 0 posts

You must be logged in to reply to this topic. Login to reply