Comparing 2 large tables row by row and date comparision

  • We have a functionality where in we need to process a large file containing around 1.5 million rows.

    We insert the data from the file in a secondary table and once this insert is complete, we compare it with our parent table and only insert rows which have changed.

    The parent table contains around 2.5 million rows.

    We do this using a correlated query and NOT EXISTS and the complete process takes around 4-5 seconds.

    Lately, a new column by the name EffectiveDate was added in the file as well as parent table and had to implement a change where in say for example a record exists in the parent table with an EffectiveDate of 06/13/2020 and while processing the file if there is a same record with an EffectiveDate of 05/01/2020 then, the record need to be inserted in the parent table.

    To achieve this, I have made a small changes by adding greater than equal '>' to comapre the EffectiveDate column in both tables.

    On my local, the whole process runs for ever. It was running for around 2.3 hours after which I have to cancel it.

    If I remove the '>' then the query runs as usual in 4-5 seconds.

    Not able to understand how by just adding '>' in the date comparision is causing the query to run for ever.

    Both the table have an index on the EffectiveDate column.

    Not able to obtain the Execution Plan as the query runs for ever.

    Any pointers from the experts will be helpful.

    Is there any other approach other than correlated query and NOT EXISTS that can be used to achieve this?

    INSERT INTO dbo.ParentTable....
    SELECT
    ST.FirstName
    ,ST.LastName
    ,ST.BirthDate
    ,ST.StreetAddress
    ,ST.City
    ,ST.State
    ,ST.PostalCode
    ,ST.EffectiveDate
    FROM
    dbo.SecondaryTable ST
    WHERE
    NOT EXISTS
    (               
    SELECT
    PT.Id
    FROM
    dbo.ParentTable PT
    WHERE
    ST.EffectiveDate>=PT.EffectiveDate
    AND ST.FirstName=PT.FirstName
    AND ST.LastName=PT.LastName
    AND ST.BirthDate=PT.BirthDate
    AND ST.StreetAddress=PT.StreetAddress
    AND ST.City=PT.City
    AND ST.PostalCode=PT.PostalCode)

    • This topic was modified 3 years, 10 months ago by  jignesh209.
  • If EffectiveDate indexed? What does the new execution plan look like for the updated query?

  • what indexes exist on the parent table?

    most likely there is one already covering all the columns on that where clause - and if so the effective date should most likely need to be added at the end of the column list.

  • Those indexes on EffectiveDate are encouraging SQL Server to implement a triangular join, more about that in a minute. If you've added the EffectiveDate index specifically for this comparison between the two tables, then you're in luck - drop it and create something which works.  This would be ideal:

    CREATE INDEX ix_Stuff ON ParentTable (PostalCode, LastName) INCLUDE (FirstName, BirthDate, StreetAddress, City, EffectiveDate)

    However it's quite a big index. You could miss out the INCLUDE section and let key lookups take care of the other columns, because on a table of names and addresses of this modest size, you won't have many rows sharing postalcode and surname.

    Depending on the queries interrogating ParentTable, you might find that switching the position of the two key columns works for this use case and supports user queries too.

     

    Edit: Here's a link to an article explaining Triangular Joins, written by Jeff Moden.

    • This reply was modified 3 years, 10 months ago by  ChrisM@Work.
    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply