Better way to write this query

  • I have inherited this query that is comparing data from one table to another. It is using a combination of columns to "build" a unique column that gets inserted into the other table should it not exist. The query is below INSERT INTO ReportRecipients

    SELECT rs.[ReportID]

    ,rs.[ATID]

    ,rs.[AID]

    ,rs.[RecipientId]

    ,rs.[Source]

    ,rs.[Frequency]

    ,rs.[DeliveryMethod]

    ,'NEW RECORD'

    ,CONVERT(VARCHAR(10),rs.[ATID])

    + CONVERT(VARCHAR(10),rs.[AID])

    + CONVERT(VARCHAR(10),rs.[ReportID])

    + CONVERT(VARCHAR(10),rs.[Frequency])

    + CONVERT(VARCHAR(10),rs.[RecipientId])

    + CONVERT(VARCHAR(10),rs.[DeliveryMethod])

    ,rs.[scheduleid]

    ,rs.FeeAmount

    ,rs.FeeTranCode

    ,0

    FROM [dbo].[ReportSchedules] rs

    WHERE CONVERT(VARCHAR(10),rs.[ATID])

    + CONVERT(VARCHAR(10),[AID])

    + CONVERT(VARCHAR(10),rs.[ReportID])

    + CONVERT(VARCHAR(10),rs.[Frequency])

    + CONVERT(VARCHAR(10),rs.[RecipientId])

    + CONVERT(VARCHAR(10),rs.[DeliveryMethod])

    NOT IN (SELECT DISTINCT [id] from dbo.ReportRecipients)I realize the WHERE clause poses an issue due to the non-sargable concatenated columns. The problem with this query is that in both tables, there are millions of rows. In it's current state it takes about 2 minutes to run, generating 4 million reads.

    The "CONVERT(VARCHAR(10),[ATID]) + CONVERT(VARCHAR(10),[AID]) + CONVERT(VARCHAR(10),[ReportID]) + CONVERT(VARCHAR(10),[Frequency]) + CONVERT(VARCHAR(10),a.[RecipientId]) + CONVERT(VARCHAR(10),a.[DeliveryMethod])" business creates the "ID" column in the other table...which when new rows are added into the main table, it uses the combination of the CONVERTS to match against the "ID" column to determine if it's a new record or not.

    Here's some sample data and DDL:

    /* Create Main source table */

    CREATE TABLE [dbo].[ReportSchedules](

    [ScheduleId] [int] NOT NULL,

    [ReportID] [int] NULL,

    [ATID] [int] NULL,

    [AID] [int] NULL,

    [RecipientId] [int] NULL,

    [Source] [varchar](5) NULL,

    [Frequency] [tinyint] NULL,

    [DeliveryMethod] [tinyint] NULL,

    [FaxTime] [varchar](50) NULL,

    [FeeAmount] [money] NULL,

    [FeeTranCode] [char](5) NULL,

    CONSTRAINT [csPk_ReportSchedules] PRIMARY KEY CLUSTERED

    ( [ScheduleId] ASC ) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    /* Populate Main source table with some data */

    INSERT INTO ReportSchedules

    SELECT 1,34,51,7301,7481,'CI',2,1,'Not Specified',NULL,NULL UNION ALL

    SELECT 2,34,51,7271,7451,'CI',2,1,'Not Specified',NULL,NULL UNION ALL

    SELECT 3,24,51,7302,7482,'CI',2,1,'Not Specified',NULL,NULL UNION ALL

    SELECT 4,24,51,7273,7453,'CI',2,1,'Not Specified',NULL,NULL UNION ALL

    SELECT 5,24,51,9905,10395,'CI',2,1,'Not Specified',NULL,NULL

    /* Create table to compare against source table */

    CREATE TABLE [dbo].[ReportRecipients](

    [ScheduleID] [bigint] IDENTITY(1,1) NOT NULL,

    [ReportID] [int] NOT NULL,

    [ATID] [int] NOT NULL,

    [AID] [int] NOT NULL,

    [RecipientId] [int] NOT NULL,

    [Source] [varchar](5) NOT NULL,

    [Frequency] [tinyint] NOT NULL,

    [DeliveryMethod] [tinyint] NOT NULL,

    [FaxTime] [varchar](40) NULL,

    [ID] [varchar](60) NOT NULL,

    [CCScheduleID] [varchar](10) NULL,

    [FeeAmount] [money] NULL

    ) ON [PRIMARY]

    ALTER TABLE [dbo].[ReportRecipients] ADD [FeeTranCode] [char](6) NULL

    ALTER TABLE [dbo].[ReportRecipients] ADD [FeeFired] [bit] NULL

    PRIMARY KEY CLUSTERED

    (

    [ScheduleID] ASC

    ) ON [PRIMARY]

    GO

    /* Insert sample data */

    INSERT INTO ReportRecipients

    SELECT 34,51,7301,7481,'CI',2,1,'Not Specified',51730134274811,6229,NULL,NULL,0 UNION ALL

    SELECT 34,51,7271,7451,'CI',2,1,'Not Specified',51727134274511,6231,NULL,NULL,0 UNION ALL

    SELECT 24,51,7302,7482,'CI',2,1,'Not Specified',51730224274821,6234,NULL,NULL,0 UNION ALL

    SELECT 24,51,7273,7453,'CI',2,1,'Not Specified',51727324274531,6235,NULL,NULL,0

    This will set up the example so there is a new record to insert into the ReportRecipients table when the first query (the first SQL above) is run

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • WITH cte_insert_report_recipients

    AS

    (

    select a.*,CONVERT(VARCHAR(10),a.[ATID])

    + CONVERT(VARCHAR(10),[AID])

    + CONVERT(VARCHAR(10),a.[ReportID])

    + CONVERT(VARCHAR(10),a.[Frequency])

    + CONVERT(VARCHAR(10),a.[RecipientId])

    + CONVERT(VARCHAR(10),a.[DeliveryMethod]) [ID] from ReportSchedules a

    )

    insert into ReportRecipients

    select

    b.ReportID,

    b.ATID,

    b.AID,

    b.RecipientId,

    b.Source,

    b.Frequency,

    b.DeliveryMethod,

    'NEW RECORD',

    CONVERT(VARCHAR(10),b.[ATID]) + CONVERT(VARCHAR(10),b.[AID])

    + CONVERT(VARCHAR(10),b.[ReportID])

    + CONVERT(VARCHAR(10),b.[Frequency])

    + CONVERT(VARCHAR(10),b.[RecipientId])

    + CONVERT(VARCHAR(10),b.[DeliveryMethod]),

    b.ScheduleId,b.FeeAmount,b.FeeTranCode,0

    from cte_insert_report_recipients b

    where b.ID not in (SELECT [id] from dbo.ReportRecipients)

    This should help.

  • WITH cte_insert_report_recipients

    AS

    (

    select a.*,CONVERT(VARCHAR(10),a.[ATID])

    + CONVERT(VARCHAR(10),[AID])

    + CONVERT(VARCHAR(10),a.[ReportID])

    + CONVERT(VARCHAR(10),a.[Frequency])

    + CONVERT(VARCHAR(10),a.[RecipientId])

    + CONVERT(VARCHAR(10),a.[DeliveryMethod]) [ID] from ReportSchedules a

    )

    insert into ReportRecipients

    select

    b.ReportID,

    b.ATID,

    b.AID,

    b.RecipientId,

    b.Source,

    b.Frequency,

    b.DeliveryMethod,

    'NEW RECORD',

    CONVERT(VARCHAR(10),b.[ATID]) + CONVERT(VARCHAR(10),b.[AID])

    + CONVERT(VARCHAR(10),b.[ReportID])

    + CONVERT(VARCHAR(10),b.[Frequency])

    + CONVERT(VARCHAR(10),b.[RecipientId])

    + CONVERT(VARCHAR(10),b.[DeliveryMethod]),

    b.ScheduleId,b.FeeAmount,b.FeeTranCode,0

    from cte_insert_report_recipients b

    left outer join dbo.ReportRecipients c

    on b.ID = c.ID

    where c.ID is null

    This should perform even better.

  • 2 minutes doesn't sound too bad considering it's performing so many scans on your ReportRecipients table!

    The only alternative I can think of (and I'm not even convinced it will speed things up) is to create an index on the ID column in your target table, create a temp table for your source which includes the composite key, index it then perform the join.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • 2 possible solutions:

    1. Computed Column in dbo.ReportSchedules with an index that you the use in your comparison. With the sample data it doesn't get used but with a larger set of data it might.

    2. Similar to above create an indexed view that has the expression as a column with an index. I didn't try this.

  • ksatpute123 (9/23/2013)


    WITH cte_insert_report_recipients

    AS

    (

    select a.*,CONVERT(VARCHAR(10),a.[ATID])

    + CONVERT(VARCHAR(10),[AID])

    + CONVERT(VARCHAR(10),a.[ReportID])

    + CONVERT(VARCHAR(10),a.[Frequency])

    + CONVERT(VARCHAR(10),a.[RecipientId])

    + CONVERT(VARCHAR(10),a.[DeliveryMethod]) [ID] from ReportSchedules a

    )

    insert into ReportRecipients

    select

    b.ReportID,

    b.ATID,

    b.AID,

    b.RecipientId,

    b.Source,

    b.Frequency,

    b.DeliveryMethod,

    'NEW RECORD',

    CONVERT(VARCHAR(10),b.[ATID]) + CONVERT(VARCHAR(10),b.[AID])

    + CONVERT(VARCHAR(10),b.[ReportID])

    + CONVERT(VARCHAR(10),b.[Frequency])

    + CONVERT(VARCHAR(10),b.[RecipientId])

    + CONVERT(VARCHAR(10),b.[DeliveryMethod]),

    b.ScheduleId,b.FeeAmount,b.FeeTranCode,0

    from cte_insert_report_recipients b

    left outer join dbo.ReportRecipients c

    on b.ID = c.ID

    where c.ID is null

    This should perform even better.

    I design my code much like this and has performed very well for me, especially in high volume, high traffic processing.

    When seeing a sub-select in a where clause it just makes the hair on the back of my neck stand up.... :w00t:

    Kurt

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • I tested the result for both the queries. The result was with not much difference taking the same time. This was due to the volume of data. At high volumes the joins perform better in most cases so, I will choose that. Of course in this case there is need for indexing as mentioned in previous comments.

  • How about one of these 2 options? At least they don't do the CONVERTs twice.

    INSERT INTO ReportRecipients

    SELECT rs.[ReportID]

    ,rs.[ATID]

    ,rs.[AID]

    ,rs.[RecipientId]

    ,rs.[Source]

    ,rs.[Frequency]

    ,rs.[DeliveryMethod]

    ,'NEW RECORD'

    ,newkey

    ,rs.[scheduleid]

    ,rs.FeeAmount

    ,rs.FeeTranCode

    ,0

    FROM [dbo].[ReportSchedules] rs

    CROSS APPLY (

    SELECT CONVERT(VARCHAR(10),rs.[ATID])

    + CONVERT(VARCHAR(10),rs.[AID])

    + CONVERT(VARCHAR(10),rs.[ReportID])

    + CONVERT(VARCHAR(10),rs.[Frequency])

    + CONVERT(VARCHAR(10),rs.[RecipientId])

    + CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)

    WHERE a.newkey NOT IN (SELECT [id] from dbo.ReportRecipients);

    INSERT INTO ReportRecipients

    SELECT rs.[ReportID]

    ,rs.[ATID]

    ,rs.[AID]

    ,rs.[RecipientId]

    ,rs.[Source]

    ,rs.[Frequency]

    ,rs.[DeliveryMethod]

    ,'NEW RECORD'

    ,newkey

    ,rs.[scheduleid]

    ,rs.FeeAmount

    ,rs.FeeTranCode

    ,0

    FROM [dbo].[ReportSchedules] rs

    CROSS APPLY (

    SELECT CONVERT(VARCHAR(10),rs.[ATID])

    + CONVERT(VARCHAR(10),rs.[AID])

    + CONVERT(VARCHAR(10),rs.[ReportID])

    + CONVERT(VARCHAR(10),rs.[Frequency])

    + CONVERT(VARCHAR(10),rs.[RecipientId])

    + CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)

    LEFT JOIN dbo.ReportRecipients c ON c.[id] = a.newkey

    WHERE c.[id] IS NULL;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thanks for all the GREAT responses everyone. I had actually tweaked an existing index to get the query down to a few seconds, and at one point attempted the CROSS APPLY but wasn't getting the results that I was expecting (but I am no TSQL guru) - I have been absolutely overwhelmed with some production issues since late last night but will get around to trying out some of these suggestions on the "large" tables to see which produces the best result.

    Much appreciate it, I'll post back when I can determine which method works best!

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • dwain.c (9/23/2013)


    How about one of these 2 options? At least they don't do the CONVERTs twice.

    INSERT INTO ReportRecipients

    SELECT rs.[ReportID]

    ,rs.[ATID]

    ,rs.[AID]

    ,rs.[RecipientId]

    ,rs.[Source]

    ,rs.[Frequency]

    ,rs.[DeliveryMethod]

    ,'NEW RECORD'

    ,newkey

    ,rs.[scheduleid]

    ,rs.FeeAmount

    ,rs.FeeTranCode

    ,0

    FROM [dbo].[ReportSchedules] rs

    CROSS APPLY (

    SELECT CONVERT(VARCHAR(10),rs.[ATID])

    + CONVERT(VARCHAR(10),rs.[AID])

    + CONVERT(VARCHAR(10),rs.[ReportID])

    + CONVERT(VARCHAR(10),rs.[Frequency])

    + CONVERT(VARCHAR(10),rs.[RecipientId])

    + CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)

    WHERE a.newkey NOT IN (SELECT [id] from dbo.ReportRecipients);

    INSERT INTO ReportRecipients

    SELECT rs.[ReportID]

    ,rs.[ATID]

    ,rs.[AID]

    ,rs.[RecipientId]

    ,rs.[Source]

    ,rs.[Frequency]

    ,rs.[DeliveryMethod]

    ,'NEW RECORD'

    ,newkey

    ,rs.[scheduleid]

    ,rs.FeeAmount

    ,rs.FeeTranCode

    ,0

    FROM [dbo].[ReportSchedules] rs

    CROSS APPLY (

    SELECT CONVERT(VARCHAR(10),rs.[ATID])

    + CONVERT(VARCHAR(10),rs.[AID])

    + CONVERT(VARCHAR(10),rs.[ReportID])

    + CONVERT(VARCHAR(10),rs.[Frequency])

    + CONVERT(VARCHAR(10),rs.[RecipientId])

    + CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)

    LEFT JOIN dbo.ReportRecipients c ON c.[id] = a.newkey

    WHERE c.[id] IS NULL;

    Nice. +1

  • I like door #2 better with the Left Join.... 😉

    Kurt

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply