Why does this take so long!

  • Hi Everybody

    This is my first posting in this forum, so please bear with me if I should get some thing wrong! I will eventually get there!

    I have this query which does make use of some derived tables. My understanding is that this would result in a very efficient query but apparently things work a trifle differently. Could someone please advise why this query takes inordinately long time when the base data set in table "evd_NonNBV_ForChecking_BAU" contains any more that 20,000 records. This one contains a little over 46000 records.

    This is the query (partial) :-

    Selectcount(*) FRMP_RecordCount, b.RECORD_ID FRMP_Record_Id, b.METER_ID FRMP_Meter_Id, Convert(Varchar(10), a.FromDate, 103) FRMP_FromDate

    ,Convert(Varchar(10), a.ToDate, 103) FRMP_ToDate, Convert(Varchar(10), b.Billing_Start_Date, 103) FRMP_Billing_Start_Date, Convert(Varchar(10)

    ,b.Billing_End_Date, 103) FRMP_Billing_End_Date, NetAmount FRMP_NetAmount, a.Party FRMP_Party, b.PROVIDER FRMP_Provider

    From(Selectd.RECORD_ID, d.METER_ID, d.BILLING_START_DATE, d.BILLING_END_DATE, d.NET_AMOUNT As NetAmount, d.PROVIDER

    From(SelectMETER_ID, BILLING_START_DATE, BILLING_END_DATE, PROVIDER

    FromNETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU

    Group ByMETER_ID, BILLING_START_DATE, BILLING_END_DATE, Provider

    Having Sum(net_AMOUNT) <> 0) c

    Inner Join

    NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU d

    On

    c.METER_ID = d.METER_ID And c.BILLING_START_DATE = d.BILLING_START_DATE And c.BILLING_END_DATE = d.BILLING_END_DATE) b

    --From#MyTemp_1 b

    Left Join

    NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a

    On

    a.NMI = b.METER_ID and (((b.BILLING_START_DATE between a.FromDate and a.ToDate) and (b.billing_end_date between a.FromDate and a.ToDate))

    and a.Role = 'FRMP')

    --Wherea.NMI Is Null

    Group Byb.RECORD_ID, b.METER_ID, Convert(Varchar(10), a.FromDate, 103), Convert(Varchar(10), a.ToDate, 103), Convert(Varchar(10), b.Billing_Start_Date, 103)

    ,Convert(Varchar(10), a.ToDate, 103), Convert(Varchar(10), b.Billing_End_Date, 103), NetAmount, a.Party, b.PROVIDER

    The question is :-

    When this query is run with the derived table (as defined by table definition 'b' - shown in italics), the query takes a very long time to run - its been 16 minutes and its still going - but when this derived table is temporarily stored as a temp table or even a physical table, the query takes about 1 second. This is when the 'where clause' has been commented out. With the where clause included it takes even longer.

    Any suggestions why is this happening. One more thing though (if this helps to know) is that when this derived table 'b' is inner joined with table 'a', it run very quickly, no problems there. This problem only shows up with left join. The query does complete but take about an hour when it should normally not take any more than a few seconds.

    I have done this kind of left joins with large tables (with about 70 million records) and it hasn't taken this long ever. So what is happening with this one!

    Any help with this one would be greatly appreciated.

    Best regards

    Deepak

  • Not enough information. Could be missing indexes, could be incorrect stats, could be a few other things

    Please post table definitions, index definitions and execution plan, see http://www.sqlservercentral.com/articles/SQLServerCentral/66909/ for details

    Cleaned up query:

    SELECT COUNT(*) FRMP_RecordCount ,

    b.RECORD_ID FRMP_Record_Id ,

    b.METER_ID FRMP_Meter_Id ,

    CONVERT(VARCHAR(10), a.FromDate, 103) FRMP_FromDate ,

    CONVERT(VARCHAR(10), a.ToDate, 103) FRMP_ToDate ,

    CONVERT(VARCHAR(10), b.Billing_Start_Date, 103) FRMP_Billing_Start_Date ,

    CONVERT(VARCHAR(10), b.Billing_End_Date, 103) FRMP_Billing_End_Date ,

    NetAmount FRMP_NetAmount ,

    a.Party FRMP_Party ,

    b.PROVIDER FRMP_Provider

    FROM ( SELECT d.RECORD_ID ,

    d.METER_ID ,

    d.BILLING_START_DATE ,

    d.BILLING_END_DATE ,

    d.NET_AMOUNT AS NetAmount ,

    d.PROVIDER

    FROM ( SELECT METER_ID ,

    BILLING_START_DATE ,

    BILLING_END_DATE ,

    PROVIDER

    FROM NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU

    GROUP BY METER_ID ,

    BILLING_START_DATE ,

    BILLING_END_DATE ,

    Provider

    HAVING SUM(net_AMOUNT) <> 0

    ) c

    INNER JOIN NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU d ON c.METER_ID = d.METER_ID

    AND c.BILLING_START_DATE = d.BILLING_START_DATE

    AND c.BILLING_END_DATE = d.BILLING_END_DATE

    ) b --From #MyTemp_1 b

    LEFT JOIN NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a ON a.NMI = b.METER_ID

    AND ( ( ( b.BILLING_START_DATE BETWEEN a.FromDate AND a.ToDate )

    AND ( b.billing_end_date BETWEEN a.FromDate AND a.ToDate )

    )

    AND a.Role = 'FRMP'

    )

    --Where a.NMI Is Null

    GROUP BY b.RECORD_ID ,

    b.METER_ID ,

    CONVERT(VARCHAR(10), a.FromDate, 103) ,

    CONVERT(VARCHAR(10), a.ToDate, 103) ,

    CONVERT(VARCHAR(10), b.Billing_Start_Date, 103) ,

    CONVERT(VARCHAR(10), a.ToDate, 103) ,

    CONVERT(VARCHAR(10), b.Billing_End_Date, 103) ,

    NetAmount ,

    a.Party ,

    b.PROVIDER

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail

    Thanks for your response.

    I don't know how would the table definition help but here they are :-

    evd_NonNBV_ForChecking_BAU

    ([METER_ID] [varchar](10) NOT NULL,

    [BILLING_START_DATE] [smalldatetime] NOT NULL,

    [BILLING_END_DATE] [smalldatetime] NOT NULL,

    [RECORD_ID] [int] IDENTITY(1,1) NOT NULL,

    [NET_AMOUNT] [decimal](15, 2) NOT NULL,

    [PROVIDER] [varchar](10) NOT NULL

    )

    MSATS_DATA_BAU_FINAL

    ([SequenceNumber] [numeric](18, 0) NULL,

    [CreationDate] [datetime] NULL,

    [MaintenanceDate] [datetime] NULL,

    [RowStatus] [varchar](5) NULL,

    [FromDate] [datetime] NULL,

    [ToDate] [datetime] NULL,

    [NMI] [char](10) NULL,

    [Party] [varchar](30) NULL,

    [Role] [varchar](20) NULL,

    [CATS_FILE_ID] [int] NULL

    )

    There are no indexes on any of the tables

    The execution plans for both options (one with temp table and the other with derived table) are attached herewith for your perusal. Also, attached is the spreadsheet showing the statistics and result output for the option with temp table. Will send the same when the query finishes execution with the derived tables.

    Hope this would you in determining what is causing this inordinate delay in the query execution with the derived tables.

    P.S The attachment has now (at 1.00 pm AEST) been updated to include the results and statistics for the option with derived tables as well.

    This option took 75 minutes against less than a minute with temp table!!!!!!!!!!!!!!!!!!!!!

    Best regards

    Deepak

  • deepakagarwalathome (10/1/2012)


    ...There are no indexes on any of the tables...

    This might have something to do with it. The first sentence of an excellent indexing article[/url] reads "Indexes are fundamental to database design".

    Your query can be made more efficient by changing the structure slightly:

    SELECT

    COUNT(*) FRMP_RecordCount ,

    b.RECORD_ID FRMP_Record_Id ,

    b.METER_ID FRMP_Meter_Id ,

    CAST(a.FromDate AS DATE) FRMP_FromDate ,

    CAST(a.ToDate AS DATE) FRMP_ToDate ,

    CAST(b.Billing_Start_Date AS DATE) FRMP_Billing_Start_Date ,

    CAST(b.Billing_End_Date AS DATE) FRMP_Billing_End_Date ,

    NetAmount FRMP_NetAmount ,

    a.Party FRMP_Party ,

    b.PROVIDER FRMP_Provider

    FROM (

    SELECT

    RECORD_ID ,

    METER_ID ,

    BILLING_START_DATE ,

    BILLING_END_DATE ,

    NET_AMOUNT AS NetAmount ,

    PROVIDER,

    SUMnet_AMOUNT = SUM(net_AMOUNT) OVER(PARTITION BY METER_ID, BILLING_START_DATE, BILLING_END_DATE, Provider)

    FROM NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU

    ) b

    LEFT JOIN NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a

    ON a.NMI = b.METER_ID

    AND b.BILLING_START_DATE BETWEEN a.FromDate AND a.ToDate

    AND b.billing_end_date BETWEEN a.FromDate AND a.ToDate

    AND a.Role = 'FRMP'

    WHERE b.SUMnet_AMOUNT <> 0

    GROUP BY

    b.RECORD_ID ,

    b.METER_ID ,

    CAST(a.FromDate AS DATE) FRMP_FromDate ,

    CAST(a.ToDate AS DATE) FRMP_ToDate ,

    CAST(b.Billing_Start_Date AS DATE) FRMP_Billing_Start_Date ,

    CAST(b.Billing_End_Date AS DATE) FRMP_Billing_End_Date ,

    b.NetAmount ,

    a.Party ,

    b.PROVIDER

    But it's no substitute for correct indexing. The tables are only half the story.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi ChrisM@Work

    Thanks a zillion for your response. That works fine. No problems.

    A couple of questions though :-

    First, whilst both 'Over Partiion By...' and the derived table sub-query do the same thing, is the former more efficient in its execution by an order of magnitude or it happened to be so in this one instance?

    Second, I am attaching the full query that I now have (I haven't replaced the similar code in the first half of the query as the inner join is very quick - it was only the left join that was the problem - which has now been addressed - thanks to you) and I find that it work fine in about 9 seconds if the last 'Where...' clause after the table 'z' definition is not included. However, if you include this last 'Where...' clause' it takes a hell of a lot longer (10+ minutes). I can certainly populate a table with the dataset without the last 'Where...' clause and then extract those records where the condition of the last 'Where...' clause' is implemented. But I wanted to know is this happening. Any suggestions.

    Further, out of interest, did the execution plans and the statistics that I had included give you any more indication as to why one option took 75+ minutes whilst the other took less than 10 seconds????????????

    P.S. I have updated this post with the output and statistics for the two options - took 105 minutes with the last where clause included.

    Best regards

    Deepak

  • deepakagarwalathome (10/2/2012)


    Hi ChrisM@Work

    Thanks a zillion for your response. That works fine. No problems.

    A couple of questions though :-

    First, whilst both 'Over Partiion By...' and the derived table sub-query do the same thing, is the former more efficient in its execution by an order of magnitude or it happened to be so in this one instance?

    It’s more efficient because it only requires one read of the table. Aggregating then joining back requires two.

    Second, I am attaching the full query that I now have (I haven't replaced the similar code in the first half of the query as the inner join is very quick - it was only the left join that was the problem - which has now been addressed - thanks to you) and I find that it work fine in about 9 seconds if the last 'Where...' clause after the table 'z' definition is not included. However, if you include this last 'Where...' clause' it takes a hell of a lot longer (10+ minutes). I can certainly populate a table with the dataset without the last 'Where...' clause and then extract those records where the condition of the last 'Where...' clause' is implemented. But I wanted to know is this happening. Any suggestions.

    Lack of indexes and index stats – the optimizer can only make a poor guess with little to work on.

    I’ve made a couple of amendments to your full query but there’s a way to go yet. There’s a lot of almost-duplicated reads of tables.

    Further, out of interest, did the execution plans and the statistics that I had included give you any more indication as to why one option took 75+ minutes whilst the other took less than 10 seconds????????????

    Not really. The execution plans are estimated – the actual plans are far more useful. If you can post them as attachments, that would be grand.

    P.S. I have updated this post with the output and statistics for the two options - took 105 minutes with the last where clause included.

    Best regards

    Deepak

    Here’s what I’ve got so far – test and see if it matches your existing output. Get the indexing done if you can.

    /*

    recommended indexes

    NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU

    METER_ID, RECORD_ID, BILLING_START_DATE, billing_end_date, provider

    NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL

    NMI, FromDate, ToDate, Role, Party

    */

    -- SumNetAmount1 and SumNetAmount2 generate the same result set

    ; WITH SumNetAmount1 AS (

    Selectd.RECORD_ID, d.METER_ID, d.BILLING_START_DATE, d.BILLING_END_DATE, d.NET_AMOUNT As NetAmount, d.PROVIDER

    From (

    Select METER_ID, BILLING_START_DATE, BILLING_END_DATE, PROVIDER

    From NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU

    Group By METER_ID, BILLING_START_DATE, BILLING_END_DATE, Provider

    Having Sum(net_AMOUNT) <> 0

    ) c

    Inner Join NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU d

    on c.METER_ID = d.METER_ID

    And c.BILLING_START_DATE = d.BILLING_START_DATE

    And c.BILLING_END_DATE = d.BILLING_END_DATE

    ),

    SumNetAmount2 AS (

    Select RECORD_ID, METER_ID, BILLING_START_DATE, BILLING_END_DATE, NET_AMOUNT AS NetAmount, PROVIDER,

    SUMNET_AMOUNT = SUM(net_AMOUNT) OVER(PARTITION BY METER_ID, BILLING_START_DATE, BILLING_END_DATE, Provider)

    From NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU

    )

    selectz.FRMP_RecordCount, z.FRMP_Record_Id, z.FRMP_Meter_Id, z.FRMP_FromDate, z.FRMP_ToDate, z.FRMP_Billing_Start_Date, z.FRMP_Billing_End_Date

    ,z.FRMP_NetAmount, z.FRMP_Party, z.FRMP_Provider

    ,z.MDP_RecordCount, z.MDP_Record_Id, z.MDP_Meter_Id, z.MDP_FromDate, z.MDP_ToDate, z.MDP_Billing_Start_Date, z.MDP_Billing_End_Date

    ,z.MDP_NetAmount, z.MDP_Party, z.MDP_Provider

    ,z.StatusComment

    From(SelectFRMP_RecordCount, FRMP_Record_Id, FRMP_Meter_Id, FRMP_FromDate, FRMP_ToDate, FRMP_Billing_Start_Date, FRMP_Billing_End_Date

    ,FRMP_NetAmount, FRMP_Party, FRMP_Provider

    ,MDP_RecordCount, MDP_Record_Id, MDP_Meter_Id, MDP_FromDate, MDP_ToDate, MDP_Billing_Start_Date, MDP_Billing_End_Date

    ,MDP_NetAmount, MDP_Party, MDP_Provider

    ,CaseWhen z1.FRMP_Party Is Not Null ThenCaseWhen z2.MDP_Party Is Not Null Then'VALID - TRU FRMP, CORRECT MDP' Else 'INVALID - TRU FRMP, INCORRECT MDP' End

    Else 'INVALID - TRU NOT FRMP'

    End 'StatusComment'

    From ( -- z1

    SelectCount(*) FRMP_RecordCount, b.RECORD_ID FRMP_Record_Id, b.METER_ID FRMP_Meter_Id, Convert(Varchar(10), a.FromDate, 103) FRMP_FromDate

    ,Convert(Varchar(10), a.ToDate, 103) FRMP_ToDate, Convert(Varchar(10), b.Billing_Start_Date, 103) FRMP_Billing_Start_Date,

    Convert(Varchar(10), b.Billing_End_Date, 103) FRMP_Billing_End_Date

    ,NetAmount FRMP_NetAmount, a.Party FRMP_Party, b.PROVIDER FRMP_Provider

    From SumNetAmount1 b -- CTE

    Inner Join NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a

    On a.NMI = b.METER_ID

    and (((b.BILLING_START_DATE between a.FromDate and a.ToDate)

    and (b.billing_end_date between a.FromDate and a.ToDate))

    and a.Role = 'FRMP')

    Group By

    b.RECORD_ID,

    b.METER_ID,

    Convert(Varchar(10), a.FromDate, 103),

    Convert(Varchar(10), a.ToDate, 103),

    Convert(Varchar(10), b.Billing_Start_Date, 103),

    Convert(Varchar(10), b.Billing_End_Date, 103),

    NetAmount, a.Party, b.PROVIDER

    ) z1

    Full Outer Join

    ( -- z2

    Selectcount(*) MDP_RecordCount, b.RECORD_ID MDP_Record_Id, b.METER_ID MDP_Meter_Id, Convert(Varchar(10), a.FromDate, 103) MDP_FromDate

    ,Convert(Varchar(10), a.ToDate, 103) MDP_ToDate, Convert(Varchar(10), b.Billing_Start_Date, 103) MDP_Billing_Start_Date, Convert(Varchar(10), b.Billing_End_Date, 103) MDP_Billing_End_Date

    ,NetAmount MDP_NetAmount, a.Party MDP_Party, b.PROVIDER MDP_Provider

    From SumNetAmount1 b -- CTE

    Inner Join NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a

    On a.NMI = b.METER_ID

    and (((b.BILLING_START_DATE between a.FromDate and a.ToDate)

    and (b.billing_end_date between a.FromDate and a.ToDate))

    and a.Role = 'MDP')

    Where1 =CaseWhen Upper(b.PROVIDER) = 'SPA' and Left(a.party, 3) In ('GLO', 'SPA') Then 1

    When Upper(b.PROVIDER) <> 'SPA' and Upper(b.PROVIDER) = Left(a.party, 3) Then 1

    End

    Group By

    b.RECORD_ID,

    b.METER_ID,

    Convert(Varchar(10), a.FromDate, 103),

    Convert(Varchar(10), a.ToDate, 103),

    Convert(Varchar(10), b.Billing_Start_Date, 103),

    Convert(Varchar(10), b.Billing_End_Date, 103),

    NetAmount, a.Party, b.PROVIDER

    ) z2

    On z1.FRMP_Record_Id = z2.MDP_Record_Id

    Union All

    SelectFRMP_RecordCount, FRMP_Record_Id, FRMP_Meter_Id, FRMP_FromDate, FRMP_ToDate, FRMP_Billing_Start_Date, FRMP_Billing_End_Date

    ,FRMP_NetAmount, FRMP_Party, FRMP_Provider

    ,MDP_RecordCount, MDP_Record_Id, MDP_Meter_Id, MDP_FromDate, MDP_ToDate, MDP_Billing_Start_Date, MDP_Billing_End_Date

    ,MDP_NetAmount, MDP_Party, MDP_Provider

    ,CaseWhenz1.FRMP_Party Is Not Null ThenCaseWhenz2.MDP_Party Is Not Null Then'VALID - TRU FRMP, CORRECT MDP'

    Else'INVALID - TRU FRMP, INCORRECT MDP'

    End

    Else'INVALID - TRU NOT FRMP'

    End 'StatusComment'

    From( -- z1

    Selectcount(*) FRMP_RecordCount, b.RECORD_ID FRMP_Record_Id, b.METER_ID FRMP_Meter_Id, Convert(Varchar(10), a.FromDate, 103) FRMP_FromDate

    ,Convert(Varchar(10), a.ToDate, 103) FRMP_ToDate, Convert(Varchar(10), b.Billing_Start_Date, 103) FRMP_Billing_Start_Date, Convert(Varchar(10), b.Billing_End_Date, 103) FRMP_Billing_End_Date

    ,NetAmount FRMP_NetAmount, a.Party FRMP_Party, b.PROVIDER FRMP_Provider

    From SumNetAmount2 b --- ## CTE

    LEFT JOIN NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a

    On a.NMI = b.METER_ID

    and b.BILLING_START_DATE BETWEEN a.FromDate AND a.ToDate

    and b.billing_end_date BETWEEN a.FromDate AND a.ToDate

    and a.Role = 'FRMP'

    Where b.SUMnet_AMOUNT <> 0

    and a.NMI Is Null

    Group By

    b.RECORD_ID,

    b.METER_ID,

    Convert(Varchar(10), a.FromDate, 103),

    Convert(Varchar(10), a.ToDate, 103),

    Convert(Varchar(10), b.Billing_Start_Date, 103),

    Convert(Varchar(10), a.ToDate, 103),

    Convert(Varchar(10), b.Billing_End_Date, 103),

    NetAmount, a.Party, b.PROVIDER

    ) z1

    Full Outer Join ( -- z2

    Selectcount(*) MDP_RecordCount, b.RECORD_ID MDP_Record_Id, b.METER_ID MDP_Meter_Id, Convert(Varchar(10), a.FromDate, 103) MDP_FromDate, Convert(Varchar(10), a.ToDate, 103) MDP_ToDate

    ,Convert(Varchar(10), b.Billing_Start_Date, 103) MDP_Billing_Start_Date, Convert(Varchar(10), b.Billing_End_Date, 103) MDP_Billing_End_Date

    ,NetAmount MDP_NetAmount, a.Party MDP_Party, b.PROVIDER MDP_Provider

    From SumNetAmount2 b --- ## CTE

    LEFT JOIN NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a

    On a.NMI = b.METER_ID

    and b.BILLING_START_DATE BETWEEN a.FromDate AND a.ToDate

    and b.billing_end_date BETWEEN a.FromDate AND a.ToDate

    and a.Role = 'MDP'

    Where b.SUMnet_AMOUNT <> 0

    and a.NMI Is Null

    and 1 = Case

    When Upper(b.PROVIDER) = 'SPA' and Left(a.party, 3) In ('GLO', 'SPA') Then 1

    When Upper(b.PROVIDER) <> 'SPA' and Upper(b.PROVIDER) = Left(a.party, 3) Then 1

    End

    Group By

    b.RECORD_ID,

    b.METER_ID,

    Convert(Varchar(10), a.FromDate, 103),

    Convert(Varchar(10), a.ToDate, 103),

    Convert(Varchar(10), b.Billing_Start_Date, 103),

    Convert(Varchar(10), a.ToDate, 103),

    Convert(Varchar(10), b.Billing_End_Date, 103),

    NetAmount, a.Party, b.PROVIDER

    ) z2

    On z1.FRMP_Record_Id = z2.MDP_Record_Id

    ) z

    Wherez.FRMP_Record_Id Is Not Null

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Chris

    Thanks for that!I fully understand what you are trying to do with the CTE's. I have run the query with the CTE's.

    The problem, however, remains the same, that is, it still executes the option without the last 'Where...' clause in around 10 seconds but with the last 'Where...' clause included, it takes aound the same 75 minutes.

    I have attached the 'Actual Execution' plans for the two options for your perusal - I don't understand how to read them. These are for the queries as was sent to you yesterday (without the CTE definition - but as I said earlier that didn't make a hell of a lot of difference in the execution times at all).

    Looking forward to hearing from you about what you find therein!

    Best regards

    Deepak

  • deepakagarwalathome (10/3/2012)


    Hi Chris

    Thanks for that!I fully understand what you are trying to do with the CTE's. I have run the query with the CTE's.

    The problem, however, remains the same, that is, it still executes the option without the last 'Where...' clause in around 10 seconds but with the last 'Where...' clause included, it takes aound the same 75 minutes.

    I have attached the 'Actual Execution' plans for the two options for your perusal - I don't understand how to read them. These are for the queries as was sent to you yesterday (without the CTE definition - but as I said earlier that didn't make a hell of a lot of difference in the execution times at all).

    Looking forward to hearing from you about what you find therein!

    Best regards

    Deepak

    It takes 75 minutes because the optimiser is choosing a poor plan based on the information it has. It would be interesting though to see the execution plans with and without the WHERE clause. I'm afraid the attachment is of no value to me. What you should do is save an actual plan as a .sqlplan file and attach it to the post. Do this for the actual plans with and without the WHERE clause.

    I've had another play with your query. There are four queries within it which read data, something like

    q1 full outer join q2

    union all

    q3 full outer join q4

    I've reconstructed the common elements of q1 and q2 as CTE1, and q3 and q4 as CTE2. The resulting query looks like this:

  • Reconstructed query with CTEs
  • ; WITH

    CTE1 AS ( -- used twice by part above UNION ALL

    SELECT

    b.RECORD_ID RECORD_ID,

    b.METER_ID Meter_Id,

    Convert(Varchar(10), a.FromDate, 103) FromDate,

    Convert(Varchar(10), a.ToDate, 103) ToDate,

    Convert(Varchar(10), b.Billing_Start_Date, 103) Billing_Start_Date,

    Convert(Varchar(10), b.Billing_End_Date, 103) Billing_End_Date,

    b.NetAmount NetAmount,

    SUMNET_AMOUNT = SUM(b.net_AMOUNT) OVER(PARTITION BY METER_ID, BILLING_START_DATE, BILLING_END_DATE, Provider),

    a.Party Party,

    b.PROVIDER Provider,

    a.[Role] [Role]

    FROM NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU b

    INNER JOIN NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a

    ON a.NMI = b.METER_ID

    AND b.BILLING_START_DATE between a.FromDate and a.ToDate

    AND b.billing_end_date between a.FromDate and a.ToDate

    ),

    CTE2 AS ( -- used twice by part below UNION ALL

    SELECT

    b.RECORD_ID RECORD_ID,

    b.METER_ID Meter_Id,

    Convert(Varchar(10), a.FromDate, 103) FromDate,

    Convert(Varchar(10), a.ToDate, 103) ToDate,

    Convert(Varchar(10), b.Billing_Start_Date, 103) Billing_Start_Date,

    Convert(Varchar(10), b.Billing_End_Date, 103) Billing_End_Date,

    b.NetAmount NetAmount,

    SUMNET_AMOUNT = SUM(b.net_AMOUNT) OVER(PARTITION BY METER_ID, BILLING_START_DATE, BILLING_END_DATE, Provider),

    a.Party Party,

    b.PROVIDER Provider,

    a.[Role] [Role]

    FROM NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU b

    LEFT JOIN NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a

    On a.NMI = b.METER_ID

    AND b.BILLING_START_DATE BETWEEN a.FromDate AND a.ToDate

    AND b.billing_end_date BETWEEN a.FromDate AND a.ToDate

    WHERE a.NMI IS NULL

    )

    selectz.FRMP_RecordCount, z.FRMP_Record_Id, z.FRMP_Meter_Id, z.FRMP_FromDate, z.FRMP_ToDate, z.FRMP_Billing_Start_Date, z.FRMP_Billing_End_Date

    ,z.FRMP_NetAmount, z.FRMP_Party, z.FRMP_Provider

    ,z.MDP_RecordCount, z.MDP_Record_Id, z.MDP_Meter_Id, z.MDP_FromDate, z.MDP_ToDate, z.MDP_Billing_Start_Date, z.MDP_Billing_End_Date

    ,z.MDP_NetAmount, z.MDP_Party, z.MDP_Provider

    ,z.StatusComment

    From(

    Select

    FRMP_RecordCount, FRMP_Record_Id, FRMP_Meter_Id, FRMP_FromDate, FRMP_ToDate, FRMP_Billing_Start_Date, FRMP_Billing_End_Date,

    FRMP_NetAmount, FRMP_Party, FRMP_Provider,

    MDP_RecordCount, MDP_Record_Id, MDP_Meter_Id, MDP_FromDate, MDP_ToDate, MDP_Billing_Start_Date, MDP_Billing_End_Date,

    MDP_NetAmount, MDP_Party, MDP_Provider,

    CASE

    WHEN a1.FRMP_Party IS NOT NULL AND a2.MDP_Party IS NOT NULL THEN 'VALID - TRU FRMP, CORRECT MDP'

    WHEN a1.FRMP_Party IS NOT NULL AND a2.MDP_Party IS NULL THEN 'INVALID - TRU FRMP, INCORRECT MDP'

    ELSE 'INVALID - TRU NOT FRMP'

    END [StatusComment]

    FROM ( -- q1

    SELECT

    Count(*) FRMP_RecordCount,

    RECORD_ID FRMP_Record_Id,

    Meter_Id FRMP_Meter_Id,

    FromDate FRMP_FromDate,

    ToDate FRMP_ToDate,

    Billing_Start_Date FRMP_Billing_Start_Date,

    Billing_End_Date FRMP_Billing_End_Date,

    NetAmount FRMP_NetAmount,

    Party FRMP_Party,

    Provider FRMP_Provider

    FROM CTE1

    WHERE [Role] = 'FRMP'

    AND SUMNET_AMOUNT <> 0

    GROUP BY RECORD_ID, Meter_Id, FromDate, ToDate,

    Billing_Start_Date, Billing_End_Date, NetAmount, Party, Provider

    ) q1

    FULL OUTER JOIN ( -- q2

    SELECT

    Count(*) MDP_RecordCount,

    RECORD_ID MDP_Record_Id,

    Meter_Id MDP_Meter_Id,

    FromDate MDP_FromDate,

    ToDate MDP_ToDate,

    Billing_Start_Date MDP_Billing_Start_Date,

    Billing_End_Date MDP_Billing_End_Date,

    NetAmount MDP_NetAmount,

    Party MDP_Party,

    Provider MDP_Provider

    FROM CTE1

    WHERE [Role] = 'MDP'

    AND SUMNET_AMOUNT <> 0

    AND ( (Upper(PROVIDER) = Left(party, 3)) OR (Upper(PROVIDER) = 'SPA' and Left(party, 3) = 'GLO') )

    GROUP BY Record_Id, Meter_Id, FromDate, ToDate,

    Billing_Start_Date, Billing_End_Date, NetAmount, Party, Provider

    ) q2 ON q1.FRMP_Record_Id = q2.MDP_Record_Id

    UNION ALL

    SelectFRMP_RecordCount, FRMP_Record_Id, FRMP_Meter_Id, FRMP_FromDate, FRMP_ToDate, FRMP_Billing_Start_Date, FRMP_Billing_End_Date

    ,FRMP_NetAmount, FRMP_Party, FRMP_Provider

    ,MDP_RecordCount, MDP_Record_Id, MDP_Meter_Id, MDP_FromDate, MDP_ToDate, MDP_Billing_Start_Date, MDP_Billing_End_Date

    ,MDP_NetAmount, MDP_Party, MDP_Provider,

    CASE

    WHEN z1.FRMP_Party Is Not Null AND z2.MDP_Party Is Not Null Then 'VALID - TRU FRMP, CORRECT MDP'

    WHEN z1.FRMP_Party Is Not Null AND z2.MDP_Party Is Null Then 'INVALID - TRU FRMP, INCORRECT MDP'

    ELSE 'INVALID - TRU NOT FRMP'

    END [StatusComment]

    From ( -- q3

    SELECT

    count(*) FRMP_RecordCount,

    RECORD_ID FRMP_Record_Id,

    METER_ID FRMP_Meter_Id,

    Convert(Varchar(10), FromDate, 103) FRMP_FromDate,

    Convert(Varchar(10), ToDate, 103) FRMP_ToDate,

    Convert(Varchar(10), Billing_Start_Date, 103) FRMP_Billing_Start_Date,

    Convert(Varchar(10), Billing_End_Date, 103) FRMP_Billing_End_Date,

    NetAmount FRMP_NetAmount, Party FRMP_Party, PROVIDER FRMP_Provider

    FROM CTE2

    WHERE [Role] = 'FRMP'

    AND SUMNET_AMOUNT <> 0

    GROUP BY Record_Id, Meter_Id, FromDate, ToDate,

    Billing_Start_Date, Billing_End_Date, NetAmount, Party, Provider

    ) q3

    FULL OUTER JOIN ( -- q4

    SELECT

    count(*) MDP_RecordCount,

    RECORD_ID MDP_Record_Id,

    METER_ID MDP_Meter_Id,

    Convert(Varchar(10), FromDate, 103) MDP_FromDate,

    Convert(Varchar(10), ToDate, 103) MDP_ToDate,

    Convert(Varchar(10), Billing_Start_Date, 103) MDP_Billing_Start_Date,

    Convert(Varchar(10), Billing_End_Date, 103) MDP_Billing_End_Date,

    NetAmount MDP_NetAmount, Party MDP_Party, PROVIDER MDP_Provider

    FROM CTE2

    WHERE [Role] = 'MDP'

    AND SUMnet_AMOUNT <> 0

    AND ( (Upper(PROVIDER) = Left(party, 3)) OR (Upper(PROVIDER) = 'SPA' and Left(party, 3) = 'GLO') )

    GROUP BY Record_Id, Meter_Id, FromDate, ToDate,

    Billing_Start_Date, Billing_End_Date, NetAmount, Party, Provider

    ) q4 ON q3.FRMP_Record_Id = q4.MDP_Record_Id

    ) z

    Wherez.FRMP_Record_Id Is Not Null

    You should test that this works and amend it if it doesn't generate the expected results.

    Having reconstructed the query to use those two CTE's, you can now see the wood through the trees. The query actually looks like this;

    -- get all matching rows between table a and table b

    SELECT columns

    FROM NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU b

    INNER JOIN NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a ON a.matchingcolumn = b.matchingcolumn

    UNION ALL

    -- get all rows from table b where there's no match in table a

    SELECT columns

    FROM NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU b

    LEFT JOIN NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a ON a.matchingcolumn = b.matchingcolumn

    WHERE a.matchingcolumn IS NULL

    Now this is logically equivalent to a left join; get all rows from the left hand table whether or not they match on the right (get all rows from the lhs where they match, plus all rows from the lhs where they don't). This simplifies your overall query to the following:

  • Replace UNION with LEFT JOIN
  • ; WITH

    CTE1 AS ( -- used twice

    SELECT

    b.RECORD_ID RECORD_ID,

    b.METER_ID Meter_Id,

    CAST(a.FromDate AS DATE) FromDate,

    CAST(a.ToDate AS DATE) ToDate,

    CAST(b.Billing_Start_Date AS DATE) Billing_Start_Date,

    CAST(b.Billing_End_Date AS DATE) Billing_End_Date,

    b.NetAmount NetAmount,

    SUMNET_AMOUNT = SUM(b.net_AMOUNT) OVER(PARTITION BY METER_ID, BILLING_START_DATE, BILLING_END_DATE, Provider),

    a.Party Party,

    b.PROVIDER Provider,

    a.[Role] [Role]

    FROM NETWORK_VALIDATION..evd_NonNBV_ForChecking_BAU b

    LEFT JOIN NETWORK_VALIDATION..MSATS_DATA_BAU_FINAL a

    On a.NMI = b.METER_ID

    AND b.BILLING_START_DATE BETWEEN a.FromDate AND a.ToDate

    AND b.billing_end_date BETWEEN a.FromDate AND a.ToDate

    )

    SELECT

    FRMP_RecordCount, FRMP_Record_Id, FRMP_Meter_Id, FRMP_FromDate, FRMP_ToDate, FRMP_Billing_Start_Date, FRMP_Billing_End_Date,

    FRMP_NetAmount, FRMP_Party, FRMP_Provider,

    MDP_RecordCount, MDP_Record_Id, MDP_Meter_Id, MDP_FromDate, MDP_ToDate, MDP_Billing_Start_Date, MDP_Billing_End_Date,

    MDP_NetAmount, MDP_Party, MDP_Provider,

    CASE

    WHEN a1.FRMP_Party IS NOT NULL AND a2.MDP_Party IS NOT NULL THEN 'VALID - TRU FRMP, CORRECT MDP'

    WHEN a1.FRMP_Party IS NOT NULL AND a2.MDP_Party IS NULL THEN 'INVALID - TRU FRMP, INCORRECT MDP'

    ELSE 'INVALID - TRU NOT FRMP'

    END [StatusComment]

    FROM ( -- q1

    SELECT

    Count(*) FRMP_RecordCount,

    RECORD_ID FRMP_Record_Id,

    Meter_Id FRMP_Meter_Id,

    FromDate FRMP_FromDate,

    ToDate FRMP_ToDate,

    Billing_Start_Date FRMP_Billing_Start_Date,

    Billing_End_Date FRMP_Billing_End_Date,

    NetAmount FRMP_NetAmount,

    Party FRMP_Party,

    Provider FRMP_Provider

    FROM CTE1

    WHERE [Role] = 'FRMP'

    AND SUMNET_AMOUNT <> 0

    AND RECORD_ID IS NOT NULL

    GROUP BY RECORD_ID, Meter_Id, FromDate, ToDate,

    Billing_Start_Date, Billing_End_Date, NetAmount, Party, Provider

    ) q1

    LEFT JOIN ( -- q2

    SELECT

    Count(*) MDP_RecordCount,

    RECORD_ID MDP_Record_Id,

    Meter_Id MDP_Meter_Id,

    FromDate MDP_FromDate,

    ToDate MDP_ToDate,

    Billing_Start_Date MDP_Billing_Start_Date,

    Billing_End_Date MDP_Billing_End_Date,

    NetAmount MDP_NetAmount,

    Party MDP_Party,

    Provider MDP_Provider

    FROM CTE1

    WHERE [Role] = 'MDP'

    AND SUMNET_AMOUNT <> 0

    AND ( (Upper(PROVIDER) = Left(party, 3)) OR (Upper(PROVIDER) = 'SPA' and Left(party, 3) = 'GLO') )

    GROUP BY Record_Id, Meter_Id, FromDate, ToDate,

    Billing_Start_Date, Billing_End_Date, NetAmount, Party, Provider

    ) q2 ON q1.FRMP_Record_Id = q2.MDP_Record_Id

    Remember where you had a full outer join between the two derived tables – followed by that pesky WHERE clause (Where z.FRMP_Record_Id Is Not Null)? The WHERE clause turns the full outer join into a left join. I think that’s all it does, but just in case I’ve included it in q1.

    Removing a full outer join from a query against tables with no indexes is likely to significantly improve performance – give it a shot and let us know how you get on. When you’re done (preferably before), you should concentrate on those indexes. Ask if you are unsure how to go about deciding which columns to use.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Chris

    Sorry for this inordinate delay. I couldn't upload the attachments on Friday and I didn't have access to this data at home, so didn't make the post then.

    Here it is now though.

    In the attachment, I have included the query that I ran, the result sets, the statistics data and the actual execution plans for both the options as requested by you, the first without the last where clause and the second with it. This time around, it didn't take anywhere near as long as it did earlier although I did replace the derived table 'b' with your '...partition by... clause for both the queries of the overall union query. May be that was what was bringing it down with the where clause.

    As I have said earlier, I dont't know how to read the execution plan as yet (I am going to be spending time soon to be able to understand at least the very basics of it so that I can write slightly more efficient queries this point forward), but I am sure it would tell you a lot and you would be able to intrepret the same and advise as to where it was falling down.

    Shall look forward to hearing from you soon.

    Once again, sorry for not providing you with the relevant details earlier.

    Best regards

    Deepak

    I will test with your revision later tonight and advise.

    Deepak

  • Viewing 9 posts - 1 through 8 (of 8 total)

    You must be logged in to reply to this topic. Login to reply