SQL Query execution slowness

  • Hi ,

    Below is the query which has two tables, based on the business requirement , have written the query as below. but it is taking 30 mins to complete the query. There is no Insert or Update or Delete happen when the query is executed. This query is executed from SSIS package.Is there a better way to write the query ? or is the below query is fine.

    When in the final query when C.columnid is removed then i see it run in 3 mins, but where as it is needed based on business requirement.

    Below are the number if records in each table, TableA has  200000 and TableB has 200000000.

    Select Count(*) from [TableA] --> 200000
    Select Count(*) from [TableB]--> 200000000

    ;WITH CTETableA AS
    (
    SELECT DISTINCT
    [ColumnID]
    FROM TableA
    WHERE [ColumnID] IS NOT NULL
    AND [Date] >= <Day before yesterday date>
    AND [Date] < <Yesterday date>
    )
    ,CTETableB AS
    (
    SELECT DISTINCT
    [ColumnID] --- Int
    ,[Column_S] -- Varchar(20)
    FROM TableB
    WHERE [Date] >= <Day before yesterday date>
    AND [Date] < <Yesterday date>
    )

    SELECT
    b.[ColumnID] --Int
    ,s.[Column_var]-- Varchar(25)
    ,s.[Column_int]-- Int
    ,s.[Date] --- Datetime
    ,b.[Column_Var]-- Varchar(20)
    ,b.[ColumnTime] --- Time
    ,b.[Column_S] -- Varchar(20)
    ,b.[Column_float] -- Float
    FROM TableB b
    LEFT JOIN CTETableB c ON c.[ColumnID] = b.[ColumnID]
    LEFT JOIN TableA d ON d.[ColumnID] = b.[ColumnID] AND d.[Column_S] = b.[Column_S]
    LEFT JOIN CTETableA s ON s.[ColumnID] = b.[ColumnID]
    WHERE ((b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>) OR c.[ColumnID] IS NOT NULL OR d.[ColumnID] IS NOT NULL)
  • Giving sample sql like this may lead to errors - and for this you may need to supply us with a full explain plan (both slow and fast).

    we will also need to know which indexes these tables have

    but I may be missing something but it seems that, at least the way you posted, CTETableB is not required at all and should be removed.

    regardless - this seems like a typical case of "if master has changed or child has changed" extract data - that being the case doing a union of 2 queries would normally give better results - first query detects changes to master, second detects changes to child.

  • CTETableB is needed as we are having the not null condition [OR c.[ColumnID] IS NOT NULL OR d.[ColumnID] IS NOT NULL] in the query.

    TableA [ColumnID] is clustered index and Date with non-cluster index, non- unique.

    TableB [ColumnID] is clustered index and Date with non-cluster index , non- unique.

    Execution plan-  Don't have permission to execute.

    How could Union would help us here , in which way we could represent it to achieve?. Again thanks for your time and help.

  • according to the code you posted it does not seem you need that cte - and without your full code we don't know if you missed something on your post or if it accurately represents the full query joins and criteria.

    ,CTETableB AS
    (
    SELECT DISTINCT
    [ColumnID] --- Int
    ,[Column_S] -- Varchar(20)
    FROM TableB
    WHERE [Date] >= <Day before yesterday date>
    AND [Date] < <Yesterday date>
    )

    select *
    from TableB b
    LEFT JOIN CTETableB c ON c.[ColumnID] = b.[ColumnID]
    where (b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>) OR c.[ColumnID] IS NOT NULL


    in both CTETableB and the main select you are filtering on the same field and criteria so having the cte with a left outer really equates to

    select *
    from TableB b
    where (b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>)

    how ever having the cte as is may even be giving you duplicate records if there are multilple records on TableB with same ColumnID so I would check that

    Regarding the union

    -- first select records where TableB date field meets the criteria
    SELECT
    b.[ColumnID] --Int
    ,s.[Column_var]-- Varchar(25)
    ,s.[Column_int]-- Int
    ,s.[Date] --- Datetime
    ,b.[Column_Var]-- Varchar(20)
    ,b.[ColumnTime] --- Time
    ,b.[Column_S] -- Varchar(20)
    ,b.[Column_float] -- Float
    FROM TableB b
    LEFT JOIN TableA s ON s.[ColumnID] = b.[ColumnID] and (s.[Date] >= <Day before yesterday date> AND s.[Date] < <Yesterday date>) -- I wonder if this was correct in the first place
    WHERE (b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>)

    union -- remove duplicate entries - depending on the data this may need to be done slightly differently using rownumber

    -- second retrieve records from TableB where there is a match on TableA where date meets the criteria and Column_S has a match
    SELECT
    b.[ColumnID] --Int
    ,s.[Column_var]-- Varchar(25)
    ,s.[Column_int]-- Int
    ,s.[Date] --- Datetime
    ,b.[Column_Var]-- Varchar(20)
    ,b.[ColumnTime] --- Time
    ,b.[Column_S] -- Varchar(20)
    ,b.[Column_float] -- Float
    FROM TableB b
    inner JOIN TableA s ON s.[ColumnID] = b.[ColumnID] AND d.[Column_S] = b.[Column_S]

    and (s.[Date] >= <Day before yesterday date> AND s.[Date] < <Yesterday date>) -- TableA filtering


    An index on table A with Date, column_S and ColumnID required - potentially with include of column_var and columnn_int

    An index on table B with Date, Column_S, ColumnID required - potentially with remaining columns required to cover the select

    as we don't know your data neither your true table and query the above is just a possible approach - may be incorrect but only you will be able to determine that.
  • Once again thanks for your time.

    Yes you are correct on the observation on

    "how ever having the cte as is may even be giving you duplicate records if there are multiple records on TableB with same ColumnID so I would check that"

    you could see in my code that i have added c.[ColumnID] IS NOT NULL or d.[ColumnID] IS NOT NULL ,which will fetch the record for the whole day when left join is there

    Here the query is based on datewith timestamp as well, say if it is day before yesterday(22-06-2020 2:00:05:056) till yesterday(23-06-202 3:34:34:978) , so in the first cte we would be getting set of records say 100 , in second cte we are getting some 1000 records for the dates mentioned, while doing the left join we would be getting the additional rows

    cte1

    columnid, date

    1, '22-06-2020 2:00:05:056'

    2, '22-06-2020 5:00:08:232'

    1, '22-06-2020 1:12:05:344'

    cte2

    columnid, date

    1, '22-06-2020 7:13:12:234'

    2, '22-06-2020 4:11:21:213'

    3, '22-06-2020 3:12:08:784'

    so while doing this we would be ending with 3 records with left join with [ColumnID] IS NOT NULL and not with 2 records, columnid 1 will be coming 2 time and columnid 2 will be coming 1 time, so total 3 records.

    ;With CTETableB AS
    (
    SELECT DISTINCT
    [ColumnID] --- Int
    ,[Column_S] -- Varchar(20)
    FROM TableB
    WHERE [Date] >= <Day before yesterday date>
    AND [Date] < <Yesterday date>
    )

    select *
    from TableB b
    LEFT JOIN CTETableB c ON c.[ColumnID] = b.[ColumnID]
    where (b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>) OR c.[ColumnID] IS NOT NULL


    in both CTETableB and the main select you are filtering on the same field and criteria so having the cte with a left outer really equates to

    select *
    from TableB b
    where (b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>)

    how ever having the cte as is may even be giving you duplicate records if there are multilple records on TableB with same ColumnID so I would check that
  • Any suggestions on the above, still the issues persists. the time consumed here is more. Below is the reads and writes and time taken for the query to complete.

    Run time Reads Writes

    20min 3,596,149 97

    45min 73,412,456 378,279

    48min 198,270,103 1,558,015

    22min 202,130,607 1,303,290

    34min 143,565,802 1,123,014

    25min 15,137,118 2,233

  • The query is a mess.

    its difficult to understand the requirements from the code because it lacks of logic, not to mention some constructions in the query eliminate others.

    My guess would be you want actually full outer join of 2sets fot the specific date.

    you definitely don't need scan each table twice for that.

    Just include all required columns into CTE's and then join these CTE's and only them:

    Hate this framework.
    15 mins reworking the code from a mobile device - and it drops everything at once
    So, try following my suggestions yourself, post your code - we'll adjust it if something wrong

    _____________
    Code for TallyGenerator

  • 2 more things:

    - forget the word "DISTINCT". Use Group by when needed (I bet you don't need it in your query);

    - if you have an index rebuilding routine, use that window of opportunity to rebuild the current wrong indexing into correct one:

    ColumnID -PK, NON-clustered

    [Date] - clustered

    this willl bring the response time of your query down to several seconds.

    _____________
    Code for TallyGenerator

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply