Fine-tune SQL for better performance

  • I have been asked to fine-tune the following SQL that was written by a former employee. The DBA says that using the select tables in inner join subquery is causing heavy resource usage as it selects all the rows from the tables. He wants to know whether I could use the fields at the beginning of the select statement and remove those inner join subqueries. I am not sure whether this is possible and would like to ask the experts here for their comments. Any help is greatly appreciated.

    I am including some extract where the DBA pointed out the issues.

    Select emp.vid,pt,dn,dcode,dcat,dtime,dlocation from

    (select id, pdate, duration

    from demo where mneu = 'COST' and pdate >= '01/01/2024' and pdate<='01/31/2024'

    ) T1

    Inner Join

    (select vid,elemid, resp, qval from sal where resp <>0

    ) T2

    on T2.elemid = T1.id

    Inner Join

    (Select apptid, name from tappt

    ) T3

    on T3.apptid = T1.id

    ) ACTY

    Inner Join

    (

    Select vid,pid,acc,stat from RM where loc not like 'ABC%'

    )REM

    on REM.vid = ACTY.vid

    Inner Join

    ( Select pit, name from HMA

    ) HM

    On HM.pid = REM.pid

    Inner Join

    (

    Select pid, prm from HMR where desc = 'RET'

    ) HRM

    On HRM.pid = REM.pid

    Inner Join

    (

    Select mdid,emid,qryval from MGE where desc = 'POP'

    )LC

    on LC.mdid = ACTY.dlocation

  • No idea about performance, and don't know where some of those columns are coming from, but it looks like it could be simplified to something like this, which is at least much simpler to read.

     

    Select emp.vid,pt,dn,dcode,dcat,dtime,dlocation 
    from demo T1
    inner join sal T2 on T2.elemid = T1.id
    inner join tappt T3 on T3.apptid = T1.id
    inner join RM REM on REM.vid = emp.vid
    inner join HMA HM on HM.pid = REM.pid
    inner join HMR HRM on HRM.pid = REM.pid
    inner join MGE LC on LC.mdid = ACTY.dlocation
    where T1.mneu = 'COST' and T1.pdate >= '01/01/2024' and T1.pdate<='01/31/2024'
    and T2.resp <> 0
    and REM.loc not like 'ABC%'
    and HRM.desc = 'RET'
    and LC.desc = 'POP'
  • Since all of the joins to the derived tables (selects in the from/join) - there is no benefit to even using the sub-queries.  Removing the sub-queries and refactoring to simple join statements - the code becomes much easier to read and manage.

    As for the statement from your DBA saying these sub-queries (derived tables) are reading all of the data from the tables - that isn't caused by having the derived tables.  That will be due to statistics on each table and the expected number of rows that SQL Server estimates will be returned.  If that estimate is larger than about 20% of the table - SQL Server will perform a clustered index scan.

    If the columns in the where clause are not indexed - then SQL Server must read every row in that table to determine if that row meets the requirements.

    I would look to the execution plan (actual if possible) to see what indexes are being used - and to see if the estimated number of rows matches (or is very close) to actual number of rows.  If not, then you probably have an issue with out of date statistics - if that is not the case then maybe you can identify one or more indexes that can be added to improve the performance.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I see "emp." being referenced, but I don't see "emp" anywhere in the supplied query.

    You would have to check each of the joined tables to see if there is an index that matches the query against the table.  If not, you could create a covering index if/as needed that would prevent a full scan of the table.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • "Select emp.vid,pt,dn,dcode,dcat,dtime,dlocation "

    IF you are selecting all the columns from just your base table, and non from the other join tables,  you could try removing the joins and instead employ  the WHERE Exists operation. But as others have said Indexes and statistics (via the query plan) are the place to start.

     

    ----------------------------------------------------

  • Thanks a lot for all the replies. Much appreciated. I am going to remove the sub queries and see how it works. Will also ask the DBA for the execution plan for the original and the modified query and check the indexes too. Again really appreciate all the replies.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply