How to simplify this query?

  • Friends,

    Here is a quite complicated SQL written for DB2.

    The aim of this query is to select a lot of infor about the products BUT only list at the end the rows which have been modified after a given date.

    My problem is that it runs for ever. The biggest tables (SB_PRODUCTREPORT, sb_product, product) have approximaty 30K rows which is not ver much. All other contains only severa hundreds or just a dozen of rows. The problem is that this version of DB2 doesnot support the temporary tables.

    Normally this query should run not more then a minute.

    Any idea?

     

    SELECT

    pd.pd_pdref,

    af.af_pct,

    pd.pd_author,

    pd.pd_barcode,

    bc.bc_name,

    pd.pd_can_cons,

    ca.ca_name,

    CASE WHEN (pd.pd_dsref = 4 AND pd.pd_stock_qty = 0) THEN 'Utánrendelheto' ELSE st.ds_name END as ds_name,

    pd.pd_edition_year,

    pd.pd_has_image,

    pd.pd_hun_title,

    pd.pd_lang,

    pd.pd_nov_exp,

    pd.pd_nov_msg,

    pd.pd_numpages,

    pd.pd_orig_title,

    ad.ad_name,

    py.py_name,

    pd.pd_serial_name,

    pd.pd_serial_num,

    pd.pd_spec_exp,

    pd.pd_spec_msg,

    pd.pd_stock_qty,

    pd.pd_vtref,

    pr.prheight,

    pr.prlngth,

    pr.prwght,

    pr.prwidth,

    pr.prldesc1,

    pr.prnbr,

    REP.PP_RETAILPRICE,

    REP.DI_PCT

    FROM DB2NC.SB_PRODUCTREPORT REP JOIN db2nc.sb_product pd ON REP.pd_pdref = pd.pd_pdref

    JOIN db2nc.sb_afakod af ON pd.pd_afref=af.af_afref

    LEFT JOIN db2nc.sb_binder_code bc ON pd.pd_bcref=bc.bc_bcref

    JOIN db2nc.sb_prod_cat ca ON pd.pd_caref=ca.ca_caref

    JOIN db2nc.sb_prod_stat st ON pd.pd_dsref=st.ds_dsref

    JOIN db2nc.sb_partner pn ON pd.pd_puref=pn.pn_pnref

    JOIN db2nc.sb_address ad ON pn.pn_sarfnbr_hq=ad.ad_adref

    JOIN db2nc.sb_prod_type py ON pd.pd_pyref=py.py_pyref

    JOIN db2nc.product pr ON pd.pd_pdref=pr.prrfnbr

    WHERE REP.BS_BSREF = 318515

    AND (EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES

    WHERE MT_TABLE = 'PD' AND MT_CODE = PD.PD_PDREF AND MT_DATE >= '2003-11-29 00:00:00.000')

    OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES

    WHERE MT_TABLE = 'AF' AND MT_CODE = AF_AFREF AND MT_DATE >= '2003-11-29 00:00:00.000')

    OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES

    WHERE MT_TABLE = 'BC' AND MT_CODE = BC_BCREF AND MT_DATE >= '2003-11-29 00:00:00.000')

    OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES

    WHERE MT_TABLE = 'CA' AND MT_CODE = CA_CAREF AND MT_DATE >= '2003-11-29 00:00:00.000')

    OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES

    WHERE MT_TABLE = 'ST' AND MT_CODE = DS_DSREF AND MT_DATE >= '2003-11-29 00:00:00.000')

    OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES

    WHERE MT_TABLE = 'PN' AND MT_CODE = PN_PNREF AND MT_DATE >= '2003-11-29 00:00:00.000')

    OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES

    WHERE MT_TABLE = 'AD' AND MT_CODE = AD_ADREF AND MT_DATE >= '2003-11-29 00:00:00.000')

    OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES

    WHERE MT_TABLE = 'PY' AND MT_CODE = PY_PYREF AND MT_DATE >= '2003-11-29 00:00:00.000')

    OR EXISTS(SELECT MT_CODE FROM DB2NC.SB_MODIFIED_TABLES

    WHERE MT_TABLE = 'PR' AND MT_CODE = PRRFNBR AND MT_DATE >= '2003-11-29 00:00:00.000')

    OR EXISTS(SELECT REP1.PD_PDREF FROM DB2NC.SB_PRODUCTREPORT REP1

    WHERE REP1.BS_BSREF = REP.BS_BSREF AND REP1.PD_PDREF = REP.PD_PDREF

    AND (REP1.PP_STARTDATE >= '2003-11-29' OR REP1.di_startdate >= ('2003-11-29')))



    Bye
    Gabor

  • This looks like an indexing problem.

    Can you execute this query with set showplan_text on and paste the results here?  This will allow us to look at the query plan for you.

    Also it looks like your long list of exists statements could be reduced to a single statement.  Something like

    Select distinct g.MT_TABLE, g.lastdate, m.MT_CODE

    from (

    SELECT MT_TABLE, max(MT_DATE) as 'lastdate'

    FROM DB2NC.SB_MODIFIED_TABLES

    GROUP BY MT_TABLE

    ) g join DB2NC.SB_MODIFIED_TABLES m on g.MT_TABLE = m.MT_TABLE and g.lastdate = m.MT_DATE

    where m.MT_DATE >= '2003-11-29 00:00:00.000'

    This would give you the tables and codes last updated after that date in a smaller statement, but whether it was quicker would depend on the plan

     

     

     




    Keith Henry



    According to everyone I know I "do something with computers?" for a living, so there you go.

  • Keith,

     

    Thanks for reply.

     

    Some infos:

    All indexes are fine and they are used in the query. With the exception of one table SB_PRODUCTREPORT where a table scan is used which is normal, because I'm going trough every rows. This is the base of the report and I'm joining all the other tables to it.

     

    The SB_MODIFIED_TABLES is a special table which has 3 columns, MT_CODE, MT_TABLE, MT_DATE.

    The purpose of this table is to store through the triggers on insert/update the modified rows of the parent tables here.

    As an example take the table db2nc.sb_prod_cat which contains the product category. When I change now a product category #10, I will insert into the SB_MODIFIED_TABLES the followin row:

    insert into SB_MODIFIED_TABLES(MT_CODE, MT_TABLE, MT_DATE) (values 'CA', 10, getdate())

     

    The aim of my complicated query is to select ONLY the modified rows and not the whole product table.

    That's why I think that you need all those exists subqueries. Because I have to select every rows where the product category, the price, the statistics... has been changed.

     

     



    Bye
    Gabor

  • I'm not certain I understand your problem correctly from your post but it looks to me like what you really want is all the records from the SB_PRODUCTREPORT and SB_PRODUCT tables and only those from the rest of the tables where the joins are correct and the data is in the modified table. If that is the case try using something like this...

     

    SELECT PD.pd_pdref,

        AF.af_pct,

        PD.pd_author,

        PD.pd_barcode,

        BC.bc_name,

        PD.pd_can_cons,

        CA.ca_name,

        CASE WHEN (PD.pd_dsref = 4 AND PD.pd_stock_qty = 0) THEN 'Utánrendelheto'

            ELSE st.ds_name

            END as ds_name,

        PD.pd_edition_year,

        PD.pd_has_image,

        PD.pd_hun_title,

        PD.pd_lang,

        PD.pd_nov_exp,

        PD.pd_nov_msg,

        PD.pd_numpages,

        PD.pd_orig_title,

        AD.ad_name,

        PY.py_name,

        PD.pd_serial_name,

        PD.pd_serial_num,

        PD.pd_spec_exp,

        PD.pd_spec_msg,

        PD.pd_stock_qty,

        PD.pd_vtref,

        PR.prheight,

        PR.prlngth,

        PR.prwght,

        PR.prwidth,

        PR.prldesc1,

        PR.prnbr,

        REP.PP_RETAILPRICE,

        REP.DI_PCT

    FROM SB_PRODUCTREPORT REP

        JOIN SB_MODIFIED_TABLES MT ON MT.MT_CODE = PD.PD_PDREF

        JOIN sb_product PD ON REP.pd_pdref = PD.pd_pdref

            LEFT JOIN sb_afakod AF ON PD.pd_afref = AF.af_afref AND MT.MT_TABLE = 'AF'

            LEFT JOIN sb_binder_code BC ON PD.pd_bcref = BC.bc_bcref AND MT.MT_TABLE = 'BC'

            LEFT JOIN sb_prod_cat CA ON PD.pd_caref = CA.ca_caref AND MT.MT_TABLE = 'CA'

            LEFT JOIN sb_prod_stat ST ON PD.pd_dsref = ST.ds_dsref AND MT.MT_TABLE = 'ST'

            LEFT JOIN sb_partner PN ON PD.pd_puref = PN.pn_pnref AND MT.MT_TABLE = 'PN'

            LEFT JOIN sb_address AD ON PN.pn_sarfnbr_hq = AD.ad_adref AND MT.MT_TABLE = 'AD'

            LEFT JOIN sb_prod_type PY ON PD.pd_pyref = PY.py_pyref AND MT.MT_TABLE = 'PY'

            LEFT JOIN product PR ON PD.pd_pdref = PR.prrfnbr AND MT.MT_TABLE = 'PR'

    WHERE REP.BS_BSREF = 318515

         AND MT_DATE >= '2003-11-29 00:00:00.000'

    Please note I did remove the "db2nc." and it should be added back if needed.




    Gary Johnson
    Microsoft Natural Language Group
    DBA, Sr. DB Engineer

    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

  • With reference to gljjr reply:

    Will it make much of a difference if "REP.BS_BSREF = 318515 AND MT_DATE >= '2003-11-29 00:00:00.000'" is moved into the relevant ON part.

    Lastly I have the habit of casting string dates to DateTime.

    AND MT_DATE >= Cast('2003-11-29 00:00:00.000' as DateTime)

     

  • Without having data to test with I can only shoot a bit here but this should work better simple because it replaces the exists which are you primary reason for bogdown.


    SELECT

    pd.pd_pdref,

    af.af_pct,

    pd.pd_author,

    pd.pd_barcode,

    bc.bc_name,

    pd.pd_can_cons,

    ca.ca_name,

    (CASE WHEN (pd.pd_dsref = 4 AND pd.pd_stock_qty = 0) THEN 'Utánrendelheto' ELSE st.ds_name END) as ds_name,

    pd.pd_edition_year,

    pd.pd_has_image,

    pd.pd_hun_title,

    pd.pd_lang,

    pd.pd_nov_exp,

    pd.pd_nov_msg,

    pd.pd_numpages,

    pd.pd_orig_title,

    ad.ad_name,

    py.py_name,

    pd.pd_serial_name,

    pd.pd_serial_num,

    pd.pd_spec_exp,

    pd.pd_spec_msg,

    pd.pd_stock_qty,

    pd.pd_vtref,

    pr.prheight,

    pr.prlngth,

    pr.prwght,

    pr.prwidth,

    pr.prldesc1,

    pr.prnbr,

    REP.PP_RETAILPRICE,

    REP.DI_PCT

    FROM

    DB2NC.SB_PRODUCTREPORT REP

    INNER JOIN

    db2nc.sb_product pd

    ON

    REP.pd_pdref = pd.pd_pdref

    INNER JOIN

    db2nc.sb_afakod af

    ON

    pd.pd_afref = af.af_afref

    LEFT JOIN

    db2nc.sb_binder_code bc

    ON

    pd.pd_bcref = bc.bc_bcref

    INNER JOIN

    db2nc.sb_prod_cat ca

    ON

    pd.pd_caref = ca.ca_caref

    INNER JOIN

    db2nc.sb_prod_stat st

    ON

    pd.pd_dsref = st.ds_dsref

    INNER JOIN

    db2nc.sb_partner pn

    ON

    pd.pd_puref = pn.pn_pnref

    INNER JOIN

    db2nc.sb_address ad

    ON

    pn.pn_sarfnbr_hq = ad.ad_adref

    INNER JOIN

    db2nc.sb_prod_type py

    ON

    pd.pd_pyref = py.py_pyref

    INNER JOIN

    db2nc.product pr

    ON

    pd.pd_pdref = pr.prrfnbr

    LEFT JOIN

    DB2NC.SB_MODIFIED_TABLES MT

    ON

    MT.MT_CODE = (CASE MT.MT_TABLE

    WHEN 'PD' THEN PD.PD_PDREF

    WHEN 'AF' THEN AF.AF_AFREF

    WHEN 'BC' THEN BC.BC_BCREF

    WHEN 'CA' THEN CA.CA_CAREF

    WHEN 'ST' THEN DS.DS_DSREF

    WHEN 'PN' THEN PN.PN_PNREF

    WHEN 'AD' THEN AD.AD_ADREF

    WHEN 'PY' THEN PY.PY_PYREF

    WHEN 'PR' THEN PR.PRRFNBR)

    END)

    AND MT.MT_DATE >= '20031129'

    LEFT JOIN

    DB2NC.SB_PRODUCTREPORT REP1

    ON

    REP1.BS_BSREF = REP.BS_BSREF AND

    REP1.PD_PDREF = REP.PD_PDREF AND

    (

    REP1.PP_STARTDATE >= '20031129' OR

    REP1.di_startdate >= '20031129'

    )

    WHERE

    REP.BS_BSREF = 318515 AND

    (

    MT.MT_CODE IS NOT NULL OR

    REP1.PD_PDREF IS NOT NULL

    )


    If I see anything else to help I will reply.

  • I guess you're stuck to the exists, because any join could give n-rows because of more than one MT_TABLE qualifies.

    How about this ? (check the execution plan)

    SELECT pd.pd_pdref,

    af.af_pct,

    pd.pd_author,

    pd.pd_barcode,

    bc.bc_name,

    pd.pd_can_cons,

    ca.ca_name,

    CASE WHEN (pd.pd_dsref = 4 AND pd.pd_stock_qty = 0)

     THEN 'Utánrendelheto'

     ELSE st.ds_name

    END as ds_name,

    pd.pd_edition_year,

    pd.pd_has_image,

    pd.pd_hun_title,

    pd.pd_lang,

    pd.pd_nov_exp,

    pd.pd_nov_msg,

    pd.pd_numpages,

    pd.pd_orig_title,

    ad.ad_name,

    py.py_name,

    pd.pd_serial_name,

    pd.pd_serial_num,

    pd.pd_spec_exp,

    pd.pd_spec_msg,

    pd.pd_stock_qty,

    pd.pd_vtref,

    pr.prheight,

    pr.prlngth,

    pr.prwght,

    pr.prwidth,

    pr.prldesc1,

    pr.prnbr,

    REP.PP_RETAILPRICE,

    REP.DI_PCT

    , RepCrit.PD_PDREF

    FROM DB2NC.SB_PRODUCTREPORT REP

    INNER JOIN db2nc.sb_product pd ON REP.pd_pdref = pd.pd_pdref

    INNER JOIN db2nc.sb_afakod af ON pd.pd_afref=af.af_afref

    LEFT JOIN db2nc.sb_binder_code bc ON pd.pd_bcref=bc.bc_bcref

    INNER JOIN db2nc.sb_prod_cat ca ON pd.pd_caref=ca.ca_caref

    INNER JOIN db2nc.sb_prod_stat st ON pd.pd_dsref=st.ds_dsref

    INNER JOIN db2nc.sb_partner pn ON pd.pd_puref=pn.pn_pnref

    INNER JOIN db2nc.sb_address ad ON pn.pn_sarfnbr_hq=ad.ad_adref

    INNER JOIN db2nc.sb_prod_type py ON pd.pd_pyref=py.py_pyref

    INNER JOIN db2nc.product pr ON pd.pd_pdref=pr.prrfnbr

    left join

    (SELECT REP1.PD_PDREF, REP1.BS_BSREF

      FROM DB2NC.SB_PRODUCTREPORT REP1

      AND (REP1.PP_STARTDATE >= '2003-11-29'    --index ?

        OR REP1.di_startdate >= '2003-11-29' )   --index ?

    group by REP1.PD_PDREF, REP1.BS_BSREF

    ) RepCrit

    on RepCrit.BS_BSREF = REP.BS_BSREF

    AND RepCrit.PD_PDREF = REP.PD_PDREF

    WHERE REP.BS_BSREF = 318515

    and ( RepCrit.PD_PDREF is not null

       or exists (select *

      FROM DB2NC.SB_MODIFIED_TABLES

      WHERE  MT_DATE >= '2003-11-29 00:00:00.000'   -- index ?

      --and MT_TABLE in ('PD','AF','BC','CA','ST','PN','AD','PY','PR')  -- index ??

      and  case MT_TABLE

        when 'PD' then  PD.PD_PDREF

        when 'AF' then  AF_AFREF 

        when 'BC' then  BC_BCREF

        when 'CA' then  CA_CAREF

        when 'ST' then  DS_DSREF

        when 'PN' then  PN_PNREF

        when 'AD' then  AD_ADREF

        when 'PY' then  PY_PYREF

        when 'PR' then  PRRFNBR

       end

                   )

         )

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Lastly I have the habit of casting string dates to DateTime.

    AND MT_DATE >= Cast('2003-11-29 00:00:00.000' as DateTime)


    This is implit anyway, so it will make no difference to the plan, but you should always do an explicit cast, usually using convert rather than cast because then you can specify the date format, and this means that you are not dependant on the windows date settings of the server to recognise the string.




    Keith Henry



    According to everyone I know I "do something with computers?" for a living, so there you go.

  • Thanks guys for all the replies.

    I've liked very the solution of Antares and alzdba but those solutions is still doing a table scan on SB_MODIFIED_TABLES.

    As info in the SB_MODIFIED_TABLES there are over 40K rows and in the SB_PRODUCT table over 30K rows.

    So you can imagine how fast can be 40K times a full table scan of 40K rows!

    Gary's solution is using all the indexes but and retrieves the main data for the modified rows but all the columns which contain the data coming from the left joined AND NOT MODIFIED tables are NULL. Which of cours is not correct.

     



    Bye
    Gabor

  • As an additional info a propiety (price, partner info...) could have been modified more then once since the given date.

    We only have to retrieve each product info once, regardles how often andhow many of its propiety it has been changed.



    Bye
    Gabor

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply