Any tips on speeding this SQL up would be most helpful

  • Basic DB Schema (I am using MSSQL 2016 Web Edition)

    ub_master

    ------------------------

    ub_master_id (int) (Primary Key)

    a_site_id (int)

    importversion (int)

    ub_batch_sub_number (int)

    date_discharged (date)

    ub_charge

    -------------------------

    ub_charge_id (int) (Primary Key)

    ub_master_id (int)

    a_site_id (int)

    charge_code (varchar(10))

    charge_amount (decimal)

    charge_qty (int)

    charge_date (date)

    charge_description (varchar(100))

    The basic idea is to query some statistics but I want the latest price and description found in the database. I have tried a dozen different ways and this seems to be the fastest I have found so far, but it still takes 5 seconds for 1 years worth of data. I know it is a lot of records but I still wonder if some tips could help it go faster.

    I also want to paginate it, but the only way I can see to effectively do that is to have everything run and then limit it.

    Here is my query so far:

    -- Here I am getting the charges that do not cancel out each other. In the DB I could have a 1 qty and a -1 qty, so I want to eliminate those.

    WITH ubc as (

    SELECT c.ub_charge_id from ub_master m

    JOIN ub_charge c ON c.ub_master_id = m.ub_master_id

    AND (

    SELECT SUM(csub.charge_qty)

    FROM ub_charge csub

    WHERE csub.charge_code = c.Charge_code AND

    csub.ub_master_id = m.ub_master_id

    ) > 0

    WHERE

    m.a_site_id = 2

    AND m.importversion = 1

    AND m.ub_batch_sub_number = 0

    AND m.date_discharged BETWEEN '12/1/2015' AND '11/30/2016'

    ),

    -- Here I am using row_number() to limit to the latest Charge Amount and Description

    cte as (

    SELECT * FROM (

    SELECT

    c.ub_master_id,

    c.charge_code,

    -- I am dividing by the charge_qty so that I get the original price per unit

    case when c.charge_qty <> 0 then (c.charge_amount / c.charge_qty) else c.charge_amount END as chargeAmount,

    c.charge_description,

    c.CPTCode,

    c.CPTCode_Description,

    row_number() over(partition by c.charge_code order by c.charge_date desc) as rownum

    from ub_charge c

    WHERE

    -- Making sure I am getting the latest positive price and not credits

    c.charge_amount >= 0

    AND c.ub_charge_id IN (select ub_charge_id from ubc)

    ) t

    where t.rownum = 1

    )

    --Now fetching the columns and querying the stats that I wanted.

    select

    cte.charge_code,

    cte.charge_description as LatestDescription,

    cte.chargeamount as LatestAmount,

    (select sum(charge_qty) from ub_charge c

    WHERE c.charge_code = cte.charge_code

    AND c.ub_charge_id IN (select ub_charge_id from ubc)

    ) SumQty,

    (select sum(charge_amount) from ub_charge c

    WHERE c.charge_code = cte.charge_code

    AND c.ub_charge_id IN (select ub_charge_id from ubc)

    ) TotalRevenue

    ,

    (

    select count( t.charge_code ) from (

    select distinct ub_master_id, charge_code from ub_charge

    WHERE ub_charge_id IN (select ub_charge_id from ubc)

    AND charge_code = cte.charge_code

    ) t

    ) CountByClaim

    from cte

    -- Trying to paginate

    ORDER BY cte.charge_code

    OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY

    As far as I can see the execution plan is hitting indexes and nothing shows me how to speed it up that I can find.

  • Can you post the table definitions, index definitions and execution plan please?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You're reading the base tables quite a few times. You should be able to extract your result set on a single read of the two tables, from what I can see. Some conditional SUMs on this will be close:

    SELECT

    d.ub_master_id,

    d.charge_code,

    -- I am dividing by the charge_qty so that I get the original price per unit

    d.charge_amount / ISNULL(NULLIF(d.charge_qty,0),1) as chargeAmount,

    d.charge_description,

    d.CPTCode,

    d.CPTCode_Description,

    row_number() over(partition by d.charge_code order by d.charge_date desc) as rownum

    FROM (

    SELECT

    c.ub_master_id,

    c.charge_code,

    c.charge_date,

    c.ub_charge_id,

    c.charge_description,

    c.CPTCode,

    c.CPTCode_Description,

    c.charge_amount,

    c.charge_qty,

    cancel = SUM(c.charge_qty) OVER(PARTITION BY c.Charge_code, c.ub_master_id)

    FROM ub_master m

    JOIN ub_charge c

    ON c.ub_master_id = m.ub_master_id

    WHERE m.a_site_id = 2

    AND m.importversion = 1

    AND m.ub_batch_sub_number = 0

    AND m.date_discharged BETWEEN '12/1/2015' AND '11/30/2016'

    ) d

    WHERE d.cancel > 0

    AND d.charge_amount >= 0

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I created some test tables and data and I found my SQL wasn't giving me what I wanted.

    UB_MASTER_TEST TABLE

    CREATE TABLE dbo.ub_master_test (

    ub_master_id int NOT NULL,

    ub_batch_sub_number int NULL,

    a_site_id int NULL,

    importVersion int DEFAULT 0 NULL,

    date_discharged datetime NULL,

    CONSTRAINT PK_ub_master_test PRIMARY KEY CLUSTERED (ub_master_id)

    WITH (

    PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, STATISTICS_NORECOMPUTE = OFF,

    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

    )

    ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX ub_master_test_idx ON dbo.ub_master_test

    (a_site_id)

    INCLUDE (ub_batch_sub_number, importVersion)

    WITH (

    PAD_INDEX = OFF,

    DROP_EXISTING = OFF,

    STATISTICS_NORECOMPUTE = OFF,

    SORT_IN_TEMPDB = OFF,

    ONLINE = OFF,

    ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON)

    ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX ub_master_test_idx ON dbo.ub_master_test

    (a_site_id)

    INCLUDE (ub_batch_sub_number, importVersion)

    WITH (

    PAD_INDEX = OFF,

    DROP_EXISTING = OFF,

    STATISTICS_NORECOMPUTE = OFF,

    SORT_IN_TEMPDB = OFF,

    ONLINE = OFF,

    ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON)

    ON [PRIMARY]

    GO

    UB_CHARGE_TEST TABLE

    CREATE TABLE dbo.ub_charge_test (

    ub_charge_id int NOT NULL,

    a_site_id int NULL,

    ub_master_id int NULL,

    charge_code char(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    charge_amount decimal(20, 2) NULL,

    charge_description varchar(255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    charge_date datetime NULL,

    charge_qty int NULL,

    CPTCode varchar(8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    CPTCode_Description varchar(180) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    CONSTRAINT PK_ub_charges_test PRIMARY KEY CLUSTERED (ub_charge_id)

    WITH (

    PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, STATISTICS_NORECOMPUTE = OFF,

    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON),

    CONSTRAINT ub_charge_test_fk FOREIGN KEY (ub_master_id)

    REFERENCES dbo.ub_master_test (ub_master_id)

    ON UPDATE NO ACTION

    ON DELETE NO ACTION

    )

    ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX idx_chargecode_include_ubMasterID ON dbo.ub_charge_test

    (charge_code)

    INCLUDE (ub_master_id)

    WITH (

    PAD_INDEX = OFF,

    DROP_EXISTING = OFF,

    STATISTICS_NORECOMPUTE = OFF,

    SORT_IN_TEMPDB = OFF,

    ONLINE = OFF,

    ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON)

    ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX ub_charge_test_idx ON dbo.ub_charge_test

    (ub_master_id)

    WITH (

    PAD_INDEX = OFF,

    DROP_EXISTING = OFF,

    STATISTICS_NORECOMPUTE = OFF,

    SORT_IN_TEMPDB = OFF,

    ONLINE = OFF,

    ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON)

    ON [PRIMARY]

    GO

    Now my SQL to populate the DB with test data

    DELETE FROM ub_charge_test

    DELETE FROM ub_master_test

    INSERT INTO ub_master_test

    (ub_master_id, a_site_id, importversion, ub_batch_sub_number, date_discharged)

    VALUES

    (1, 5, 1, 0, '12/20/2016'),

    (2, 3, 1, 0, '12/21/2016'),

    (3, 3, 1, 1, '01/02/2017'),

    (4, 3, 2, 0, '01/03/2017'),

    (5, 3, 1, 0, '01/01/2017'),

    (6, 3, 1, 0, '01/03/2017');

    INSERT INTO ub_charge_test

    (ub_charge_id, ub_master_id, a_site_id, charge_code, charge_amount, charge_qty, charge_date, charge_description, cptcode, cptcode_description)

    VALUES

    (1, 1, 5, '1000', 50, 2, '01/3/2017', 'PRODUCT 1000 V1', 'code1', 'cpt description 1'),

    (2, 1, 5, '2000', 100, 1, '12/17/2016', 'PRODUCT 2000 V1', 'code2', 'cpt description 2'),

    (3, 2, 3, '3000', 500, 1, '12/20/2016', 'PRODUCT 3000 V1', 'code3', 'cpt description 3'),

    (4, 2, 3, '2000', 100, 1, '12/19/2016', 'PRODUCT 2000 V2', 'code4', 'cpt description 4'),

    (5, 3, 3, '1000', 30, 1, '1/2/2017', 'PRODUCT 1000 V2', 'code5', 'cpt description 5'),

    (6, 3, 3, '2000', 210, 2, '1/2/2017', 'PRODUCT 2000 V3', 'code6', 'cpt description 6'),

    (7, 4, 3, '3000', 1000, 2, '12/21/2016', 'PRODUCT 3000 V2', 'code7', 'cpt description 7'),

    (8, 4, 3, '4000', 10, 2, '12/20/2016', 'PRODUCT 4000 V1', 'code8', 'cpt description 8'),

    (9, 5, 3, '1000', 100, 4, '11/19/2016', 'PRODUCT 1000 V3', 'code9', 'cpt description 9'),

    (10, 5, 3, '2000', 220, 2, '11/20/2016', 'PRODUCT 2000 V4', 'code10', 'cpt description 10'),

    (11, 1, 5, '5000', 90, 2, '12/17/2016', 'PRODUCT 5000 V1', 'code11', 'cpt description 11'),

    (12, 5, 3, '2000', 220, 1, '11/20/2016', 'PRODUCT 2000 V5', 'code12', 'cpt description 12'),

    (13, 6, 3, '1000', 100, 1, '01/01/2017', 'PRODUCT 1000 V4', 'code13', 'cpt description 13'),

    (14, 6, 3, '1000', 100, 1, '01/01/2017', 'PRODUCT 1000 V5', 'code14', 'cpt description 14');

    Now if I have the following limiting criteria:

    a_site_id = 3

    importVersion = 1

    ub_batch_sub_number = 0

    date_discharged = BETWEEN '1/1/2017' AND '1/5/2017'

    Then that should eliminate ub_master_id of (1,2,3,4) leaving only ub_master_id:5

    Which would only pull the Charge_Codes of (1000 & 2000)

    So this is the result I want returned:

    charge_codeLatestDescriptionLatestAmountSumQtyTotalRevenueCountByClaimCPTCodeCPTCode_Description

    1000PRODUCT 1000 V2$305$2002code5cpt description 5

    2000PRODUCT 2000 V3$1053$4401code6cpt description 6

    You will notice that "PRODUCT 1000 V2" is returned because it is the latest "charge_date" description, amount, cptcode, cptcode_description in the a_site_id = 3

    • Charge_code: Only list those that come up with the filtering criteria
    • LatestDescription: Look for the latest charge_description within a_site_id = 3 using charge_date.
    • LatestAmount: Look for the latest (charge_amount / charge_qty) within a_site_id = 3 using charge_date to get the price per one.
    • SumQty: Sum the charge_qty of only those that were found with the filtering criteria
    • TotalRevenue: Sum the charge_amount of only those that were found with the filtering criteria
    • CountByClaim: Count of the charge_code, but only count once per ub_master_id. (So in my test data 1000 is found 3 times, but I only want 2 as the count because it is only in two claims)
    • cptcode: Look for the latest cptcode within a_site_id = 3 using charge_date.
    • cptcode_description: Look for the latest cptcode_description within a_site_id = 3 using charge_date.

    Here is my latest working SQL against the test data but I would like to make it more efficient and optimized. Please help!

    WITH ubc as (

    SELECT c.ub_charge_id, c.charge_code from ub_master_test m

    JOIN ub_charge_test c ON c.ub_master_id = m.ub_master_id

    AND (

    SELECT SUM(csub.charge_qty)

    FROM ub_charge_test csub

    WHERE csub.charge_code = c.Charge_code AND

    csub.ub_master_id = m.ub_master_id

    ) > 0

    WHERE

    m.a_site_id = 3

    AND m.importversion = 1

    AND m.ub_batch_sub_number = 0

    AND m.date_discharged BETWEEN '1/1/2017' AND '1/4/2017'

    ),

    cte as (

    SELECT * FROM (

    SELECT

    c.ub_master_id,

    c.charge_code,

    case when c.charge_qty <> 0 then (c.charge_amount / c.charge_qty) else c.charge_amount END as chargeAmount,

    c.charge_description,

    c.CPTCode,

    c.CPTCode_Description,

    row_number() over(partition by c.charge_code order by c.charge_date desc) as rownum

    from ub_charge_test c

    WHERE

    c.charge_amount >= 0

    AND c.a_site_id = 3

    AND charge_code IN (Select charge_code from ubc)

    ) t

    where t.rownum = 1

    )

    select

    cte.charge_code,

    cte.charge_description as LatestDescription,

    cte.chargeamount as LatestAmount,

    (select sum(charge_qty) from ub_charge_test c

    WHERE c.charge_code = cte.charge_code

    AND c.ub_charge_id IN (select ub_charge_id from ubc)

    ) SumQty,

    (select sum(charge_amount) from ub_charge_test c

    WHERE c.charge_code = cte.charge_code

    AND c.ub_charge_id IN (select ub_charge_id from ubc)

    ) TotalRevenue

    ,

    (

    select count( t.charge_code ) from (

    select distinct ub_master_id, charge_code from ub_charge_test

    WHERE ub_charge_id IN (select ub_charge_id from ubc)

    AND charge_code = cte.charge_code

    ) t

    ) CountByClaim,

    cte.CPTCode,

    cte.CPTCode_Description

    from cte

    ORDER BY cte.charge_code

  • Both of these tables contain a foreign key to a table that you did not provide a definition for, so they will not run as written. You really need to test your code before posting it.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Sorry about that, I modified the SQL just removing the foreign key, you don't need it for the purpose of this query.

  • I've looked at this, and I have lots of questions. Specifically, it looks like a_site_id should be part of the primary key, but it is not used in any of the joins.

    I did come up with the following, which the execution plan says will be cheaper, but the run times on this small amount of data is inconclusive.

    ;

    WITH charges AS (

    SELECT

    c.a_site_id,

    c.charge_code,

    SUM(c.charge_qty) AS SumQty,

    SUM(c.charge_amount) AS TotalRevenue,

    COUNT(DISTINCT m.ub_master_id) AS CountByClaim

    from #ub_master_test m

    JOIN #ub_charge_test c ON c.ub_master_id = m.ub_master_id

    WHERE

    m.a_site_id = 3

    AND m.importversion = 1

    AND m.ub_batch_sub_number = 0

    AND m.date_discharged BETWEEN '1/1/2017' AND '1/4/2017'

    GROUP BY c.a_site_id, c.charge_code

    )

    SELECT c.charge_code,

    l.charge_description AS LatestDescription,

    l.charge_amount AS LatestAmount,

    c.SumQty,

    c.TotalRevenue,

    c.CountByClaim,

    l.CPTCode,

    l.CPTCode_Description

    FROM charges c

    CROSS APPLY (

    SELECT TOP 1

    sub.charge_qty,

    case when sub.charge_qty <> 0 then (sub.charge_amount / sub.charge_qty) else sub.charge_amount END as charge_amount,

    sub.charge_description,

    sub.CPTCode,

    sub.CPTCode_Description

    FROM #ub_charge_test sub

    WHERE sub.a_site_id = c.a_site_id

    AND sub.charge_code = c.charge_code

    ORDER BY sub.charge_date DESC

    ) l

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks for the effort. Your code is taking longer than mine though on a year's data. I don't know how best to export and post the execution plans on this forum.

    I am also open to ideas regarding a "View" or "UDF". I haven't done anything with that yet and was hoping on tips from those that are experienced. I don't want to do something that will slow down my inserts much because I insert a lot of data constantly.

  • How does this one fare? It should generate a different plan from the one I have posted. Those scans are going to be expensive, considering you have a year's data. Covering indexes may help, but I don't recommend unless you are very particular about speeding up the query.

    Can you please post execution plan once you run it? You could save it as .sqlplan and add it to your G drive and share may be?

    DECLARE @site_id INT = 3,

    @importversion INT = 1,

    @batchsubnumber INT = 0,

    @datefrom DATETIME = '1jan2017',

    @dateto DATETIME = '1may2017'

    ;WITH cteChargeDesc

    AS

    (

    SELECT charge_code, charge_description, charge_amount, charge_qty, cptcode, cptcode_description,

    ROW_NUMBER() OVER(PARTITION BY charge_code ORDER BY charge_date DESC) rn

    FROM dbo.ub_charge_test

    WHERE a_site_id = @site_id

    )

    ,cteMaster AS

    (

    SELECT ub_master_id

    FROM dbo.ub_master_test

    WHERE a_site_id = @site_id AND importVersion = @importversion AND ub_batch_sub_number = @batchsubnumber AND date_discharged BETWEEN @datefrom AND @dateto

    )

    SELECT ct.charge_code, cd.charge_description, cd.latest_amount, SUM(ct.charge_qty) [sum_qty], SUM(ct.charge_amount) [total_revenue],

    COUNT(DISTINCT ct.ub_master_id) [count_by_claim], cd.CPTCode, cd.CPTCode_Description

    FROM dbo.ub_charge_test ct

    INNER JOIN cteMaster m

    ON m.ub_master_id = ct.ub_master_id

    OUTER APPLY

    (

    SELECT icd.charge_description, icd.charge_amount / icd.charge_qty [latest_amount], icd.cptcode, icd.cptcode_description

    FROM cteChargeDesc icd

    WHERE icd.charge_code = ct.charge_code AND icd.rn = 1

    ) cd

    GROUP BY ct.charge_code, cd.charge_description, cd.latest_amount, cd.CPTCode, cd.CPTCode_Description

    https://sqlroadie.com/

  • douglas.t (1/5/2017)


    Thanks for the effort. Your code is taking longer than mine though on a year's data. I don't know how best to export and post the execution plans on this forum.

    I am also open to ideas regarding a "View" or "UDF". I haven't done anything with that yet and was hoping on tips from those that are experienced. I don't want to do something that will slow down my inserts much because I insert a lot of data constantly.

    There are two key differences between Drew's code and the original. Firstly, Drew's query touches each table the minimum number of times. Secondly, it's far more amenable to improvement because of it's simplicity. Providing it returns the correct results, I'd recommend you ditch the original and focus on tuning Drew's instead. You should post up the actual execution plan (not the estimated plan) as a .sqlplan attachment, of the query running against your "year's data".This article [/url]describes how.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I have attached 3 sqlplan files. The query that I have (douglas.sqlplan), the one drew.allen created (drew.sqlplan) and the sql by Arjun (arjun.sqlplan).

    My query returns: 2540 records (Execution time: 3.532 sec; total time: 5.047 sec)

    Drew returns: 2570 records (Execution time: 8.906 sec; total time: 11 sec)

    Arjun returns: 2570 records (Execution time: 2.984 sec; total time: 5.172 sec)

    I am not sure why mine is getting less in the record count because it gets the same result on the test data. But I will trust in a more optimized and simpler query as suggested.

  • Douglas, can you also post the execution times please? I think I know why my query is returning more rows.

    https://sqlroadie.com/

  • I don't know if you get notified if I just edit. But I have edited the post above with the execution times.

  • No notifications on edit mate. What's the difference between execution time and total time?

    https://sqlroadie.com/

  • I don't know, it is what comes back on my EMS SQL Manager for SQL Server when I run a query. That is why I posted both. SQL Manager only shows the seconds and doesn't show milliseconds as far as I could see.

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply