Any tips on speeding this SQL up would be most helpful

Question

Any tips on speeding this SQL up would be most helpful

douglas.t

SSC Eights!

Points: 961
More actions
January 3, 2017 at 5:03 pm

#325028

Basic DB Schema (I am using MSSQL 2016 Web Edition)
ub_master
------------------------
ub_master_id (int) (Primary Key)
a_site_id (int)
importversion (int)
ub_batch_sub_number (int)
date_discharged (date)
ub_charge
-------------------------
ub_charge_id (int) (Primary Key)
ub_master_id (int)
a_site_id (int)
charge_code (varchar(10))
charge_amount (decimal)
charge_qty (int)
charge_date (date)
charge_description (varchar(100))
The basic idea is to query some statistics but I want the latest price and description found in the database. I have tried a dozen different ways and this seems to be the fastest I have found so far, but it still takes 5 seconds for 1 years worth of data. I know it is a lot of records but I still wonder if some tips could help it go faster.
I also want to paginate it, but the only way I can see to effectively do that is to have everything run and then limit it.
Here is my query so far:
-- Here I am getting the charges that do not cancel out each other. In the DB I could have a 1 qty and a -1 qty, so I want to eliminate those.
WITH ubc as (
SELECT c.ub_charge_id from ub_master m
JOIN ub_charge c ON c.ub_master_id = m.ub_master_id
AND (
SELECT SUM(csub.charge_qty)
FROM ub_charge csub
WHERE csub.charge_code = c.Charge_code AND
csub.ub_master_id = m.ub_master_id
) > 0
WHERE
m.a_site_id = 2
AND m.importversion = 1
AND m.ub_batch_sub_number = 0
AND m.date_discharged BETWEEN '12/1/2015' AND '11/30/2016'
),
-- Here I am using row_number() to limit to the latest Charge Amount and Description
cte as (
SELECT * FROM (
SELECT
c.ub_master_id,
c.charge_code,
-- I am dividing by the charge_qty so that I get the original price per unit
case when c.charge_qty <> 0 then (c.charge_amount / c.charge_qty) else c.charge_amount END as chargeAmount,
c.charge_description,
c.CPTCode,
c.CPTCode_Description,
row_number() over(partition by c.charge_code order by c.charge_date desc) as rownum
from ub_charge c
WHERE
-- Making sure I am getting the latest positive price and not credits
c.charge_amount >= 0
AND c.ub_charge_id IN (select ub_charge_id from ubc)
) t
where t.rownum = 1
)
--Now fetching the columns and querying the stats that I wanted.
select
cte.charge_code,
cte.charge_description as LatestDescription,
cte.chargeamount as LatestAmount,
(select sum(charge_qty) from ub_charge c
WHERE c.charge_code = cte.charge_code
AND c.ub_charge_id IN (select ub_charge_id from ubc)
) SumQty,
(select sum(charge_amount) from ub_charge c
WHERE c.charge_code = cte.charge_code
AND c.ub_charge_id IN (select ub_charge_id from ubc)
) TotalRevenue
,
(
select count( t.charge_code ) from (
select distinct ub_master_id, charge_code from ub_charge
WHERE ub_charge_id IN (select ub_charge_id from ubc)
AND charge_code = cte.charge_code
) t
) CountByClaim
from cte
-- Trying to paginate
ORDER BY cte.charge_code
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
As far as I can see the execution plan is hitting indexes and nothing shows me how to speed it up that I can find.

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

Can you post the table definitions, index definitions and execution plan please?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

You're reading the base tables quite a few times. You should be able to extract your result set on a single read of the two tables, from what I can see. Some conditional SUMs on this will be close:

SELECT

d.ub_master_id,

d.charge_code,

-- I am dividing by the charge_qty so that I get the original price per unit

d.charge_amount / ISNULL(NULLIF(d.charge_qty,0),1) as chargeAmount,

d.charge_description,

d.CPTCode,

d.CPTCode_Description,

row_number() over(partition by d.charge_code order by d.charge_date desc) as rownum

FROM (

SELECT

c.ub_master_id,

c.charge_code,

c.charge_date,

c.ub_charge_id,

c.charge_description,

c.CPTCode,

c.CPTCode_Description,

c.charge_amount,

c.charge_qty,

cancel = SUM(c.charge_qty) OVER(PARTITION BY c.Charge_code, c.ub_master_id)

FROM ub_master m

JOIN ub_charge c

ON c.ub_master_id = m.ub_master_id

WHERE m.a_site_id = 2

AND m.importversion = 1

AND m.ub_batch_sub_number = 0

AND m.date_discharged BETWEEN '12/1/2015' AND '11/30/2016'

) d

WHERE d.cancel > 0

AND d.charge_amount >= 0

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

douglas.t SSC Eights! Points: 961 More actions · Answer 3

I created some test tables and data and I found my SQL wasn't giving me what I wanted.

UB_MASTER_TEST TABLE

CREATE TABLE dbo.ub_master_test (

ub_master_id int NOT NULL,

ub_batch_sub_number int NULL,

a_site_id int NULL,

importVersion int DEFAULT 0 NULL,

date_discharged datetime NULL,

CONSTRAINT PK_ub_master_test PRIMARY KEY CLUSTERED (ub_master_id)

WITH (

PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, STATISTICS_NORECOMPUTE = OFF,

ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

)

ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX ub_master_test_idx ON dbo.ub_master_test

(a_site_id)

INCLUDE (ub_batch_sub_number, importVersion)

WITH (

PAD_INDEX = OFF,

DROP_EXISTING = OFF,

STATISTICS_NORECOMPUTE = OFF,

SORT_IN_TEMPDB = OFF,

ONLINE = OFF,

ALLOW_ROW_LOCKS = ON,

ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX ub_master_test_idx ON dbo.ub_master_test

(a_site_id)

INCLUDE (ub_batch_sub_number, importVersion)

WITH (

PAD_INDEX = OFF,

DROP_EXISTING = OFF,

STATISTICS_NORECOMPUTE = OFF,

SORT_IN_TEMPDB = OFF,

ONLINE = OFF,

ALLOW_ROW_LOCKS = ON,

ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

GO

UB_CHARGE_TEST TABLE

CREATE TABLE dbo.ub_charge_test (

ub_charge_id int NOT NULL,

a_site_id int NULL,

ub_master_id int NULL,

charge_code char(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

charge_amount decimal(20, 2) NULL,

charge_description varchar(255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

charge_date datetime NULL,

charge_qty int NULL,

CPTCode varchar(8) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

CPTCode_Description varchar(180) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

CONSTRAINT PK_ub_charges_test PRIMARY KEY CLUSTERED (ub_charge_id)

WITH (

PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, STATISTICS_NORECOMPUTE = OFF,

ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON),

CONSTRAINT ub_charge_test_fk FOREIGN KEY (ub_master_id)

REFERENCES dbo.ub_master_test (ub_master_id)

ON UPDATE NO ACTION

ON DELETE NO ACTION

)

ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX idx_chargecode_include_ubMasterID ON dbo.ub_charge_test

(charge_code)

INCLUDE (ub_master_id)

WITH (

PAD_INDEX = OFF,

DROP_EXISTING = OFF,

STATISTICS_NORECOMPUTE = OFF,

SORT_IN_TEMPDB = OFF,

ONLINE = OFF,

ALLOW_ROW_LOCKS = ON,

ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX ub_charge_test_idx ON dbo.ub_charge_test

(ub_master_id)

WITH (

PAD_INDEX = OFF,

DROP_EXISTING = OFF,

STATISTICS_NORECOMPUTE = OFF,

SORT_IN_TEMPDB = OFF,

ONLINE = OFF,

ALLOW_ROW_LOCKS = ON,

ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

GO

Now my SQL to populate the DB with test data

DELETE FROM ub_charge_test

DELETE FROM ub_master_test

INSERT INTO ub_master_test

(ub_master_id, a_site_id, importversion, ub_batch_sub_number, date_discharged)

VALUES

(1, 5, 1, 0, '12/20/2016'),

(2, 3, 1, 0, '12/21/2016'),

(3, 3, 1, 1, '01/02/2017'),

(4, 3, 2, 0, '01/03/2017'),

(5, 3, 1, 0, '01/01/2017'),

(6, 3, 1, 0, '01/03/2017');

INSERT INTO ub_charge_test

(ub_charge_id, ub_master_id, a_site_id, charge_code, charge_amount, charge_qty, charge_date, charge_description, cptcode, cptcode_description)

VALUES

(1, 1, 5, '1000', 50, 2, '01/3/2017', 'PRODUCT 1000 V1', 'code1', 'cpt description 1'),

(2, 1, 5, '2000', 100, 1, '12/17/2016', 'PRODUCT 2000 V1', 'code2', 'cpt description 2'),

(3, 2, 3, '3000', 500, 1, '12/20/2016', 'PRODUCT 3000 V1', 'code3', 'cpt description 3'),

(4, 2, 3, '2000', 100, 1, '12/19/2016', 'PRODUCT 2000 V2', 'code4', 'cpt description 4'),

(5, 3, 3, '1000', 30, 1, '1/2/2017', 'PRODUCT 1000 V2', 'code5', 'cpt description 5'),

(6, 3, 3, '2000', 210, 2, '1/2/2017', 'PRODUCT 2000 V3', 'code6', 'cpt description 6'),

(7, 4, 3, '3000', 1000, 2, '12/21/2016', 'PRODUCT 3000 V2', 'code7', 'cpt description 7'),

(8, 4, 3, '4000', 10, 2, '12/20/2016', 'PRODUCT 4000 V1', 'code8', 'cpt description 8'),

(9, 5, 3, '1000', 100, 4, '11/19/2016', 'PRODUCT 1000 V3', 'code9', 'cpt description 9'),

(10, 5, 3, '2000', 220, 2, '11/20/2016', 'PRODUCT 2000 V4', 'code10', 'cpt description 10'),

(11, 1, 5, '5000', 90, 2, '12/17/2016', 'PRODUCT 5000 V1', 'code11', 'cpt description 11'),

(12, 5, 3, '2000', 220, 1, '11/20/2016', 'PRODUCT 2000 V5', 'code12', 'cpt description 12'),

(13, 6, 3, '1000', 100, 1, '01/01/2017', 'PRODUCT 1000 V4', 'code13', 'cpt description 13'),

(14, 6, 3, '1000', 100, 1, '01/01/2017', 'PRODUCT 1000 V5', 'code14', 'cpt description 14');

Now if I have the following limiting criteria:

a_site_id = 3

importVersion = 1

ub_batch_sub_number = 0

date_discharged = BETWEEN '1/1/2017' AND '1/5/2017'

Then that should eliminate ub_master_id of (1,2,3,4) leaving only ub_master_id:5

Which would only pull the Charge_Codes of (1000 & 2000)

So this is the result I want returned:

charge_codeLatestDescriptionLatestAmountSumQtyTotalRevenueCountByClaimCPTCodeCPTCode_Description

1000PRODUCT 1000 V2$305$2002code5cpt description 5

2000PRODUCT 2000 V3$1053$4401code6cpt description 6

You will notice that "PRODUCT 1000 V2" is returned because it is the latest "charge_date" description, amount, cptcode, cptcode_description in the a_site_id = 3

Charge_code: Only list those that come up with the filtering criteria
LatestDescription: Look for the latest charge_description within a_site_id = 3 using charge_date.
LatestAmount: Look for the latest (charge_amount / charge_qty) within a_site_id = 3 using charge_date to get the price per one.
SumQty: Sum the charge_qty of only those that were found with the filtering criteria
TotalRevenue: Sum the charge_amount of only those that were found with the filtering criteria
CountByClaim: Count of the charge_code, but only count once per ub_master_id. (So in my test data 1000 is found 3 times, but I only want 2 as the count because it is only in two claims)
cptcode: Look for the latest cptcode within a_site_id = 3 using charge_date.
cptcode_description: Look for the latest cptcode_description within a_site_id = 3 using charge_date.

Here is my latest working SQL against the test data but I would like to make it more efficient and optimized. Please help!

WITH ubc as (

SELECT c.ub_charge_id, c.charge_code from ub_master_test m

JOIN ub_charge_test c ON c.ub_master_id = m.ub_master_id

AND (

SELECT SUM(csub.charge_qty)

FROM ub_charge_test csub

WHERE csub.charge_code = c.Charge_code AND

csub.ub_master_id = m.ub_master_id

) > 0

WHERE

m.a_site_id = 3

AND m.importversion = 1

AND m.ub_batch_sub_number = 0

AND m.date_discharged BETWEEN '1/1/2017' AND '1/4/2017'

),

cte as (

SELECT * FROM (

SELECT

c.ub_master_id,

c.charge_code,

case when c.charge_qty <> 0 then (c.charge_amount / c.charge_qty) else c.charge_amount END as chargeAmount,

c.charge_description,

c.CPTCode,

c.CPTCode_Description,

row_number() over(partition by c.charge_code order by c.charge_date desc) as rownum

from ub_charge_test c

WHERE

c.charge_amount >= 0

AND c.a_site_id = 3

AND charge_code IN (Select charge_code from ubc)

) t

where t.rownum = 1

)

select

cte.charge_code,

cte.charge_description as LatestDescription,

cte.chargeamount as LatestAmount,

(select sum(charge_qty) from ub_charge_test c

WHERE c.charge_code = cte.charge_code

AND c.ub_charge_id IN (select ub_charge_id from ubc)

) SumQty,

(select sum(charge_amount) from ub_charge_test c

WHERE c.charge_code = cte.charge_code

AND c.ub_charge_id IN (select ub_charge_id from ubc)

) TotalRevenue

,

(

select count( t.charge_code ) from (

select distinct ub_master_id, charge_code from ub_charge_test

WHERE ub_charge_id IN (select ub_charge_id from ubc)

AND charge_code = cte.charge_code

) t

) CountByClaim,

cte.CPTCode,

cte.CPTCode_Description

from cte

ORDER BY cte.charge_code

drew.allen SSC Guru Points: 76944 More actions · Answer 4

Both of these tables contain a foreign key to a table that you did not provide a definition for, so they will not run as written. You really need to test your code before posting it.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

douglas.t SSC Eights! Points: 961 More actions · Answer 5

Sorry about that, I modified the SQL just removing the foreign key, you don't need it for the purpose of this query.

drew.allen SSC Guru Points: 76944 More actions · Answer 6

I've looked at this, and I have lots of questions. Specifically, it looks like a_site_id should be part of the primary key, but it is not used in any of the joins.

I did come up with the following, which the execution plan says will be cheaper, but the run times on this small amount of data is inconclusive.

;

WITH charges AS (

SELECT

c.a_site_id,

c.charge_code,

SUM(c.charge_qty) AS SumQty,

SUM(c.charge_amount) AS TotalRevenue,

COUNT(DISTINCT m.ub_master_id) AS CountByClaim

from #ub_master_test m

JOIN #ub_charge_test c ON c.ub_master_id = m.ub_master_id

WHERE

m.a_site_id = 3

AND m.importversion = 1

AND m.ub_batch_sub_number = 0

AND m.date_discharged BETWEEN '1/1/2017' AND '1/4/2017'

GROUP BY c.a_site_id, c.charge_code

)

SELECT c.charge_code,

l.charge_description AS LatestDescription,

l.charge_amount AS LatestAmount,

c.SumQty,

c.TotalRevenue,

c.CountByClaim,

l.CPTCode,

l.CPTCode_Description

FROM charges c

CROSS APPLY (

SELECT TOP 1

sub.charge_qty,

case when sub.charge_qty <> 0 then (sub.charge_amount / sub.charge_qty) else sub.charge_amount END as charge_amount,

sub.charge_description,

sub.CPTCode,

sub.CPTCode_Description

FROM #ub_charge_test sub

WHERE sub.a_site_id = c.a_site_id

AND sub.charge_code = c.charge_code

ORDER BY sub.charge_date DESC

) l

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

douglas.t SSC Eights! Points: 961 More actions · Answer 7

Thanks for the effort. Your code is taking longer than mine though on a year's data. I don't know how best to export and post the execution plans on this forum.

I am also open to ideas regarding a "View" or "UDF". I haven't done anything with that yet and was hoping on tips from those that are experienced. I don't want to do something that will slow down my inserts much because I insert a lot of data constantly.

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 8

How does this one fare? It should generate a different plan from the one I have posted. Those scans are going to be expensive, considering you have a year's data. Covering indexes may help, but I don't recommend unless you are very particular about speeding up the query.

Can you please post execution plan once you run it? You could save it as .sqlplan and add it to your G drive and share may be?

DECLARE @site_id INT = 3,

@importversion INT = 1,

@batchsubnumber INT = 0,

@datefrom DATETIME = '1jan2017',

@dateto DATETIME = '1may2017'

;WITH cteChargeDesc

AS

(

SELECT charge_code, charge_description, charge_amount, charge_qty, cptcode, cptcode_description,

ROW_NUMBER() OVER(PARTITION BY charge_code ORDER BY charge_date DESC) rn

FROM dbo.ub_charge_test

WHERE a_site_id = @site_id

)

,cteMaster AS

(

SELECT ub_master_id

FROM dbo.ub_master_test

WHERE a_site_id = @site_id AND importVersion = @importversion AND ub_batch_sub_number = @batchsubnumber AND date_discharged BETWEEN @datefrom AND @dateto

)

SELECT ct.charge_code, cd.charge_description, cd.latest_amount, SUM(ct.charge_qty) [sum_qty], SUM(ct.charge_amount) [total_revenue],

COUNT(DISTINCT ct.ub_master_id) [count_by_claim], cd.CPTCode, cd.CPTCode_Description

FROM dbo.ub_charge_test ct

INNER JOIN cteMaster m

ON m.ub_master_id = ct.ub_master_id

OUTER APPLY

(

SELECT icd.charge_description, icd.charge_amount / icd.charge_qty [latest_amount], icd.cptcode, icd.cptcode_description

FROM cteChargeDesc icd

WHERE icd.charge_code = ct.charge_code AND icd.rn = 1

) cd

GROUP BY ct.charge_code, cd.charge_description, cd.latest_amount, cd.CPTCode, cd.CPTCode_Description

https://sqlroadie.com/

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

douglas.t (1/5/2017)
Thanks for the effort. Your code is taking longer than mine though on a year's data. I don't know how best to export and post the execution plans on this forum.
I am also open to ideas regarding a "View" or "UDF". I haven't done anything with that yet and was hoping on tips from those that are experienced. I don't want to do something that will slow down my inserts much because I insert a lot of data constantly.

There are two key differences between Drew's code and the original. Firstly, Drew's query touches each table the minimum number of times. Secondly, it's far more amenable to improvement because of it's simplicity. Providing it returns the correct results, I'd recommend you ditch the original and focus on tuning Drew's instead. You should post up the actual execution plan (not the estimated plan) as a .sqlplan attachment, of the query running against your "year's data".This article [/url]describes how.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

douglas.t SSC Eights! Points: 961 More actions · Answer 10

I have attached 3 sqlplan files. The query that I have (douglas.sqlplan), the one drew.allen created (drew.sqlplan) and the sql by Arjun (arjun.sqlplan).

My query returns: 2540 records (Execution time: 3.532 sec; total time: 5.047 sec)

Drew returns: 2570 records (Execution time: 8.906 sec; total time: 11 sec)

Arjun returns: 2570 records (Execution time: 2.984 sec; total time: 5.172 sec)

I am not sure why mine is getting less in the record count because it gets the same result on the test data. But I will trust in a more optimized and simpler query as suggested.

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 11

Douglas, can you also post the execution times please? I think I know why my query is returning more rows.

https://sqlroadie.com/

douglas.t SSC Eights! Points: 961 More actions · Answer 12

I don't know if you get notified if I just edit. But I have edited the post above with the execution times.

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 13

No notifications on edit mate. What's the difference between execution time and total time?

https://sqlroadie.com/

douglas.t SSC Eights! Points: 961 More actions · Answer 14

I don't know, it is what comes back on my EMS SQL Manager for SQL Server when I run a query. That is why I posted both. SQL Manager only shows the seconds and doesn't show milliseconds as far as I could see.