SELECT Slow in production

  • I'm self-trained in SQL as part of my job, and I'm trying to do remote support of a system in a developing country, i.e. there is no other support.  Their application has a procedure in SQL Server 2014 Express that runs for 30 minutes once a day to select about 4,000 rows from a 27,000,000-row table.  On my notebook it runs in 30 seconds against a table of 2,300,000 rows.  This piece of the procedure by itself takes almost all of the 30 minutes of execution time.

    DECLARE @DateFrom AS DATETIME;

    Set @DateFrom = DATEADD(DAY, -2, GETDATE());

    DECLARE @DateTo AS DATETIME;

    Set @DateTo = DATEADD(DAY, 1, GETDATE());

    INSERT into dmhdb.dbo.CalcEvap (Level_id, Monitoring_datetime, A)

    (Select Level_id, Monitoring_datetime, VALUE_NUMERIC from dmhdb.dbo.Data Where VARIABLE_ID = 2 and monitoring_datetime between @DateFrom and @DateTo);

    GO

    Update DMHDB.dbo.CalcEvap set CalcEvap.B = D.VALUE_NUMERIC

    FROM CalcEvap Inner Join Data D

    ON CalcEvap.Monitoring_datetime = D.MONITORING_DATETIME and CalcEvap.LEVEL_ID = D.LEVEL_ID

    Where D.VARIABLE_ID = 1 and D.MONITORING_DATETIME between @DateFrom and @DateTo;

    and three more similar UPDATEs for other Variable_IDs.

    The primary key for the Data table is Level_id, Monitoring_datetime, & Variable_id and for the CalcEvap table PK is Level_id & Monitoring_datetime.

    What could I do to make it run faster, or to find where the bottleneck is?  The system is running at about 10% CPU usage, 30% memory usage, and a disk queue length of 0.00.

  • We need the full DDL for all tables involved, including indexes, but for now:

    1. How is the dmhdb.dbo.Data table clustered?

    2. The syntax on your UPDATE is not best practice, and you may be causing additional join(s) when it runs.  Try this instead:

    Update CE --<<-- it's CRITICAL when doing an UPDATE with a join to reference the table alias here --<<--

    set CE.B = D.VALUE_NUMERIC

    FROM DMHDB.dbo.CalcEvap CE

    Inner Join Data D

    ON CE.Monitoring_datetime = D.MONITORING_DATETIME and CE.LEVEL_ID = D.LEVEL_ID

    Where D.VARIABLE_ID = 1 and D.MONITORING_DATETIME between @DateFrom and @DateTo;

  • ian 29141 wrote:

    ...The primary key for the Data table is Level_id, Monitoring_datetime, & Variable_id and for the CalcEvap table PK is Level_id & Monitoring_datetime...

    Is that the order of the columns in the indexes?  Are there any other indexes besides the primary keys?

    One problem I see, if that is the column order and there are no other indexes, is that the WHERE clauses on the Data table do not use Level_id, so SQL Server will always have to do a full table scan to satisfy "Where VARIABLE_ID = 2 and monitoring_datetime between @DateFrom and @DateTo".

    Is it possible to change the order of the columns in the index to be Monitoring_datetime, Variable_id, Level_id?

    For any further advice we'd probably have to look at the execution plan of these statements.
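
    If changing the column order of the PK isn't possible, a nonclustered index can give the same seek behavior.  A minimal sketch, assuming VALUE_NUMERIC is the only other column these statements read (the index name is invented; test before applying):

    CREATE NONCLUSTERED INDEX IX_Data_MonDt_VarId_LevelId --hypothetical name
    ON dmhdb.dbo.Data (Monitoring_datetime, Variable_ID, Level_id)
    INCLUDE (VALUE_NUMERIC); --covers the reads so no key lookups are needed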

  • What is your disk I/O?

    One thing that can help is to change your "between" to a > and <.  Between comparisons have been known to have performance issues.
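
    For reference, the date window rewritten with explicit comparisons would look like the sketch below (note that BETWEEN is inclusive on both ends, so >= and <= is the exact equivalent; >= and < is the usual pattern for date ranges):

    WHERE D.VARIABLE_ID = 1
      AND D.MONITORING_DATETIME >= @DateFrom
      AND D.MONITORING_DATETIME <= @DateTo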

    On top of that, I'd start by looking at your execution plan to determine where the bottleneck is.  Also, make sure your statistics are up to date.  This doesn't appear to be a stored procedure but an ad-hoc batch; changing it to a stored procedure (especially if it is going to be run multiple times) may offer a performance boost, but be careful of parameter-sniffing problems.
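
    Refreshing statistics on the big table is a one-liner (FULLSCAN is the most accurate but can be slow on 27 million rows; sp_updatestats is a lighter database-wide alternative):

    UPDATE STATISTICS dmhdb.dbo.Data WITH FULLSCAN;
    -- or, with default sampling across the whole database:
    EXEC sp_updatestats;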

    Too many indexes can slow inserts/updates down, and duplicate or unused indexes can slow everything down.

    If you can archive some of the data in "Data", you will get performance benefits as well, since there will be less data to look through.  Depending on the setup, you may even get a performance boost by making temp tables/table variables and splitting the data from "Data" up by variable ID.
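
    A rough sketch of that temp-table idea, using the column names from the thread (pull the date window out of "Data" once, index it, and run the INSERT/UPDATEs against the much smaller set):

    SELECT Level_id, Monitoring_datetime, Variable_ID, VALUE_NUMERIC
    INTO #DataWindow
    FROM dmhdb.dbo.Data
    WHERE Monitoring_datetime BETWEEN @DateFrom AND @DateTo;

    CREATE CLUSTERED INDEX IX_DataWindow --hypothetical name
    ON #DataWindow (Variable_ID, Level_id, Monitoring_datetime);
    -- the updates would then join to #DataWindow instead of the 27,000,000-row table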

    The first step is to figure out which part of the query is "slow" and needs tuning.
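
    In SSMS, per-statement CPU time, elapsed time, and logical reads are easy to capture with the standard SET options:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    -- run the batch; the Messages tab then shows the cost of each statement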

    Also, looking at that query, the UPDATE will not run; it will error out because @DateFrom and @DateTo are undefined in that batch (the GO ends the batch in which they were declared).

  • Assuming the PKs are the clustering keys [*], for best performance you'd need to change the order of the keys in the Data table.

    If you always (with perhaps only extremely rare exceptions) specify a Variable_ID = <value> when reading the table, then the table should be clustered on:

    ( Variable_ID, Monitoring_datetime, Level_id )

    If you always specify Monitoring_datetime and don't always specify Variable_ID, then the clustering key should be:

    ( Monitoring_datetime, Variable_ID, Level_id )

    For the CalcEvap table, the existing clustering key of ( Level_id, Monitoring_datetime) is ok.

    [*] A PK does not have to be the clustering key in SQL Server, although by default it will be if you don't explicitly specify a different clustering key and don't explicitly specify NONCLUSTERED when creating the PK.
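
    For what re-clustering would look like, a hedged sketch (the constraint name PK_Data is assumed, so check sys.key_constraints first; rebuilding a 27,000,000-row clustered index needs a maintenance window and a test run):

    ALTER TABLE dmhdb.dbo.Data DROP CONSTRAINT PK_Data; --constraint name assumed
    ALTER TABLE dmhdb.dbo.Data
        ADD CONSTRAINT PK_Data PRIMARY KEY CLUSTERED
            (Variable_ID, Monitoring_datetime, Level_id);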

  • The order of the columns in the index cannot be changed.  The PK is clustered, and there are separate indexes on Level_id, on Monitoring_datetime, and on Variable_ID.

  • First of all, you declare some variables, do an INSERT, and then break the batch with a GO.  That means that when you get to the UPDATE after that, you should be getting errors about needing to declare the @DateFrom and @DateTo variables used in the criteria for that UPDATE.  You need to fix that first.

    After that, you're using local variables in your WHERE clause whose values aren't known when the plan is compiled.  Considering how long it's taking, my next shot would be to use OPTION (RECOMPILE) on the queries and see if that helps.
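
    Applied to the first UPDATE, that would look like this (same statement, just with the hint added):

    UPDATE CE
    SET CE.B = D.VALUE_NUMERIC
    FROM DMHDB.dbo.CalcEvap CE
    INNER JOIN Data D
        ON CE.Monitoring_datetime = D.MONITORING_DATETIME
        AND CE.LEVEL_ID = D.LEVEL_ID
    WHERE D.VARIABLE_ID = 1
      AND D.MONITORING_DATETIME BETWEEN @DateFrom AND @DateTo
    OPTION (RECOMPILE); --compiles the plan using the actual variable values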

    --Jeff Moden

  • The run took just as long when using the CE alias in the update.

  • Which is the slow portion of that query?  Are ALL of the UPDATEs and INSERTs slow and taking roughly the same time to complete?

    Is this a "run-once" query or is this run multiple times?

    Is this an ad-hoc query or a stored procedure?  (Thinking ad-hoc, as you have a "GO" in your statement.)

    If you run this query a lot, you may want to put it in a stored procedure which could offer performance benefits.  If it is a "run-once" query, optimizing it may not be worth the time and effort.
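
    A minimal wrapper, if it does end up running daily (the procedure name is invented; the body would be the cleaned-up script with the GO separators removed):

    CREATE PROCEDURE dbo.usp_CalcEvap --hypothetical name
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @DateFrom datetime = DATEADD(DAY, -2, GETDATE());
        DECLARE @DateTo   datetime = DATEADD(DAY,  1, GETDATE());
        -- ... the INSERT / UPDATE / DELETE statements from the script go here ...
    END;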

  • ian 29141 wrote:

    The run took just as long when using the CE alias in the update.

    If you're running what you posted, it's not actually running.  Like I said in my previous post, you have a GO batch separator in the code that's killing the follow-up use of variables.

    It would be helpful if you posted the actual code you're using.  I don't want to assume, at this point, that merely removing the GO is what you're actually running.

    --Jeff Moden

  • Thanks for the guidance in cleaning up the SQL.  It is attached, but I haven't been able to run it because the connection to the remote system is down right now.

  • Uploading as txt

  • People tend to be nervous of opening attached files. Can you post it into a code block as you did in your first post?

  • USE [DMHDB]
    SET ANSI_NULLS ON
    SET QUOTED_IDENTIFIER ON

    IF EXISTS (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'CalcEvap')
    DROP TABLE CalcEvap;

    CREATE TABLE [dbo].[CalcEvap](
    [ID] [decimal](38, 0) IDENTITY(1,1) NOT NULL,
    [LEVEL_ID] [int] NOT NULL,
    [MONITORING_DATETIME] [datetime] NOT NULL,
    [EvapVar_ID] int,
    --A = Air temperature 2
    [A] Float,
    --B = Air pressure 1
    [B] Float,
    --C = Relative humidity 12
    [C] Float,
    --D = Net radiation 9
    [D] Float,
    --E = Wind speed 18
    [E] Float,
    --V = Evap 22
    [V] decimal(18,6),
    --CV = Evap as characters
    [CV] nvarchar(max),
    CONSTRAINT [PK_CalcEvap_1] PRIMARY KEY CLUSTERED
    (
    [LEVEL_ID] ASC,
    [MONITORING_DATETIME] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]


    -- Set the date range
    DECLARE @DateFrom AS DATETIME;
    Set @DateFrom = DATEADD(DAY, -2, GETDATE());
    --Set @DateFrom = '2016-09-26' --used for local testing
    DECLARE @DateTo AS DATETIME;
    Set @DateTo = DATEADD(DAY, 1, GETDATE());
    --SET @DateTo = '2016-09-29' --used for local testing

    --Load the temp table with all the valid air temperature values
    INSERT into dmhdb.dbo.CalcEvap (Level_id, Monitoring_datetime, A)
    (Select Level_id, Monitoring_datetime, VALUE_NUMERIC from Data Where VARIABLE_ID = 2 and monitoring_datetime between @DateFrom and @DateTo);

    --The CalcEvap table will have only about 4000 rows

    -- Now add the air pressure (1) for those dates
    Update CE set CE.B = D.VALUE_NUMERIC
    FROM CalcEvap CE Inner Join Data D
    ON CE.Monitoring_datetime = D.MONITORING_DATETIME and CE.LEVEL_ID = D.LEVEL_ID
    Where D.VARIABLE_ID = 1 and D.MONITORING_DATETIME between @DateFrom and @DateTo;

    -- Now add the relative humidity (12) for those dates
    Update CE set CE.C = D.VALUE_NUMERIC
    FROM CalcEvap CE Inner Join Data D
    ON CE.Monitoring_datetime = D.MONITORING_DATETIME and CE.LEVEL_ID = D.LEVEL_ID
    Where D.VARIABLE_ID = 12 and D.MONITORING_DATETIME between @DateFrom and @DateTo;

    -- Now add the Net radiation (9) for those dates
    Update CE set CE.D = D.VALUE_NUMERIC
    FROM CalcEvap CE Inner Join Data D
    ON CE.Monitoring_datetime = D.MONITORING_DATETIME and CE.LEVEL_ID = D.LEVEL_ID
    Where D.VARIABLE_ID = 9 and D.MONITORING_DATETIME between @DateFrom and @DateTo ;

    -- Now add the wind (18) for those dates
    Update CE set CE.E = D.VALUE_NUMERIC
    FROM CalcEvap CE Inner Join Data D
    ON CE.Monitoring_datetime = D.MONITORING_DATETIME and CE.LEVEL_ID = D.LEVEL_ID
    Where D.VARIABLE_ID = 18 and D.MONITORING_DATETIME between @DateFrom and @DateTo;

    --Now clean up the table by removing the invalid readings
    Delete from CalcEvap where A not between -10 and 50; --A = Air temperature 2
    Delete from CalcEvap where B not between 500 and 1500; --B = Air pressure 1
    Delete from CalcEvap where C not between 5 and 110;--C = Relative humidity 12
    Delete from CalcEvap where D not between -500 and 2000;--D = Net radiation 9
    Delete from CalcEvap where E not between 0 and 100;--E = Wind speed 18

    -- Compute the evaporation value (V) from temperature (A), pressure (B),
    -- humidity (C), net radiation (D) and wind speed (E); split across lines for readability
    Update DMHDB.dbo.CalcEvap set DMHDB.dbo.CalcEvap.V =
      ((0.408*((4098*POWER(0.6108,((17.27*A)/(A+237.3))))/(POWER((A+237.3),2)))*((D*600)/1000000))
       +((0.0000665*B)*(6.25/(A+273))*(E*0.748)*((Power(0.6108,((17.27*A)/(A+237.3))))-((Power(0.6108,((17.27*A)/(A+237.3)))*(C/100))))))
      /(((4098*POWER(0.6108,((17.27*A)/(A+237.3))))/(POWER((A+237.3),2)))
       +((0.0000665*B)*(1+0.34*(E*0.748))));

    -- convert to varchar to go into Value field in Data
    Update DMHDB.dbo.CalcEvap set DMHDB.dbo.CalcEvap.CV = V;

    --Clear out the rows that didn't calculate due to missing data
    DELETE DMHDB.dbo.CalcEvap where DMHDB.dbo.CalcEvap.V IS NULL;

    -- Set the variableID for Evap (22) (to be used in the delete statement)
    UPDATE CalcEvap set EvapVar_ID = 22;

    --Make room in Data for the rows
    --("DT" is to delete rows by reference)
    DELETE DT FROM Data DT INNER JOIN CalcEvap CE
    ON DT.MONITORING_DATETIME = CE.MONITORING_DATETIME AND DT.LEVEL_ID = CE.LEVEL_ID AND DT.VARIABLE_ID = CE.EvapVar_ID;

    --Now put the rows in the temp table into the Data table
    INSERT into Data (Level_id, Monitoring_datetime, Variable_id, value)
    (Select Level_ID, Monitoring_datetime,EvapVar_ID, CV from CalcEvap)

     

  • Thank you for the guidance, especially Chris.  The solution was to include the Level_ID (the leading column of the clustered index) in the first SELECT, as shown below.  The run time is now 3 seconds on the 27 million rows.

    INSERT into dmhdb.dbo.CalcEvap (Level_id, Monitoring_datetime, A) 
    (Select Level_id, Monitoring_datetime, VALUE_NUMERIC from Data Where VARIABLE_ID = 2 and monitoring_datetime > @DateFrom
    and Level_id in (Select Distinct (Level_ID) from data where variable_id = 9 and monitoring_datetime > @DateFrom));

    I also dropped the @DateTo as unnecessary.  (VARIABLE_ID = 2 is used just because it was at the top of the list of all of the variables, and VARIABLE_ID = 9 is used in the sub-select because it is the rarest parameter.)
