February 29, 2008 at 9:56 am
I was wondering which is the fastest method of creating a dynamic SQL statement which references the same table recursively but with differing where clauses e.g.
I wish to query a patient table which attempts to find as many previous patterns for a given blood pressure reading:
An exmaple pattern is:
Today: [Open%] BETWEEN -1.61 AND -0.61 [High%] BETWEEN -0.28 AND 0.72 [Low%] BETWEEN -1.61 AND -0.61 [Close%] BETWEEN -1.39 AND -0.39
Previous Day: [Open%] BETWEEN 1.07 AND 2.07 [High%] BETWEEN 1.29 AND 2.29 [Low%] BETWEEN -0.5 AND 0.5 [Close%] BETWEEN 1.29 AND 2.29)
Previous Day: [Open%] BETWEEN -0.5 AND 0.5 AND [High%] BETWEEN -0.5 AND 0.5 AND [Low%] BETWEEN -1.16 AND -0.16 AND [Close%] BETWEEN -0.5 AND 0.5
Previous Day: etc.
Previous Day: etc.
Previous Day: etc.
Previous Day: etc.
Unfortunately, both T-SQL statements I have used seem very slow i.e.
Nested Query:
SELECT [Patient Data].*
FROM [Patient Data] WHERE [Patient Data].PreviousDayID In
( SELECT TodayID FROM [Patient Data] WHERE PreviousDayID In
( SELECT TodayID FROM [Patient Data] WHERE PreviousDayID In
( SELECT TodayID FROM [Patient Data] WHERE PreviousDayID In
( SELECT TodayID FROM [Patient Data] WHERE [Open%] BETWEEN -1.61 AND -0.61 AND [High%] BETWEEN -0.28 AND 0.72 AND [Low%] BETWEEN -1.61 AND -0.61 AND [Close%] BETWEEN -1.39 AND -0.39 ) AND
[Open%] BETWEEN 1.07 AND 2.07 AND [High%] BETWEEN 1.29 AND 2.29 AND [Low%] BETWEEN -0.5 AND 0.5 AND [Close%] BETWEEN 1.29 AND 2.29) AND
[Open%] BETWEEN -0.5 AND 0.5 AND [High%] BETWEEN -0.5 AND 0.5 AND [Low%] BETWEEN -1.16 AND -0.16 AND [Close%] BETWEEN -0.5 AND 0.5) AND
[Open%] BETWEEN -0.5 AND 0.5 AND [High%] BETWEEN -0.06 AND 0.94 AND [Low%] BETWEEN -0.94 AND 0.06 AND [Close%] BETWEEN -0.5 AND 0.5) AND
[Open%] BETWEEN -0.94 AND 0.06 AND [High%] BETWEEN 0.38 AND 1.38 AND [Low%] BETWEEN -0.94 AND 0.06 AND [Close%] BETWEEN -0.5 AND 0.5 Order By [Date] DESC
and
Inner Join Query:
SELECT [Patient Data].* FROM [Patient Data] INNER JOIN [Patient Data] AS [Patient Data_1] ON [Patient Data].PreMarketID = [Patient Data_1].TodayID INNER JOIN [Patient Data] AS [Patient Data_2] ON [Patient Data_1].PreMarketID = [Patient Data_2].TodayID INNER JOIN [Patient Data] AS [Patient Data_3] ON [Patient Data_2].PreviousDayID = [Patient Data_3].TodayID INNER JOIN [Patient Data] AS [Patient Data_4] ON [Patient Data_3].PreviousDayID = [Patient Data_4].TodayID WHERE [Patient Data].[Close%]>=-0.5 AND [Patient Data].[Close%] =-1.61 AND [Patient Data_4].[Low%]<=-0.61
I wish to know the most efficient and fastest way of achieving this query e.g. SELECT Statements, indexing, stored procedures, etc.etc.
Kind regards
Laurence
February 29, 2008 at 2:00 pm
I don't think this is a terribly difficult problem... but without data or knwing what the table it's stored in looks like, its just not possible. Click on the URL in the last line of my signature below, please...
--Jeff Moden
Change is inevitable... Change for the better is not.
March 3, 2008 at 4:10 am
Hello Jeff,
Thank you for your reply and thanks for explaining best practises. Sorry I havent got back to your earlier, but I have been away from work. Please find the following data based upon your "Correct Way to Post Data".
As I stated earlier my problem is not getting the query to work, but the speed of the query. I have developed a vb6 front end and the intention is to allow to end-user to enter the number of days they wish to scan for a matching pattern and then to loop through all current patients to see if each patient has a matching pattern in the entire table ( currently over 4 millions records ). However, just finding a matching pattern for single patient can take 15 to 20 seconds. I also intended that once one or more patterns have been found that the/a query would then look at the following day results and see if this could forecast the patients blood pressure for the next day.
I hope this isn't too confusing and that you will let me know if you require any further information.
Kind regards
Laurence
--===== If the test table already exists, drop it
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Patient Data]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[Patient Data]
GO
--===== Create the test table with
CREATE TABLE [dbo].[Patient Data] (
[TodayID] [int] IDENTITY (1, 1) NOT NULL ,
[PreviousDayID] [int] NULL ,
[PatientID] [int] NULL ,
[Date] [datetime] NULL ,
[Open%] [float] NULL ,
[High%] [float] NULL ,
[Low%] [float] NULL ,
[Close%] [float] NULL
) ON [PRIMARY]
GO
--===== Setup any special required conditions especially where dates are concerned
SET DATEFORMAT DMY
--===== All Inserts into the IDENTITY column
SET IDENTITY_INSERT [dbo].[Patient Data] ON
--===== Insert the test data into the test table
INSERT INTO [dbo].[Patient Data]
([TodayID], [PreviousDayID], [PatientID], [Date], [Open%], [High%], [Low%], [Close%])
SELECT '4629170','4624881','12','29/02/2008',-1.12,1.12,-1.12,1.12UNION ALL
SELECT '4624881','4620598','12','28/02/2008',-3.84,-3.84,-3.84,-3.84UNION ALL
SELECT '4620598','4615809','12','27/02/2008',-1.74,0.6,-2.13,0.6UNION ALL
SELECT '4615809','4611052','12','26/02/2008',-0.44,1.77,-0.44,1.77UNION ALL
SELECT '4611052','4606819','12','25/02/2008',0,0.44,0,0.44UNION ALL
SELECT '4606819','4602038','12','22/02/2008',-1.79,-1.79,-1.79,-1.79UNION ALL
SELECT '4602038','4597285','12','21/02/2008',-0.39,-0.39,-0.39,-0.39UNION ALL
SELECT '4597285','4592909','12','20/02/2008',2.09,2.09,-2.13,-2.13UNION ALL
SELECT '4592909','4588286','12','19/02/2008',3.3,3.3,3.3,3.3UNION ALL
SELECT '4588286','4583621','12','18/02/2008',-0.02,-0.02,-1.11,-1.11UNION ALL
SELECT '4583621','4578897','12','15/02/2008',-2.56,-2.56,-2.56,-2.56UNION ALL
SELECT '4578897','4574160','12','14/02/2008',6.26,6.26,2.65,2.65UNION ALL
SELECT '4574160','4569529','12','13/02/2008',-4.17,-4.17,-4.17,-4.17UNION ALL
SELECT '4569529','4564912','12','12/02/2008',-3.93,0.25,-3.93,0.25UNION ALL
SELECT '4564912','4560435','12','11/02/2008',2.49,6.4,2.49,6.4UNION ALL
SELECT '4560435','4555994','12','08/02/2008',-1.45,-1.45,-6.64,-6.64UNION ALL
SELECT '4555994','4551511','12','07/02/2008',-5.88,-5.49,-5.88,-5.49UNION ALL
SELECT '4551511','4547020','12','06/02/2008',0,2,-0.76,2UNION ALL
SELECT '4626564','4622254','14','29/02/2008',0.67,0.67,0.67,0.67UNION ALL
SELECT '4622254','4617575','14','28/02/2008',0.99,1.97,0,1.48UNION ALL
SELECT '4617575','4612732','14','27/02/2008',0.5,1.93,0.5,0.5UNION ALL
SELECT '4612732','4608220','14','26/02/2008',0.25,1.17,-0.74,-0.74UNION ALL
SELECT '4608220','4603782','14','25/02/2008',0.29,0.73,-0.44,-0.44UNION ALL
SELECT '4603782','4599017','14','22/02/2008',0.25,0.44,-0.25,0.44UNION ALL
SELECT '4599017','4594539','14','21/02/2008',1.01,2.51,1.01,2.26UNION ALL
SELECT '4594539','4589944','14','20/02/2008',0.25,0.25,-1.26,0.25UNION ALL
SELECT '4589944','4585335','14','19/02/2008',-0.88,0.63,-0.88,0.25UNION ALL
SELECT '4585335','4580641','14','18/02/2008',0,0.76,-0.76,0.76UNION ALL
SELECT '4580641','4575900','14','15/02/2008',-1.5,-0.05,-1.9,-1.75UNION ALL
SELECT '4575900','4571181','14','14/02/2008',1.98,2.5,0.25,2.5UNION ALL
SELECT '4571181','4566589','14','13/02/2008',0.13,1.1,-0.32,-0.32UNION ALL
SELECT '4566589','4562080','14','12/02/2008',1.84,3.29,1.31,3.29UNION ALL
SELECT '4562080','4557661','14','11/02/2008',0,0,-2.07,-1.8UNION ALL
SELECT '4557661','4553190','14','08/02/2008',1.06,2.12,0.53,2.12UNION ALL
SELECT '4553190','4548718','14','07/02/2008',-1.05,-0.31,-1.05,-1.05UNION ALL
SELECT '4548718','4544184','14','06/02/2008',-2.08,-0.78,-2.08,-0.78UNION ALL
SELECT '4544184','4539679','14','05/02/2008',1.61,4.3,1.61,3.49UNION ALL
SELECT '4539679','4535235','14','04/02/2008',3.01,3.56,1.92,1.92
--===== Set the identity insert back to normal
SET IDENTITY_INSERT [dbo].[Patient Data] OFF
March 3, 2008 at 6:57 am
Thanks. I'll give this a whirl after work, today.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 3, 2008 at 5:21 pm
It seems to me that a bit of divide and conquer will make this run much faster... when you're all done, look for PatientID's that have a Parm of 5. Dunno if I put your parameters in the correct order or not...
SELECT CAST(pd.TodayID AS INT) AS TodayID,pd.PreviousDayID,pd.PatientID,1 AS Parm
INTO #TmpTable
FROM [Patient Data] pd
WHERE pd.[Open%] BETWEEN -1.61 AND -0.61
AND pd.[High%] BETWEEN -0.28 AND 0.72
AND pd.[Low%] BETWEEN -1.61 AND -0.61
AND pd.[Close%] BETWEEN -1.39 AND -0.39
INSERT INTO #TmpTable
(TodayID,PreviousDayID,PatientID,Parm)
SELECT pd.TodayID,pd.PreviousDayID,pd.PatientID,2 AS Parm
FROM [Patient Data] pd,
#TmpTable t
WHERE t.TodayID = pd.PreviousDayID
AND t.PatientID = pd.PatientID
AND t.Parm = 1
AND pd.[Open%] BETWEEN 1.07 AND 2.07
AND pd.[High%] BETWEEN 1.29 AND 2.29
AND pd.[Low%] BETWEEN -0.5 AND 0.5
AND pd.[Close%] BETWEEN 1.29 AND 2.29
INSERT INTO #TmpTable
(TodayID,PreviousDayID,PatientID,Parm)
SELECT pd.TodayID,pd.PreviousDayID,pd.PatientID,3 AS Parm
FROM [Patient Data] pd,
#TmpTable t
WHERE t.TodayID = pd.PreviousDayID
AND t.PatientID = pd.PatientID
AND t.Parm = 2
AND pd.[Open%] BETWEEN -0.5 AND 0.5
AND pd.[High%] BETWEEN -0.5 AND 0.5
AND pd.[Low%] BETWEEN -1.16 AND -0.16
AND pd.[Close%] BETWEEN -0.5 AND 0.5
INSERT INTO #TmpTable
(TodayID,PreviousDayID,PatientID,Parm)
SELECT pd.TodayID,pd.PreviousDayID,pd.PatientID,4 AS Parm
FROM [Patient Data] pd,
#TmpTable t
WHERE t.TodayID = pd.PreviousDayID
AND t.PatientID = pd.PatientID
AND t.Parm = 3
AND pd.[Open%] BETWEEN -0.5 AND 0.5
AND pd.[High%] BETWEEN -0.06 AND 0.94
AND pd.[Low%] BETWEEN -0.94 AND 0.06
AND pd.[Close%] BETWEEN -0.5 AND 0.5
INSERT INTO #TmpTable
(TodayID,PreviousDayID,PatientID,Parm)
SELECT pd.TodayID,pd.PreviousDayID,pd.PatientID,5 AS Parm
FROM [Patient Data] pd,
#TmpTable t
WHERE t.TodayID = pd.PreviousDayID
AND t.PatientID = pd.PatientID
AND t.Parm = 4
AND pd.[Open%] BETWEEN -0.94 AND 0.06
AND pd.[High%] BETWEEN 0.38 AND 1.38
AND pd.[Low%] BETWEEN -0.94 AND 0.06
AND pd.[Close%] BETWEEN -0.5 AND 0.5
--Jeff Moden
Change is inevitable... Change for the better is not.
March 4, 2008 at 2:29 am
Thanks very much Jeff. I'll give that a go later today !!!
March 4, 2008 at 3:45 am
Hi Jeff,
Sorry for being a bit thick, but I ran the query in a View and couldn't find the tmptable it created. I then ran the query in the query analyzer and it was still running 2 minutes later so decided to terminate the execution. Also, ( again me being a bit thick ) I couldn't read in the code where the recursion is happening i.e. looking for a pattern within each patients records that are in a consecutive days.
Kind regards ( and sorry for being pain ).
Laurence
March 4, 2008 at 6:18 am
Not a problem... First, it can't be done in a view because of the temp table.
I'm not really using any form of recurrsion... I only stuff that meets the first day's measurments. Then I load the second day's measurements but only if the previous day is present, etc, etc. Only patients with the 5 day loaded have all 5 an in the correct order. It should be much faster than having cascaded joins on 4 million rows 5 times.
I suspect that the only way you'll get any decent performance out of these queries is to build a full blown covering index on you main table.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 4, 2008 at 7:33 am
Hi Jeff,
Thanks for your reply. Still not too sure where the tmptable is stored/held? And, when you suggest building a full blown covering index on you main table, how and what would you suggest? Once again, thanks very much for your help and time.
regards
Laurence
March 4, 2008 at 7:41 am
The temp table is built in the first of the 5 queries I posted. Temp tables are session sensitive... they only "live" as long as the current session does. You can make it a permanent table just by removing the # sign from before the name. Of course, then you'll need to make sure you drop it when you're done with it. Look up "temporary tables, Creating and Modifying a Table" in Books Online and see the section on temporary tables for more information on them.
A "covering index" has all of the columns in the criteria followed by all of the columns in the SELECT list. They can be very fast because all of the information is contained in the index... it never has to go back to the table to find anything.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 4, 2008 at 8:00 am
Cheers !!!
March 4, 2008 at 9:51 am
Hi Jeff,
Ran your query and it took 20 seconds as compared to the nearly 2 minutes with the nested query !! I have to say that I’m not au fait with indexing, but in the Indexes/Keys Property Page tab, would the following be a “covering index" for your query of the “Patient Data” table?
[IX_Market Daily Data_1]
ColumnNameOrder
Open% Ascending
High% Ascending
Low% Ascending
Close% Ascending
[IX_Market Daily Data_2]
ColumnNameOrder
TodayID Ascending
PreviousDayID Ascending
PatientID Ascending
Create as Clustered ?? not too sure whether this should be ticked or not.
Kind regards and thanks very much !!
Laurence
March 4, 2008 at 10:12 am
Covering index would contain both sets of columns. Clustered index would make inserts take a huge amount of time on this. Make it a non-clustered index.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 4, 2008 at 10:23 am
The order of the columns is important in the covering index.
Looks to me that the indexes that would help would be:
For the main table -
PreviousDayID, PatientID, TodayID,[Open%] ,[High%], [Low%], [Close%]
You may need to test with PatientID and previousDayID reversed (you might get better performance again).
Actually - the temp table might use one as well. Whether it helps or hurts might be a matter of testing (it may take longer to build the index than to just pursue without it). I'd think -
TodayID,PatientID, Parm
With a fill factor around 70%.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 6, 2008 at 6:56 am
Laurie - it's entirely possible that the DMV recommendation is right. It has access to your statistics (how the data is distributed, etc...), which would make it make that decision, so you may find that its index works better. What I usually do is to put BOTH indexes in play, run the query a few times, and see (from the actual execution plan) which one is actually used. I then drop the one not being used.
On the temp table - you would create the index immediately after the temp table was created (after the SELECT...INTO). It may slow you down a little on the insert, but it should speed up your joins, so you may not see a gain from having it there. Again - something that might be worth testing if the process is slow.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply