Recursive query with different where clauses - very slow

  • I was wondering which is the fastest method of creating a dynamic SQL statement which references the same table recursively but with differing where clauses e.g.

    I wish to query a patient table which attempts to find as many previous patterns for a given blood pressure reading:

    An exmaple pattern is:

    Today: [Open%] BETWEEN -1.61 AND -0.61 [High%] BETWEEN -0.28 AND 0.72 [Low%] BETWEEN -1.61 AND -0.61 [Close%] BETWEEN -1.39 AND -0.39

    Previous Day: [Open%] BETWEEN 1.07 AND 2.07 [High%] BETWEEN 1.29 AND 2.29 [Low%] BETWEEN -0.5 AND 0.5 [Close%] BETWEEN 1.29 AND 2.29)

    Previous Day: [Open%] BETWEEN -0.5 AND 0.5 AND [High%] BETWEEN -0.5 AND 0.5 AND [Low%] BETWEEN -1.16 AND -0.16 AND [Close%] BETWEEN -0.5 AND 0.5

    Previous Day: etc.

    Previous Day: etc.

    Previous Day: etc.

    Previous Day: etc.

    Unfortunately, both T-SQL statements I have used seem very slow i.e.

    Nested Query:

    SELECT [Patient Data].*

    FROM [Patient Data] WHERE [Patient Data].PreviousDayID In

    ( SELECT TodayID FROM [Patient Data] WHERE PreviousDayID In

    ( SELECT TodayID FROM [Patient Data] WHERE PreviousDayID In

    ( SELECT TodayID FROM [Patient Data] WHERE PreviousDayID In

    ( SELECT TodayID FROM [Patient Data] WHERE [Open%] BETWEEN -1.61 AND -0.61 AND [High%] BETWEEN -0.28 AND 0.72 AND [Low%] BETWEEN -1.61 AND -0.61 AND [Close%] BETWEEN -1.39 AND -0.39 ) AND

    [Open%] BETWEEN 1.07 AND 2.07 AND [High%] BETWEEN 1.29 AND 2.29 AND [Low%] BETWEEN -0.5 AND 0.5 AND [Close%] BETWEEN 1.29 AND 2.29) AND

    [Open%] BETWEEN -0.5 AND 0.5 AND [High%] BETWEEN -0.5 AND 0.5 AND [Low%] BETWEEN -1.16 AND -0.16 AND [Close%] BETWEEN -0.5 AND 0.5) AND

    [Open%] BETWEEN -0.5 AND 0.5 AND [High%] BETWEEN -0.06 AND 0.94 AND [Low%] BETWEEN -0.94 AND 0.06 AND [Close%] BETWEEN -0.5 AND 0.5) AND

    [Open%] BETWEEN -0.94 AND 0.06 AND [High%] BETWEEN 0.38 AND 1.38 AND [Low%] BETWEEN -0.94 AND 0.06 AND [Close%] BETWEEN -0.5 AND 0.5 Order By [Date] DESC

    and

    Inner Join Query:

    SELECT [Patient Data].* FROM [Patient Data] INNER JOIN [Patient Data] AS [Patient Data_1] ON [Patient Data].PreMarketID = [Patient Data_1].TodayID INNER JOIN [Patient Data] AS [Patient Data_2] ON [Patient Data_1].PreMarketID = [Patient Data_2].TodayID INNER JOIN [Patient Data] AS [Patient Data_3] ON [Patient Data_2].PreviousDayID = [Patient Data_3].TodayID INNER JOIN [Patient Data] AS [Patient Data_4] ON [Patient Data_3].PreviousDayID = [Patient Data_4].TodayID WHERE [Patient Data].[Close%]>=-0.5 AND [Patient Data].[Close%] =-1.61 AND [Patient Data_4].[Low%]<=-0.61

    I wish to know the most efficient and fastest way of achieving this query e.g. SELECT Statements, indexing, stored procedures, etc.etc.

    Kind regards

    Laurence

  • I don't think this is a terribly difficult problem... but without data or knwing what the table it's stored in looks like, its just not possible. Click on the URL in the last line of my signature below, please...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hello Jeff,

    Thank you for your reply and thanks for explaining best practises. Sorry I havent got back to your earlier, but I have been away from work. Please find the following data based upon your "Correct Way to Post Data".

    As I stated earlier my problem is not getting the query to work, but the speed of the query. I have developed a vb6 front end and the intention is to allow to end-user to enter the number of days they wish to scan for a matching pattern and then to loop through all current patients to see if each patient has a matching pattern in the entire table ( currently over 4 millions records ). However, just finding a matching pattern for single patient can take 15 to 20 seconds. I also intended that once one or more patterns have been found that the/a query would then look at the following day results and see if this could forecast the patients blood pressure for the next day.

    I hope this isn't too confusing and that you will let me know if you require any further information.

    Kind regards

    Laurence

    --===== If the test table already exists, drop it

    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Patient Data]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)

    drop table [dbo].[Patient Data]

    GO

    --===== Create the test table with

    CREATE TABLE [dbo].[Patient Data] (

    [TodayID] [int] IDENTITY (1, 1) NOT NULL ,

    [PreviousDayID] [int] NULL ,

    [PatientID] [int] NULL ,

    [Date] [datetime] NULL ,

    [Open%] [float] NULL ,

    [High%] [float] NULL ,

    [Low%] [float] NULL ,

    [Close%] [float] NULL

    ) ON [PRIMARY]

    GO

    --===== Setup any special required conditions especially where dates are concerned

    SET DATEFORMAT DMY

    --===== All Inserts into the IDENTITY column

    SET IDENTITY_INSERT [dbo].[Patient Data] ON

    --===== Insert the test data into the test table

    INSERT INTO [dbo].[Patient Data]

    ([TodayID], [PreviousDayID], [PatientID], [Date], [Open%], [High%], [Low%], [Close%])

    SELECT '4629170','4624881','12','29/02/2008',-1.12,1.12,-1.12,1.12UNION ALL

    SELECT '4624881','4620598','12','28/02/2008',-3.84,-3.84,-3.84,-3.84UNION ALL

    SELECT '4620598','4615809','12','27/02/2008',-1.74,0.6,-2.13,0.6UNION ALL

    SELECT '4615809','4611052','12','26/02/2008',-0.44,1.77,-0.44,1.77UNION ALL

    SELECT '4611052','4606819','12','25/02/2008',0,0.44,0,0.44UNION ALL

    SELECT '4606819','4602038','12','22/02/2008',-1.79,-1.79,-1.79,-1.79UNION ALL

    SELECT '4602038','4597285','12','21/02/2008',-0.39,-0.39,-0.39,-0.39UNION ALL

    SELECT '4597285','4592909','12','20/02/2008',2.09,2.09,-2.13,-2.13UNION ALL

    SELECT '4592909','4588286','12','19/02/2008',3.3,3.3,3.3,3.3UNION ALL

    SELECT '4588286','4583621','12','18/02/2008',-0.02,-0.02,-1.11,-1.11UNION ALL

    SELECT '4583621','4578897','12','15/02/2008',-2.56,-2.56,-2.56,-2.56UNION ALL

    SELECT '4578897','4574160','12','14/02/2008',6.26,6.26,2.65,2.65UNION ALL

    SELECT '4574160','4569529','12','13/02/2008',-4.17,-4.17,-4.17,-4.17UNION ALL

    SELECT '4569529','4564912','12','12/02/2008',-3.93,0.25,-3.93,0.25UNION ALL

    SELECT '4564912','4560435','12','11/02/2008',2.49,6.4,2.49,6.4UNION ALL

    SELECT '4560435','4555994','12','08/02/2008',-1.45,-1.45,-6.64,-6.64UNION ALL

    SELECT '4555994','4551511','12','07/02/2008',-5.88,-5.49,-5.88,-5.49UNION ALL

    SELECT '4551511','4547020','12','06/02/2008',0,2,-0.76,2UNION ALL

    SELECT '4626564','4622254','14','29/02/2008',0.67,0.67,0.67,0.67UNION ALL

    SELECT '4622254','4617575','14','28/02/2008',0.99,1.97,0,1.48UNION ALL

    SELECT '4617575','4612732','14','27/02/2008',0.5,1.93,0.5,0.5UNION ALL

    SELECT '4612732','4608220','14','26/02/2008',0.25,1.17,-0.74,-0.74UNION ALL

    SELECT '4608220','4603782','14','25/02/2008',0.29,0.73,-0.44,-0.44UNION ALL

    SELECT '4603782','4599017','14','22/02/2008',0.25,0.44,-0.25,0.44UNION ALL

    SELECT '4599017','4594539','14','21/02/2008',1.01,2.51,1.01,2.26UNION ALL

    SELECT '4594539','4589944','14','20/02/2008',0.25,0.25,-1.26,0.25UNION ALL

    SELECT '4589944','4585335','14','19/02/2008',-0.88,0.63,-0.88,0.25UNION ALL

    SELECT '4585335','4580641','14','18/02/2008',0,0.76,-0.76,0.76UNION ALL

    SELECT '4580641','4575900','14','15/02/2008',-1.5,-0.05,-1.9,-1.75UNION ALL

    SELECT '4575900','4571181','14','14/02/2008',1.98,2.5,0.25,2.5UNION ALL

    SELECT '4571181','4566589','14','13/02/2008',0.13,1.1,-0.32,-0.32UNION ALL

    SELECT '4566589','4562080','14','12/02/2008',1.84,3.29,1.31,3.29UNION ALL

    SELECT '4562080','4557661','14','11/02/2008',0,0,-2.07,-1.8UNION ALL

    SELECT '4557661','4553190','14','08/02/2008',1.06,2.12,0.53,2.12UNION ALL

    SELECT '4553190','4548718','14','07/02/2008',-1.05,-0.31,-1.05,-1.05UNION ALL

    SELECT '4548718','4544184','14','06/02/2008',-2.08,-0.78,-2.08,-0.78UNION ALL

    SELECT '4544184','4539679','14','05/02/2008',1.61,4.3,1.61,3.49UNION ALL

    SELECT '4539679','4535235','14','04/02/2008',3.01,3.56,1.92,1.92

    --===== Set the identity insert back to normal

    SET IDENTITY_INSERT [dbo].[Patient Data] OFF

  • Thanks. I'll give this a whirl after work, today.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It seems to me that a bit of divide and conquer will make this run much faster... when you're all done, look for PatientID's that have a Parm of 5. Dunno if I put your parameters in the correct order or not...

    SELECT CAST(pd.TodayID AS INT) AS TodayID,pd.PreviousDayID,pd.PatientID,1 AS Parm

    INTO #TmpTable

    FROM [Patient Data] pd

    WHERE pd.[Open%] BETWEEN -1.61 AND -0.61

    AND pd.[High%] BETWEEN -0.28 AND 0.72

    AND pd.[Low%] BETWEEN -1.61 AND -0.61

    AND pd.[Close%] BETWEEN -1.39 AND -0.39

    INSERT INTO #TmpTable

    (TodayID,PreviousDayID,PatientID,Parm)

    SELECT pd.TodayID,pd.PreviousDayID,pd.PatientID,2 AS Parm

    FROM [Patient Data] pd,

    #TmpTable t

    WHERE t.TodayID = pd.PreviousDayID

    AND t.PatientID = pd.PatientID

    AND t.Parm = 1

    AND pd.[Open%] BETWEEN 1.07 AND 2.07

    AND pd.[High%] BETWEEN 1.29 AND 2.29

    AND pd.[Low%] BETWEEN -0.5 AND 0.5

    AND pd.[Close%] BETWEEN 1.29 AND 2.29

    INSERT INTO #TmpTable

    (TodayID,PreviousDayID,PatientID,Parm)

    SELECT pd.TodayID,pd.PreviousDayID,pd.PatientID,3 AS Parm

    FROM [Patient Data] pd,

    #TmpTable t

    WHERE t.TodayID = pd.PreviousDayID

    AND t.PatientID = pd.PatientID

    AND t.Parm = 2

    AND pd.[Open%] BETWEEN -0.5 AND 0.5

    AND pd.[High%] BETWEEN -0.5 AND 0.5

    AND pd.[Low%] BETWEEN -1.16 AND -0.16

    AND pd.[Close%] BETWEEN -0.5 AND 0.5

    INSERT INTO #TmpTable

    (TodayID,PreviousDayID,PatientID,Parm)

    SELECT pd.TodayID,pd.PreviousDayID,pd.PatientID,4 AS Parm

    FROM [Patient Data] pd,

    #TmpTable t

    WHERE t.TodayID = pd.PreviousDayID

    AND t.PatientID = pd.PatientID

    AND t.Parm = 3

    AND pd.[Open%] BETWEEN -0.5 AND 0.5

    AND pd.[High%] BETWEEN -0.06 AND 0.94

    AND pd.[Low%] BETWEEN -0.94 AND 0.06

    AND pd.[Close%] BETWEEN -0.5 AND 0.5

    INSERT INTO #TmpTable

    (TodayID,PreviousDayID,PatientID,Parm)

    SELECT pd.TodayID,pd.PreviousDayID,pd.PatientID,5 AS Parm

    FROM [Patient Data] pd,

    #TmpTable t

    WHERE t.TodayID = pd.PreviousDayID

    AND t.PatientID = pd.PatientID

    AND t.Parm = 4

    AND pd.[Open%] BETWEEN -0.94 AND 0.06

    AND pd.[High%] BETWEEN 0.38 AND 1.38

    AND pd.[Low%] BETWEEN -0.94 AND 0.06

    AND pd.[Close%] BETWEEN -0.5 AND 0.5

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks very much Jeff. I'll give that a go later today !!!

  • Hi Jeff,

    Sorry for being a bit thick, but I ran the query in a View and couldn't find the tmptable it created. I then ran the query in the query analyzer and it was still running 2 minutes later so decided to terminate the execution. Also, ( again me being a bit thick ) I couldn't read in the code where the recursion is happening i.e. looking for a pattern within each patients records that are in a consecutive days.

    Kind regards ( and sorry for being pain ).

    Laurence

  • Not a problem... First, it can't be done in a view because of the temp table.

    I'm not really using any form of recurrsion... I only stuff that meets the first day's measurments. Then I load the second day's measurements but only if the previous day is present, etc, etc. Only patients with the 5 day loaded have all 5 an in the correct order. It should be much faster than having cascaded joins on 4 million rows 5 times.

    I suspect that the only way you'll get any decent performance out of these queries is to build a full blown covering index on you main table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Thanks for your reply. Still not too sure where the tmptable is stored/held? And, when you suggest building a full blown covering index on you main table, how and what would you suggest? Once again, thanks very much for your help and time.

    regards

    Laurence

  • The temp table is built in the first of the 5 queries I posted. Temp tables are session sensitive... they only "live" as long as the current session does. You can make it a permanent table just by removing the # sign from before the name. Of course, then you'll need to make sure you drop it when you're done with it. Look up "temporary tables, Creating and Modifying a Table" in Books Online and see the section on temporary tables for more information on them.

    A "covering index" has all of the columns in the criteria followed by all of the columns in the SELECT list. They can be very fast because all of the information is contained in the index... it never has to go back to the table to find anything.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Cheers !!!

  • Hi Jeff,

    Ran your query and it took 20 seconds as compared to the nearly 2 minutes with the nested query !! I have to say that I’m not au fait with indexing, but in the Indexes/Keys Property Page tab, would the following be a “covering index" for your query of the “Patient Data” table?

    [IX_Market Daily Data_1]

    ColumnNameOrder

    Open% Ascending

    High% Ascending

    Low% Ascending

    Close% Ascending

    [IX_Market Daily Data_2]

    ColumnNameOrder

    TodayID Ascending

    PreviousDayID Ascending

    PatientID Ascending

    Create as Clustered ?? not too sure whether this should be ticked or not.

    Kind regards and thanks very much !!

    Laurence

  • Covering index would contain both sets of columns. Clustered index would make inserts take a huge amount of time on this. Make it a non-clustered index.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The order of the columns is important in the covering index.

    Looks to me that the indexes that would help would be:

    For the main table -

    PreviousDayID, PatientID, TodayID,[Open%] ,[High%], [Low%], [Close%]

    You may need to test with PatientID and previousDayID reversed (you might get better performance again).

    Actually - the temp table might use one as well. Whether it helps or hurts might be a matter of testing (it may take longer to build the index than to just pursue without it). I'd think -

    TodayID,PatientID, Parm

    With a fill factor around 70%.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Laurie - it's entirely possible that the DMV recommendation is right. It has access to your statistics (how the data is distributed, etc...), which would make it make that decision, so you may find that its index works better. What I usually do is to put BOTH indexes in play, run the query a few times, and see (from the actual execution plan) which one is actually used. I then drop the one not being used.

    On the temp table - you would create the index immediately after the temp table was created (after the SELECT...INTO). It may slow you down a little on the insert, but it should speed up your joins, so you may not see a gain from having it there. Again - something that might be worth testing if the process is slow.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply