EXIST IN JOIN, Results vary....

  • I am running time diffrences in the following MAX inner queries, as you can see the first and last which use an IN clause or a Join clause to find the max entryTIME of a machineinstall returns the same number of rows however even while doing the scan count and the logical reads being the same, the latter (JOIN) is even slower by an additional second... the EXISTS is the fastest one, but the results vary and I wanted to get an outsiders persective if I'm even doing this part correctly...

    thanks for your time.

    --RUNTIME:04

    --(80411 row(s) affected)

    --Table 'tbl_MachineInstalls'. Scan count 2, logical reads 3264, physical reads 0, read-ahead reads 0.

    SELECT MIo.MachInstallID AS CurrentInstallID,

    MIo.ActiveServiceDlrID AS ServicedByID,

    MIo.InstallDate AS CurrentInstallDate,

    MIo.MachineID,

    MIo.entryTIME

    FROM dbo.tbl_MachineInstalls AS MIo WITH (NOLOCK)

    WHERE MIo.entryTIME IN

    (SELECT MAX(MIi.entryTime)

    FROM dbo.tbl_MachineInstalls AS MIi WITH (NOLOCK)

    WHERE MIi.MachineID = MIo.MachineID)

    --RUNTIME:03

    --(81898 row(s) affected)

    --Table 'tbl_MachineInstalls'. Scan count 1, logical reads 1632, physical reads 0, read-ahead reads 0.

    SELECT MIo.MachInstallID AS CurrentInstallID,

    MIo.ActiveServiceDlrID AS ServicedByID,

    MIo.InstallDate AS CurrentInstallDate,

    MIo.MachineID,

    MIo.entryTIME

    FROM dbo.tbl_MachineInstalls AS MIo WITH (NOLOCK)

    WHERE EXISTS

    (SELECT MAX(MIi.entryTime)

    FROM dbo.tbl_MachineInstalls AS MIi WITH (NOLOCK)

    WHERE MIi.MachineID = MIo.MachineID

    AND MIi.entryTime = MIo.entryTime)

    --RUNTIME:05

    --(80411 row(s) affected)

    --Table 'tbl_MachineInstalls'. Scan count 2, logical reads 3264, physical reads 0, read-ahead reads 0.

    SELECT MIo.MachInstallID AS CurrentInstallID,

    MIo.ActiveServiceDlrID AS ServicedByID,

    MIo.InstallDate AS CurrentInstallDate,

    MIo.MachineID,

    MIo.entryTIME

    FROM dbo.tbl_MachineInstalls AS MIo WITH (NOLOCK)

    INNER JOIN (SELECT MachineID, MAX(entryTime) AS entryTIME

    FROM dbo.tbl_MachineInstalls WITH (NOLOCK)

    GROUP BY MachineID) AS MIi

    ON MIi.MachineID = MIo.MachineID AND MIi.entryTIME = MIo.entryTIME

    -- Francisco

  • I forgot to mention that the 81898 results are ALL the records in that table, why would the EXISTS not filter out any data?

    Thanks,

    -- Francisco

  • Your first and last queries are essentially the same, and I doubt the execution plans differ.  When you're dealing with such short times, one second is probably within the margin of error introduced by other factors and you will probably not get these timings repeatedly if you rerun the same queries.  The second query is entirely different and of course will not give you the same result (and will use a different execution plan).  IF EXISTS (SELECT MAX(mi.EntryTime)... is the same as IF EXISTS (SELECT *... as you're not correlating on MAX(EntryTime) but instead on just EntryTime.



    --Jonathan

  • I modified the where clause to show the having, which I missed and posted way to early to this forumn, but maybe there is something that can be done? :S

    --RUNTIME:18

    --(80411 row(s) affected)

    --Table 'Worktable'. Scan count 81888, logical reads 459541, physical reads 0, read-ahead reads 0.

    --Table 'tbl_MachineInstalls'. Scan count 2, logical reads 3264, physical reads 0, read-ahead reads 0.

    SELECT MIo.MachInstallID AS CurrentInstallID,

    MIo.ActiveServiceDlrID AS ServicedByID,

    MIo.InstallDate AS CurrentInstallDate,

    MIo.MachineID,

    MIo.entryTIME

    FROM dbo.tbl_MachineInstalls AS MIo WITH (NOLOCK)

    WHERE EXISTS

    (SELECT MAX(MIi.entryTime)

    FROM dbo.tbl_MachineInstalls AS MIi WITH(NOLOCK)

    WHERE MIi.MachineID = MIo.MachineID

    HAVING MAX(MIi.entryTime) = MIo.entryTIME)

    -- Francisco

  • Jonothan,

    Thanks for your post, and yes, the query plan for the 1st and 3rd query were the same, it's kinda bizzar that the 3rd is still even a second slower than the 1st. I consistently re-ran each set and continued to get the same results each time... and now that I've re-written the EXSTS it's even slower. I was trying to optimize this section because I have a udf that returns a table and when observing it's query plan I thought that perhaps I could optimize this section first.

    the entire UDF runs at about 40 seconds which is not terrible, but is seemingly causing problems on another sproc which it is the basis for, on Friday the same procedure would run in about 8-20 seconds, now today on monday it is taking all of 8 minutes... I'm not sure what could have changed so drasticaly

    -- Francisco

  • You may want to study some more basic SQL, as your new subquery used with EXISTS should of course be just:

    (SELECT *

    FROM dbo.tbl_MachineInstalls AS MIi WITH(NOLOCK)

    WHERE MIi.MachineID = MIo.MachineID AND MAX(MIi.entryTime) = MIo.entryTIME)

    I would not expect this to be faster than the other variants, though.

    Try taking your timings in milliseconds rather than seconds.  I think you'll find the variations in times between the first and third queries as you rerun them will show the margin of error to be large enough for them to have equivalent times. 

    As for your slow procedure, have you tried updating the statistics and recompiling the procedure (and any child procedures)?  Have any indexes been dropped recently?



    --Jonathan

  • Thanks again Jonathan for your help,

    2 quick questions ...

    1) how do I measure in milliseconds, till now i've been getting the time displayed in the status bar in QA.

    2) the modified EXISTS gives me the following error

    An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.

    any thoughts?

    -- Francisco

  • 1)

    DECLARE @t datetime

    SET @t = GETDATE()

    --first query here

    PRINT DATEDIFF(ms,@t,GETDATE())

    SET @t = GETDATE()

    --second query here

    PRINT DATEDIFF(ms,@t,GETDATE())

    2)   I should study some basic SQL, your method is about the only way to make that work.  (But I wouldn't do it.)



    --Jonathan

  • 1) ah, ok, I thought there was a setting in QA to change the output time display.

    2) fair enough... Is my table just set up wrong? the table takes inputs, would I need to modify it w/ a trigger update that if a new install date is added, to automatically move the older date out of the way? The table serves to provide a history of everywhere the machine has been installed at.

    So... the Max Installdate that i'm searching for here serves to help find the LAST location it was installed at...

    -- Francisco

  • Is my table just set up wrong?

    No, I just meant I wouldn't use EXISTS in a case like this.  This is not an unusual type of query, and your other two methods are fine (other than I would use = rather than IN in the first).  In some cases the best performance can be found with TOP 1 rather than MAX(), e.g.:

    SELECT MIo.MachInstallID AS CurrentInstallID,

    MIo.ActiveServiceDlrID AS ServicedByID,

    MIo.InstallDate AS CurrentInstallDate,

    MIo.MachineID,

    MIo.entryTIME

    FROM dbo.tbl_MachineInstalls AS MIo

    WHERE MIo.entryTIME =

    (SELECT TOP 1 MIi.entryTime

     FROM dbo.tbl_MachineInstalls AS MIi

     WHERE MIi.MachineID = MIo.MachineID

     ORDER BY MIi.entryTime DESC)



    --Jonathan

  • Francisco I think the query below will fix your broken exists query so that you can do the benchmark.

    SELECT

    MIo.MachInstallID AS CurrentInstallID,

    MIo.ActiveServiceDlrID AS ServicedByID,

    MIo.InstallDate AS CurrentInstallDate,

    MIo.MachineID,

    MIo.entryTIME

    FROM

    dbo.tbl_MachineInstalls AS MIo WITH (NOLOCK)

    WHERE MIo.entryTime =

    (

     SELECT Max(MIi.entryTime)

     FROM dbo.tbl_MachineInstalls AS MIi WITH (NOLOCK)

     WHERE MIi.MachineID = MIo.MachineID

    )

    Peter Evans (__PETER Peter_)

  • This seems to have a longer run time than the previous ones, I keep yielding an even 8 seconds or 8000ms average run time as compared to even any of the other ones...

    at this point I re-organized the FROM table joins from the most outer query and this speed up the query for 2 of the 4 that were having problems.

    -- Francisco

  • Francisco,

    What is the reasoning behind using the NOLOCK hint? Do you really need it?

     

    BTW: You can have QA give you the elapsed time and cpu time for queries by using something like the following.

    SET STATISTICS TIME ON

    SELECT * FROM sysobjects

    SET STATISTICS TIME OFF

     

    Shows:

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    SQL Server parse and compile time:

       CPU time = 0 ms, elapsed time = 0 ms.

    (423 row(s) affected)

    SQL Server Execution Times:

       CPU time = 17 ms,  elapsed time = 17 ms.




    Gary Johnson
    Microsoft Natural Language Group
    DBA, Sr. DB Engineer

    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

  • I've found that if I use WITH(NOLOCK), my Selects return that much faster... SQL BOL states

    READUNCOMMITTED

    Specifies that dirty reads are allowed. This means that no shared locks are issued and no exclusive locks are honored. Allowing dirty reads can result in higher concurrency, but at the cost of lower consistency. If READUNCOMMITTED is specified, it is possible to read an uncommitted transaction or to read a set of pages rolled back in the middle of the read; therefore, error messages may result. For more information about isolation levels, see SET TRANSACTION ISOLATION LEVEL.

    So these are dirty reads but that's fine for this process.

    -- Francisco

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply