INNER JOIN

  • Gillingham_Dave (3/12/2013)


    Ok I'm new to this stuff but when I ran the code to confirm, it returned 100 rows.

    Select *

    From #Test

    Go

    Row FirstDate LastDate

    2013-03-12 18:27:09.7532013-03-12 18:27:09.7584698

    2013-03-12 18:27:09.7602013-03-12 18:27:09.7594698

    Dave, you simply selected all the rows in the temp table. You had no WHERE clause to filter them nor a JOIN to return only those that matched somehow. As pointed out by many in this discussion, although the odds are low for a match of the first date to the second, it could happen. Disregarding the results wherein Hugo found that the CPU ticks are apparently spaced about .0000027 seconds apart, the odds of a seven-digit datetime2 number matching its truncated to 2 1/3 digit datetime counterpart would seem to be about 33000/1. Small chance, yes, but certainly possible.

    By the way, did you happen to manually type those values in? I ask because the first pair seems so far off that it looks like a 3 and an 8 were exchanged in the third digit to the right of the decimal.

  • --The answer is wrong!. It is rare but rows may come when both dates are equal. Check below query with some additional code

    CREATE DROP TABLE #TEST

    (

    FirstDate DATETIME,

    LastDate DATETIME2

    )

    DECLARE @i int

    SET @i = 100

    WHILE @i > 0

    BEGIN

    INSERT #TEST values (sysdatetime(), sysdatetime())

    SET @i = @i - 1

    END

    SELECT distinct a.FirstDate, b.LastDate

    FROM #TEST a

    INNER JOIN #TEST b

    on a.FirstDate = b.LastDate

    ALTER TABLE #TEST ADD NewDate DATETIME2(3)

    UPDATE #TEST SET NewDate = LastDate

    SELECT distinct a.FirstDate, b.LastDate, b.NewDate

    FROM #TEST a

    INNER JOIN #TEST b

    on a.FirstDate = b.NewDate

    ALTER TABLE #TEST ADD NewDate2 DATETIME2

    UPDATE #TEST SET NewDate2 = NewDate

    SELECT distinct a.FirstDate, b.LastDate, b.NewDate, b.NewDate2

    FROM #TEST a

    INNER JOIN #TEST b

    on a.FirstDate = b.NewDate2

  • I was confused by people saying how rare it was to find matching values. When I tested the original code, I found that it was not infrequent at all to find matching values. Every time I ran the code, there were a few matching items.

    When I run :

    SELECT sysdatetime()

    on my test server, the returned value is:

    2013-03-13 11:23:09.9250000

    and every time I run it, the value has only three non-zero decimal places.

    (I am running this on a test server which is a SQL Server 2012 installation, on a Windows Web Server 2008 R2, with AMD Opteron processor. I'm not sure whether it is the processor or the Windows version which causes this behaviour.)

    On looking in more detail at the table created for the question, on my system, all of the SYSDATETIME values which are placed into the DATETIME2 field end in ...0000 . Which makes sense, given that I only ever get 3 decimal places from SYSDATETIME().

    In many cases, this means that it matches the value placed in the DATETIME field.

    However, in some cases the value placed in the DATETIME field and the value in the DATETIME2 field differ in the third decimal place. Here's a sample of a few rows from a test run:

    FirstDate (DATETIME) LastDate (DATETIME2)

    2013-03-13 11:05:33.6532013-03-13 11:05:33.6530000

    2013-03-13 11:05:33.6532013-03-13 11:05:33.6540000

    2013-03-13 11:05:33.6572013-03-13 11:05:33.6550000

    2013-03-13 11:05:33.6572013-03-13 11:05:33.6550000

    2013-03-13 11:05:33.6572013-03-13 11:05:33.6550000

    The DATETIME field only contains values where the third decimal place is 0,3 or 7. The actual SYSDATETIME value is rounded to fit into this pattern.

    The DATETIME2 field contains the actual SYSDATETIME value to 3 decimal places.

    So in 30% of the rows, on average, those where the SYSDATETIME returns a value whose third decimal place is 0,3 or 7 , the two values match. On the other rows, they don't match.

  • I also wonder whether the precision being returned by the SYSDATETIME function might explain the behaviour that Tom highlighted, where the test script runs much faster on his old computer than another user reports on their Core i5.

    If the older computer is generating SYSDATETIME values with fewer decimal places, then the odds of getting a value that ends in ...0000 and matches the stored DATETIME value will be much better.

    ( e.g. on my test server as noted above, the probability of a match is 0.3 ; but if your computer generates SYSDATETIME values to 7 decimal places, the probability is 0.00003)

  • archie flockhart (3/13/2013)


    I was confused by people saying how rare it was to find matching values. When I tested the original code, I found that it was not infrequent at all to find matching values. Every time I ran the code, there were a few matching items.

    I'm not sure, but I think cpu and other things such as SQL Server version and hard drive speed may be the factors here. My laptop has an i3 dual core with SQL Server 2008 R2, it took about 3 seconds to 5 minutes get get first matched.

    archie flockhart (3/13/2013)


    The DATETIME field only contains values where the third decimal place is 0,3 or 7. The actual SYSDATETIME value is rounded to fit into this pattern.

    Yeah, when I ran a test on getdate() the last digit always end in 0, 3, or 7. Thanks for pointing it out.

    Well in term of matching probability, I think it's not high based on my test result. Using Evgeny's script, I add some stats and identity primary key for the #test table. It took a very high number of rows to find the first match. I notice that when increase the batch size, it tends to reduce total time and reduce #rows tested.

    My test results:

    number_rows_per_batch total_batch_count number_of_tested_rows totalTime

    100 6446 644600 116 seconds

    100 3554 355400 63 seconds

    100 20208 2020800 359 seconds

    1000 1505 1505000 72 seconds

    2000 199 398000 16 seconds

    2000 1928 3856000 154 seconds

    5000 177 885000 32 seconds

    5000 64 320000 12 seconds

    5000 13 65000 3 seconds

    5000 178 890000 31 seconds

    5000 548 2740000 97 seconds

    10000 37 370000 13 seconds

    10000 31 310000 10 seconds

    10000 58 580000 20 seconds

    20000 18 360000 12 seconds

    20000 31 620000 21 seconds

    USE tempdb;

    go

    SET NOCOUNT ON;

    DECLARE @number_rows_per_batch INT = 100; --number rows per batch

    DECLARE @number_of_tested_rows INT; --total rows tested

    DECLARE @total_batch int = 0; --total batch

    DECLARE @startTime DATETIME2 = SYSDATETIME();--start time

    DECLARE @endTime DATETIME2; --time when 1st match found

    WHILE 1=1

    BEGIN

    CREATE TABLE #TEST

    (

    id int identity(1,1) primary key,

    FirstDate DATETIME,

    LastDate DATETIME2

    )

    DECLARE @i INT;

    SET @i = @number_rows_per_batch;

    WHILE @i > 0

    BEGIN

    INSERT #TEST values (sysdatetime(), sysdatetime())

    SET @i = @i - 1

    END

    SELECT distinct a.FirstDate, b.LastDate

    INTO #t

    FROM #TEST a

    INNER JOIN #TEST b

    on a.FirstDate = b.LastDate

    --match found

    IF @@ROWCOUNT>0

    BEGIN

    SET @endTime = SYSDATETIME();

    SELECT * FROM #t; --matched row

    BREAK; --exit

    END

    SET @total_batch = @total_batch + 1;

    DROP TABLE #TEST, #t;

    END

    DROP TABLE #TEST, #t;

    SET @number_of_tested_rows = @total_batch * @number_rows_per_batch;

    --result output

    SELECT @number_rows_per_batch AS number_rows_per_batch, @total_batch AS total_batch_count,

    @number_of_tested_rows AS number_of_tested_rows,

    CAST(DATEDIFF(SECOND, @startTime, @endTime) AS VARCHAR(4)) + ' seconds' AS totalTime;

    --------------------------------------------------------------------------------------
    Hai Ton
    My Db4Breakfast blog.

  • archie flockhart (3/13/2013)


    I also wonder whether the precision being returned by the SYSDATETIME function might explain the behaviour that Tom highlighted, where the test script runs much faster on his old computer than another user reports on their Core i5.

    That looks likely. I checked to see what my system gives me, and the sysdatetime values I get seem to indicate that the resolution is 25 microseconds in my sql server on my aged laptop. I guess that gives me a match probability of 0.0075, quite a bit lower than the 0.3 on your test machine but a lot higher than the 0.00003 implied by a 100 ns resolution.

    Tom

  • 3 days later the QotD is still incorrect - can this be corrected please?

    Best Regards,

    Chris Büttner

  • Poor question (the idea is nice though), but interesting discussion!

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Unfortunatelly thge writer of this question (or rather the answer;>) has failed:

    I've executed the example query three times and two times it returned a row (eg. when the table looked like this: 2013-03-21 22:08:27.0002013-03-21 22:08:27.0000000)

    so the correct answer is NOT the "No rows will be returned" 😐

  • Personally I see DateTime & DateTime2 as different Data Types and therefore shouldn't be compared to each other in the first place.

    One of these should have been converted to the other before the comparison.

  • as easy as yesterday

    thanks

  • It was an easy but good question. 🙂

  • Such a learning discussion from such a simple question at first sight.

Viewing 13 posts - 46 through 57 (of 57 total)

You must be logged in to reply to this topic. Login to reply