Asking for help to understand subtle nuances of randomised query.

  • Please consider this query for generating "Coin Tosses", heads or tails.

    It seems to contain some strange subtleties whose explanations elude me.

    SELECT

    [RandomCoinTosses].[Result]

    FROM

    [sys].[syscolumns] AS [base]

    OUTER APPLY

    (

    SELECT TOP 1 [Result] = [Val] + LEFT( [base].[name] , 0 )

    FROM (

    SELECT TOP 1 [Val] = 'HEADS'

    UNION ALL

    SELECT 'TAILS'

    ) AS [Source]

    ORDER BY NEWID()

    ) AS [RandomCoinTosses]

    1. Every time I run this, I get at least a run of five matching values at the start (not statistically proven, but this appears to be true). Why?

    (e.g Five or more HEADS before I get a TAIL)

    2. Change the OUTER APPLY to CROSS APPLY and all values will be the same. Why?

    3. Remove the "TOP 1" from [Source] and all values will be the same. Why?

    (additional info for point 3 : the TOP 1 can be anywhere in the [Source] query but it must be present somewhere in there)

    4. Despite the behaviour seen in point 1 (above), if I count the number of HEADS and TAILS they seem acceptable.

    If anyone can shed any light on any of this for me, I would really appreciate it.

    Thanks for your time.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I don't feel up to looking into what is going on with your query, but if you just want a random series, this is probably simpler and closer to what you want.

    select

    case

    when convert(bigint,convert(varbinary(7),newid()))%2637394 < 1318697

    then 'TAILS'

    else 'HEADS'

    end as RESULT

    from

    [sys].[syscolumns] as [base]

  • Michael Valentine Jones (5/24/2011)


    I don't feel up to looking into what is going on with your query, but if you just want a random series, this is probably simpler and closer to what you want.

    Thanks for looking Michael.

    I guess I didn't make it clear enough...it is just the specifics of this query construction that I am interested in.

    There is no business case I am trying to solve, nor any need for coin tosses from sql in my life, but I found these "behaviours" while trying to solve a different problem where I needed randomness on every row that wasn't solved just by casting/converting a newid() value.

    I solved that problem and then came up with this simplified example of the concept for the purpose of this discussion.

    So, I am not looking for alternatives - rather any insight that anyone can offer into why I see these behaviours in this query?

    Thanks again.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I don't know whether this would be equivalent to your query, but if I run the slightly altered code below I get blocks of heads (or tails) with the same uniqueidentifier.

    I'm off to the pub now so can't stop and think about it properly, but thought I'd just throw that out there in case it triggers an "Aha!" moment!

    SELECT

    [RandomCoinTosses].[Result]

    ,[RandomCoinTosses].new_id

    FROM

    [sys].[syscolumns] AS [base]

    OUTER APPLY

    (

    SELECT TOP 1 [Result] = [Val] + LEFT( [base].[name] , 0 ), new_id

    FROM (

    SELECT TOP 1 [Val] = 'HEADS' ,newid() as new_id

    UNION --ALL

    SELECT 'TAILS' ,newid()

    ) AS [Source]

    ORDER BY new_id

    ) AS [RandomCoinTosses]

    ------------------------------------------------------------------------
    Bite-sized fiction (with added teeth) [/url]

  • mazzz (5/25/2011)


    I don't know whether this would be equivalent to your query, but if I run the slightly altered code below I get blocks of heads (or tails) with the same uniqueidentifier.

    I'm off to the pub now so can't stop and think about it properly, but thought I'd just throw that out there in case it triggers an "Aha!" moment!

    SELECT

    [RandomCoinTosses].[Result]

    ,[RandomCoinTosses].new_id

    FROM

    [sys].[syscolumns] AS [base]

    OUTER APPLY

    (

    SELECT TOP 1 [Result] = [Val] + LEFT( [base].[name] , 0 ), new_id

    FROM (

    SELECT TOP 1 [Val] = 'HEADS' ,newid() as new_id

    UNION --ALL

    SELECT 'TAILS' ,newid()

    ) AS [Source]

    ORDER BY new_id

    ) AS [RandomCoinTosses]

    Yes, that helps to show what is happening very well!

    How on Earth does that query produce that output?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I'm not sure if this is the case but according to Microsoft, there is no guarentee that nondeterministic scalar functions will executed for every row.

    I submitted a "bug" a while back and was told it was expected behavior:

    http://connect.microsoft.com/SQLServer/feedback/details/327457/updates-a-variable-number-of-rows-on-a-table-varaible-unexpected

    Here is the code that I was running, basically, I wanted to update random rows, but only a certain amount of rows, in this case 3. What happend is that if I ran this code over and over again I'd sometimes get soem number other that 3 rows updated (usually 2 or 4). DECLARE @Foo TABLE (ID INT IDENTITY(1, 1) PRIMARY KEY, Bar INT, Code CHAR(1))

    DECLARE @Count INT

    -- Setup data

    INSERT @Foo

    SELECT 1, NULL

    UNION ALL SELECT 2, NULL

    UNION ALL SELECT 3, NULL

    UNION ALL SELECT 4, NULL

    UNION ALL SELECT 5, NULL

    UNION ALL SELECT 6, NULL

    UNION ALL SELECT 7, NULL

    -- Update 3 random rows

    SET @Count = 3

    -- Update 3 random rows setting the Code value to A

    UPDATE Foo

    SET Code = 'A'

    FROM @Foo AS Foo

    INNER JOIN

    (

    SELECT

    ID,

    ROW_NUMBER() OVER(ORDER BY NEWID()) AS RowNum

    FROM

    @Foo

    ) AS Temp

    ON Foo.ID = Temp.ID

    WHERE

    Temp.RowNum <= @Count

    -- Check updates.

    SELECT *

    FROM @FooUnfortunately, I am no longer seeing this behavior, so I'm not sure it applies in this case...

  • @Lamprey13 - yes that is probably the same thing as this - or at least they must be related.

    So, I need to watch out for this one then.

    I guess I can put it in the cupboard and mark it as "Not good practice".

    Thanks

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I've got a bad feeling this has to do with repeated name in syscolumn and the apply function. I'm not done chewing on this one yet, just haven't had a lot of free time.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (5/25/2011)


    I've got a bad feeling this has to do with repeated name in syscolumn and the apply function. I'm not done chewing on this one yet, just haven't had a lot of free time.

    Thanks Craig, I sure do appreciate your time. Don't worry too much though - as I said, it is idle curiosity driving this, not anything important.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (5/25/2011)


    Craig Farrell (5/25/2011)


    I've got a bad feeling this has to do with repeated name in syscolumn and the apply function. I'm not done chewing on this one yet, just haven't had a lot of free time.

    Thanks Craig, I sure do appreciate your time. Don't worry too much though - as I said, it is idle curiosity driving this, not anything important.

    Same for me, but it's more then an idle curiousity for me. It's annoying the crap out of me in the back of my head. 🙂 That incredibly sideways distribution has me very curious.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (5/25/2011)


    mister.magoo (5/25/2011)


    Craig Farrell (5/25/2011)


    I've got a bad feeling this has to do with repeated name in syscolumn and the apply function. I'm not done chewing on this one yet, just haven't had a lot of free time.

    Thanks Craig, I sure do appreciate your time. Don't worry too much though - as I said, it is idle curiosity driving this, not anything important.

    Same for me, but it's more then an idle curiousity for me. It's annoying the crap out of me in the back of my head. 🙂 That incredibly sideways distribution has me very curious.

    Oh ok, good then - because it confused the crap out of me too - but at least I can assume that it is because of my limited knowledge 😛

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Posted the bug about something really similar (different application but same source issue methinks)

    https://connect.microsoft.com/SQLServer/feedback/details/383641/in-variable-concatenation-and-order-by-filters-results-like-where-condition

    It was closed as "by design" and if it is the same issue then any of the workarounds you currently have are not really workarounds...

  • Ninja's_RGR'us (5/25/2011)


    Posted the bug about something really similar (different application but same source issue methinks)

    https://connect.microsoft.com/SQLServer/feedback/details/383641/in-variable-concatenation-and-order-by-filters-results-like-where-condition

    It was closed as "by design" and if it is the same issue then any of the workarounds you currently have are not really workarounds...

    I don't have workarounds - except doing it completely differently - but want to understand the strange behaviour.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Alright, let's start from scratch and get rid of the curiousities.

    SELECT

    t.n, [RandomCoinTosses].[Result], x

    FROM

    tempdb..Tally as t

    OUTER APPLY

    (SELECT TOP 1 [Result] = [Val], NewID() AS x

    FROM( SELECT [Val] = 'HEADS' UNION ALL SELECT 'TAILS'

    ) AS [Source]

    ORDER BY NEWID()

    ) AS [RandomCoinTosses]

    Simple, easy. Tally to duplicate and nothing in the inner query to do anything odd.

    Results are utterly duplicated. The query is cached and ran for all rows. Answers question 1: Why are they duplicating? Cached, query is not re-run per row as described in BOL.

    Next, we need to force it to rerun:

    SELECT

    t.n, [RandomCoinTosses].[Result], x

    FROM

    tempdb..Tally as t

    OUTER APPLY

    (SELECT TOP 1 [Result] = [Val], NewID() AS x

    FROM(SELECT [Val] = 'HEADS' UNION ALL SELECT 'TAILS'

    ) AS [Source]

    WHERE isnumeric(t.n) = 1

    ORDER BY NEWID()

    ) AS [RandomCoinTosses]

    Ah, good, we can see our NEWID()'s are regenerating properly. We have a reasonable distribution matrix. It's constantly being re-evaluated. The isnumeric was just to force it to do SOMETHING in regards to the external row. The mathmatics will be re-evaluated each time. Hm, but how to prove that?

    SELECT

    t.n, [RandomCoinTosses].[Result], x, y

    FROM

    (SELECT t.n %2 AS n

    FROM tempdb..Tally as t) AS t

    OUTER APPLY

    (SELECT TOP 1 [Result] = [Val], NewID() AS x, NEWID() AS y

    FROM (SELECT [Val] = 'HEADS' UNION ALL SELECT 'TAILS'

    ) AS [Source]

    WHERE isnumeric(t.n) = 1

    ORDER BY NEWID()

    ) AS [RandomCoinTosses]

    Okay, confirmed caching on equivalent inbound values and that it won't re-create. It also shows that 0 repeats and 1 repeats. This isn't a logical failure but bad expectations of the Cross Apply and optimization mechanics occurring under the hood of SQL Server. Excellent, we're getting there.

    Let's get back to the original question, with a minor change. Let's include the base.name in the result sets:

    SELECT

    base.name,

    [RandomCoinTosses].[Result]

    FROM

    [sys].[syscolumns] AS [base]

    OUTER APPLY

    (

    SELECT TOP 1 [Result] = [Val] + LEFT( [base].[name] , 0 )

    FROM (

    SELECT TOP 1 [Val] = 'HEADS'

    UNION ALL

    SELECT 'TAILS'

    ) AS [Source]

    ORDER BY NEWID()

    ) AS [RandomCoinTosses]

    This will conclude the findings above. The patterning you're seeing is due to the cache'ing of the cross/outer apply query against the source input in regards to the changes in the input.

    EDIT: Keeps chewing up my formatting, sorry. Tried twice, it just doesn't like my tabs for some reason, at least not consistently.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig, Thank you - that explains a lot, incredibly well!

    I have taken that on board and now see why it was working like that.

    Now for the *twist*.....

    I just tried it on Denali with interesting results....

    It seems that something major has changed as this query works perfectly now:

    SELECT

    [RandomCoinTosses].[Result]

    ,[RandomCoinTosses].new_id

    FROM

    [sys].[syscolumns] AS [base]

    OUTER APPLY

    (

    SELECT TOP 1 [Result] = [Val] , new_id

    FROM (

    SELECT [Val] = 'HEADS' ,newid() as new_id

    UNION

    SELECT 'TAILS' ,newid()

    ) AS [Source]

    ORDER BY new_id,base.id,base.colid

    ) AS [RandomCoinTosses]

    Notice, that I no longer need the TOP 1 inside the [Source] definition.

    I have tested this same query on SQL 2005 - where it returns 1 row with values, the rest are NULLS.

    I have tested it on SQL 2011 (Denali) using a database that was created on SQL 2000, then transferred to SQL 2008 - but kept in 2000 compatibility mode, then transferred to Denali and set to SQL 2005 compat. mode. - where it returns 1 row with values, the rest are NULLS.

    I have tried setting that old database to Denali compat. mode - still doesn't work.

    But the query works fine on a 100% denali database.

    This raises the question "If this behaviour is by design, why has the design changed and what impact will that have on existing code?"

    Additionally - it still produces the same 1 value for every row using CROSS APPLY.

    Also

    It still produces the same 1 value for every row if I use UNION ALL in the [Source] definition.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

    Viewing 15 posts - 1 through 15 (of 18 total)

    You must be logged in to reply to this topic. Login to reply