Deterministic Functions

  • Comments posted to this topic are about the item Deterministic Functions

    Jamie

  • Have to quibble with the answer. Yes, RAND() on its own is non-deterministic, as SS randomly supplies the seed. But, for a single connection, if RAND(seed) has been called then a second RAND() will use the same seed and is therefore deterministic.

    My two bob. 🙂

    Tony..

  • I agree with Tony. If a seed is already specified, all subsequent call to the function RAND() in the same connection will be deterministic (i.e., same number). So that's why I think the answer should be "depends on definition" 😀


    Urbis, an urban transformation company

  • Ditto Ditto - the answer's quoted text is WRONG WRONG WRONG - or at least only partially correct/ very misleading! Gimme my point back! I happen to understand how this function works very well.

    Check ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/363c84d6-b9fa-49ba-9a75-e44f27535ff6.htm "For one connection, if RAND() is called with a specified seed value, all subsequent calls of RAND() produce results based on the seeded RAND() call. For example, the following query will always return the same sequence of numbers."

    So, is a particular RAND() deterministic without using a seed? IT DEPENDS on whether a PREVIOUS RAND() used a seed - so IT DEPENDS on definition of "deterministic"

  • Am in there with the nay-sayers... For a single connection... RANd() behaves deterministically if following a seeded definition. Give us our points!

  • The question is a quote from the Microsoft documentation. but has a second purpose here. See the article written by two Microsoft researchers. http://msdn.microsoft.com/en-us/library/cc441928.aspx

    Based on the fact that RAND is deterministic when used with a seed, they are able to get a speedy sample from huge tables using the NEWID() function in conjunction with the RAND function. If it weren't deterministic, the sample would not work. Attempt it on a table with several million records. The sampling is extremely fast. The algorithm is faster if the keycol1 refers to a clustered ID table.

    SELECT * FROM Table1

    WHERE (ABS(CAST(

    (BINARY_CHECKSUM

    (keycol1, NEWID())) as int))

    % 100) < 10

    Jamie

  • No disagreement with specifying a seed causes a deterministic value

    How about this?

    Set Nocount On

    Create Table #Temp

    (

    Val int

    )

    Insert Into #Temp values (1)

    Insert Into #Temp values (2)

    Insert Into #Temp values (3)

    Insert Into #Temp values (4)

    Insert Into #Temp values (5)

    Set Nocount Off

    Select Rand() From #Temp

    Drop table #Temp

    Should this be "Deterministic"?

    Can I tell which value will be displayed FIRST? - No

    Can I tell which value will be displayed SECOND - Yes, same as first!. To me, this is deterministic.

    Output, for those that didn't try:

    ----------------------

    0.293249400409258

    0.293249400409258

    0.293249400409258

    0.293249400409258

    0.293249400409258

    (5 row(s) affected)

    So seems like "Depends on Definition" to me

  • Try it this way:

    Create Table #Temp

    (

    UniqueColumn UNIQUEIDENTIFIER DEFAULT NEWID(), Val int

    )

    Insert Into #Temp (val) values(1)

    Insert Into #Temp (val) values(2)

    Insert Into #Temp (val) values(3)

    Insert Into #Temp (val) values(4)

    Insert Into #Temp (val) values(5)

    Set Nocount Off

    SELECT * FROM #Temp

    WHERE (ABS(CAST(

    (BINARY_CHECKSUM(*) *

    RAND()) as int)) % 100) < 100

    Drop table #Temp

    Jamie

  • http://en.wikipedia.org/wiki/Random_number_generator

    given that there is still considerable debate as to the "True Randomness" of computer generated random number algorithms it would be reasonable to assume that there are definitions of "Deterministic" that vary (ever so slightly) from Microsoft's. To summarise, whether RAND is deterministic or not "depends on the definition" of deterministic that one is using. Thus logic dictates that the third option (depends on the definition) is correct.

    The points please!

  • Technically speaking RAND() is only pseudo-random, thus it is by definition always deterministic.

    Now, if the question were "Does the RAND() function create a fixed series of numbers only when a seed parameter is specified" the answer would be yes. However, given the current seed value (the current value within RAND() itself) then RAND() is always deterministic, given that RAND uses an algorithm to calculate the result!

    "No" is the truly correct answer, although "depends on definition" obviously is a strong contender. 🙂

    I think this question ran aground on the reefs of vagueness...

  • Ben Leighton (7/28/2008)


    Am in there with the nay-sayers... For a single connection... RANd() behaves deterministically if following a seeded definition. Give us our points!

    Since all subsequent calls to RAND use the same seed parameter that was specified (or randomize) in the first call, then calling RAND() without a parameter still has the seed parameter specified. So the answer is Yes.

  • Hate to say it, but read your own arguments about it not being right. Every single one of them says, "once a seed value has been set".

    The question is correct. Once the seed value is supplied, it's deterministic. Till then, it's not. The quibble is over how the seed value is supplied, maybe, but the question and answer are still correct.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Based purely on the text of the question and the text in the link, the correct answer IS yes.

    Interesting find here, when I run a SELECT RAND() on my servers, I get the exact same result on different boxes with different connections and different versions. It would appear that RAND() does not get a random seed from SQL Server (or maybe something is not setup correctly?) so it would be best to properly seed it every time it is used.

  • From WIKI -

    This causal determinism has a direct relationship with predictability. (Perfect) predictability implies strict determinism, but lack of predictability does not necessarily imply lack of determinism. Limitations on predictability could alternatively be caused by factors such as a lack of information or excessive complexity. An example of this could be found by looking at a bomb dropping from the air. Through mathematics, we can predict the time the bomb will take to reach the ground, and we also know what will happen once the bomb explodes. Any small errors in prediction might arise from our not measuring some factors, such as puffs of wind or variations in air temperature along the bomb's path.

    Take it from another perspective:

    For example, say we observe cooled molecules from an air conditioning unit in a large mall perhaps floating from a large duct into a room. For this environment create an equation (a quadratic: A(i)2 + 2A(i)B(i) + B(i)2 - I'm not an A/C specialist - this is probably not the correct equation) such that the imaginary number mirrors our environment at a specific moment of time where A may be the initial temperature and B, the speed of travel at the vent source. While we cannot mimic the dimension of time, we can, however; using the imaginary number as a mirror, determine what may happen in a fixed environment as a given time in a given space. This allows the A/C designer to create an efficient system in that mall for a given set of variables.

    Thus the point of the RAND function... one grabs a random sample from a "fixed" set of data which will always be unique, but the uniqueness is based on a fixed dataset (try the example from the first post on your largest database and run it a few times). Technically random, but never non-deterministic.

    Jamie

  • I might agree the question is vaguely worded, but arguing that once you supply a seed you don't have to supply one the second time seems nitpicking.

    Also, getting multiple records from the call means that the call is made once, not that it's a deterministic value.

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply