Tally Tables

  • Hugo Kornelis (9/3/2013)


    sknox (9/3/2013)


    Wow.

    Nobody's mentioned what I consider to be the biggest problem with this question yet.

    You must have overlooked the things already mentioned about ordering by Rune Irvin, Terreador, and me.

    Nope. The things you mentioned, while important, are implementation and performance related. My issue was with the basic concept of Tally Tables.

    Get your basic concepts wrong, and it doesn't matter a bit how good you are at implementation and performance tuning.

  • Hugo Kornelis (9/3/2013)


    L' Eomot Inversé (9/3/2013)


    It could be argued that to be a table in a relational database a table has to have a primary key; that means there is an intrinsic order determined by the key attributes

    In the theoretic approach, a relational table does have to have a key. (Not sure if "primary key" is a requirement; "candidate key" is).

    But a key does not impose an order.

    Codd's original paper talked about multiple primary keys one of which was chosen to be called "the primary key" instead of just "a primary key"; the requirement for the primary key was a definite requirement - anything called a (as opposed to the) primary key was not required; terminology has moved on, wand what was called a primary key is now called a candidate key, and primary key is reserved for the chosen key that must exist.

    A key does not even require an index - that's just an implementation choice. (And a very smart one, I should add).

    And even an index does not impose order. Yes, the B-tree indexes we know in SQL Server do. But hash indexes (available in Oracle, and also in the "Hekaton" engine that you can already play with in the CTP of SQL 2014) don't, and neither do columnstore indexes (available as nonclustered in SQL2012, and as both clustered or nonclustered in SQL Server PDW and in the CTP of SQL 2014).

    A hash index certainly implies an order (a partial order, not necessarily a total order). I haven't looked hard enough at column store indexes, but it is very hard to believe that an index that doesn't imply an order can imply clustering: the mathematical demonstration would go along the lines of clustering implies a measure of closeness, which in turn implies a partial order, so anything that determines clustering determines an order - I haven't worked it through in detail, so that could be wrong, but it looks to me as if it should be pretty easy to demonstrate, so the usability of columnstore indexes for clustering seems to indicate that such an index does imply a order.

    But all that's just pedantry, those are not the sort of indexes or the sort or orders that I was thinking of, so if SQL Server uses a hash index or a column store index to support the primary key my argument that a Tally table (when it's a base table) has an implicit (total) order fails.

    Tom

  • Toreador (9/3/2013)


    Don't worry, he can't spell my nickname properly either 😉

    For some reason he never gets mine wrong :-D.

    If my nickname were Toreador I think I would change it to something harmless like Terreador. Just look up the verb Torear in a decent Spanish dictionary: meanings include: to set the bulls on the cows; to deceitfully and for gain raise someone's hopes; to make someone the butt or ridicule; to cause problems for or annoy someone by repeatedly distracting there attention to various things; to skilfully carry on an embarrassing business; and that's just in Spain - in South America it can also mean to behave like a dog that barks perpetually; to provoke; to irritate; to incite. And of course Toreador is someone who does one or more of those things. Oh, the verb can mean to fight a bull on horseback (that's you on horseback, not the bull) or, less likely, to fight a bull on foot, but the person who is called a toreador in Bizet's Carmen (a bullfighter who kills bulls with a sword) would be called a matador, or a torero, or an espada in Spain and Latin America. A terreador is by comparison fairly harmless; terrear simply means to discover of allow to be seen some land which has been sown with seed (doesn't matter whether the seed has sprouted or not) so there's no mistaking a terreador for someone undesirable.

    Tom

  • sknox (9/3/2013)


    Hugo Kornelis (9/3/2013)


    sknox (9/3/2013)


    Wow.

    Nobody's mentioned what I consider to be the biggest problem with this question yet.

    You must have overlooked the things already mentioned about ordering by Rune Irvin, Terreador, and me.

    Nope. The things you mentioned, while important, are implementation and performance related. My issue was with the basic concept of Tally Tables.

    Get your basic concepts wrong, and it doesn't matter a bit how good you are at implementation and performance tuning.

    Awww, man! Conclusion: I've messed up the interpretation of your message, *and* the correct spelling of two names. Three silly mistakes in one single-line message. Surely, this has to be a new record? :hehe:


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • L' Eomot Inversé (9/3/2013)


    I haven't looked hard enough at column store indexes, but it is very hard to believe that an index that doesn't imply an order can imply clustering

    I'm pretty sure that the terms "clustered" and "nonclustered" in the context of a columnstore index are completely devoid of the original meaning of the term and only refer to an association that all SQL Server people should have: clustered = the single place where all table data is stored; nonclustered = any index that does not necessarily hold all the table data, and that is used with the assumption that the full table data is stored elsewhere, either in the clustered index or as a heap.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • L' Eomot Inversé (9/3/2013)


    Nice clear and easy question (assuming the last option "no difference" means "no difference in what they the language specification says they return" - if it means what it actually says then of course none of the options is correct). Absolutely terrible explanation, though!

    Tom - You're absolutely right about the answer being terrible. Who is the cretin that posted such a silly question anyway?

    Oh, wait! That would be me. 🙂 Guess that explains why I got the answer right.

    Truth is, I struggled to come up with a good explanation of the behavior. But I'd say that Hugo seems to have done an awesome job of that. So thanks Hugo!

    I probably won't have a chance to thank everyone personally for posting to the discussion as I'm kind of swamped at the moment catching up from being on vacation for a week. But let me offer a collective thanks to all that took an interest in the question.

    I must confess, I didn't expect the level of interest, as demonstrated by dissection of the execution plans, by the big boys!

    For the record, I would never use either construct for a Tally table. I was just playing around when I hit upon it.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • sknox (9/3/2013)


    Wow.

    Nobody's mentioned what I consider to be the biggest problem with this question yet.

    I'm talking about the following two answer choices:

    The first query returns a Tally table and the second returns the same row count and numbers but in a randomized order.

    The second query returns a Tally table and the first returns the same row count and numbers but in a randomized order.

    See the problem? They imply that a tally table has to be intrinsically ordered -- this is simply untrue. A tally table simply has to be orderable.

    Actually I didn't mean to imply that Tally tables are intrinsically ordered, but I can see why you might be thinking that I did. Blame it on the vagaries of the English language.

    Likewise could be said about the question itself. I meant to be asking what is different about the result set of the two queries, without anticipating the critical (and excellent) dissection of the resulting query plans.

    As Hugo seems to have done, the intention of those two answers was to allow the reader to dismiss them as incorrect on the basis that row ordering is never guaranteed without an ORDER BY clause. Assuming that you did not fall for the possiblity of the first query generating an error (because ORDER BY 1 does), then it would lead you to select the last answer.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Rune Bivrin (9/3/2013)


    Just look at the execution plan and tell me if there is no difference.

    Or even try this, where NEWID() is completely eliminated from the second query.

    -- 1

    SELECT ROW_NUMBER() OVER (ORDER BY NEWID())

    FROM sys.all_columns

    -- 2

    SELECT ROW_NUMBER() OVER (ORDER BY (select 1))

    FROM sys.all_columns

    I'd argue that they are different but yield the same result, as is the example given in the question. The first query will indeed evaluate NEWID() once for every row whereas the second query does a single call to NEWID(), which lets SQL Server eliminate a sort and thus generates a more efficient query plan.

    Also, strictly speaking the will both return the result sets in random order since there is no ORDER BY specified for the result set. The fact that due to implementation they will render sorted output is not to be trusted if it's important.

    They don't yield the same result set using 2k8. The first query generates a different result set each time it's run. The second query (select 1) returns the same result set each time it's run.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (9/4/2013)


    They don't yield the same result set using 2k8. The first query generates a different result set each time it's run.

    I get the same result set every time, on 2k5, 2k8, and 2k8r2. Which is what I'd expect.

  • Toreador (9/4/2013)


    ChrisM@Work (9/4/2013)


    They don't yield the same result set using 2k8. The first query generates a different result set each time it's run.

    I get the same result set every time, on 2k5, 2k8, and 2k8r2. Which is what I'd expect.

    Disambiguation:

    SELECT @@VERSION

    -- Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64) Jun 28 2012 08:36:30

    -- Copyright (c) Microsoft Corporation

    -- Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)

    -- 1

    SELECT name, ROW_NUMBER() OVER (ORDER BY NEWID())

    FROM sys.all_columns

    -- appears to be different result set every time

    -- 2

    SELECT name, ROW_NUMBER() OVER (ORDER BY (SELECT NEWID()))

    FROM sys.all_columns

    -- appears to be different result set every time

    -- 3

    SELECT name, ROW_NUMBER() OVER (ORDER BY (SELECT 1))

    FROM sys.all_columns

    -- appears to be same result set every time

    -- 4

    SELECT RANK() OVER (ORDER BY NEWID())

    FROM sys.all_columns;

    -- 5337 rows, 1 - 5337

    -- 5

    SELECT RANK() OVER (ORDER BY (SELECT NEWID()))

    FROM sys.all_columns;

    -- 5337 rows, 1 - 5337

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hugo Kornelis (9/3/2013)


    ...- In the first plan, the NEWID() function is evaluated for each row. The results are then sorted, so that row numbers can be returned.

    - In the second plan, the (SELECT NEWID()) is a non-correlated subquery. A non-correlated subquery does not have to be evaluated for each row... It does not HAVE to....

    Quite unexpected to see a non-correlated subquery containing NEWID() work once per row, when we're more accustomed to jumping through hoops to force a correlated subquery containing NEWID() work once per row (or rather, once per unique correlated value).

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • sknox (9/3/2013)


    ]

    See the problem? They imply that a tally table has to be intrinsically ordered -- this is simply untrue. A tally table simply has to be orderable.

    So even if one of the two queries returned its values in a randomized order, it would still be a tally table.

    Not quite right, in my humble opinion. First and semantically speaking, using ROW_NUMBER() isn't the same as an actually Tally Table. It's a result set. Second, if you don't think the "physical" or "intrinsic" order of the Tally Table is important, please see the execution plan and statistics for the following code. Although I'll agree that a split extent may cause the physical order to not be pure for a Tally Table, you'd better do your darnedest to make sure that you've made the physical order of the leaf level of the Tally Table as pure as possible or you'll lose all of the performance advantages of using the Tally Table.

    In the final query below, the missing "Sort" operator is because the output of the Clustered Index operator in the properties has "Ordered = True". I'm not suggesting that's a guaranteed "Sort" so you still need the ORDER BY if the order of N is important, but it does show that the physical order is important to the logical sorting for reasons of performance.

    --===== Create a "Tally" table as a randomized heap

    WITH

    cteGenerateSequence AS

    (

    SELECT TOP (1000000)

    N = ROW_NUMBER() OVER (ORDER BY (SELECT 1))

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    )

    SELECT N = ISNULL(N,0) --ISNULL used to make a NOT NULL column

    INTO #RandomizedTally

    FROM cteGenerateSequence

    ORDER BY NEWID()

    ;

    --===== Create a "Tally" table as "ordered" Clustered Table

    SELECT TOP (1000000)

    N = ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT 1)),0) --ISNULL used to make a NOT NULL column

    INTO #OrderedTally

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    ;

    ALTER TABLE #OrderedTally

    ADD PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    ;

    --===== Compare the performance of the two.

    -- Notice the missing "Sort" in the execution plan of

    -- the "ordered" Clustered Table even though there's

    -- an ORDER BY. Although both will get the job done,

    -- it DOES matter whether or not the Tally Table is

    -- just "orderable" or actually "ordered".

    DECLARE @Bitbucket INT;

    SET STATISTICS TIME,IO ON;

    SELECT @Bitbucket = N

    FROM #RandomizedTally

    WHERE N <= 8000

    ORDER BY N

    ;

    SELECT @Bitbucket = N

    FROM #OrderedTally

    WHERE N <= 8000

    ORDER BY N

    ;

    SET STATISTICS TIME,IO OFF;

    Just in case you're wondering about the use of the word "physical" here... from the CREATE INDEX entry of BOL...

    CLUSTERED

    Creates an index in which the logical order of the key values determines the physical order of the corresponding rows in a table.

    Yes, I agree that the physical order can be disturbed and even set across mixed extents but the logical order is usually pretty close to the physical order of a clustered index when first built/rebuilt. The reason why the code takes so long to run on the randomized Tally Table is because the data is not in either physical or logical order.

    To wit, order IS important in a table... it just not guaranteed. Of course, you already knew that or you wouldn't ever rebuild or reorganize a table.

    As for code that builds a "Tally" result set on the fly, you can sometimes actually get a performance gain by defeating parallelism with OPTION (MAXDOP 1). Operations that you want to run in a serial fashion, usually using such a sequence as ROW_NUMBER() provides, like single CPU operations a lot kind of like a hard disk likes a freshly rebuilt clustered index scan.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff! 🙂

    Jeff Moden (9/6/2013)


    First and semantically speaking, using ROW_NUMBER() isn't the same as an actually Tally Table.

    I completely agree with this....

    Second, if you don't think the "physical" or "intrinsic" order of the Tally Table is important, please see the execution plan and statistics for the following code.

    ... but not with this.

    What your code proves is that the presence or absence of an index makes a big difference (at least for this query - it helps other queries too, but that's not proven by your code).

    Counter proof 1: Add an index to the #RandomizedTally as well. To keep the table itself scattered, this has to be a NONCLUSTERED index.

    ALTER TABLE #RandomizedTally

    ADD PRIMARY KEY NONCLUSTERED (N) WITH FILLFACTOR = 100;

    After this change, the #RandomizedTally is still slightly more expensive, due to nonclustered indexes having more overhead than clustered indexes - normally more than compensated by the other columns in the clustered index, but in this case there are no other columns. And the only column that is there is so small that the extra overhead is a large percentage.

    Counter proof 2: Have no indexes on either table. The code to generate #RandomizedTally can be unchanged (without the nonclustered index/primary key used above). The code to generate #OrderedTally changes like this:

    SELECT TOP (1000000)

    N = ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT 1)),0) --ISNULL used to make a NOT NULL column

    INTO #OrderedTally

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    OPTION (MAXDOP 1);

    SELECT TOP(100) * FROM #OrderedTally; -- "Ordering" not guaranteed, but worked on my system

    SELECT TOP(100) * FROM #RandomizedTally;

    (The MAXDOP hint is because on my system, paralellism messes up the order. The SELECT is to check that even though the table is a heap, the physical order happens to be ordered. The code I used does not guarantee that, but it did work for me).

    With this change, the two SELECT statements have the exact same performance stats.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • NEWID() is a non-deterministic function 🙂

  • Hugo Kornelis (9/6/2013)


    Hi Jeff! 🙂

    Jeff Moden (9/6/2013)


    First and semantically speaking, using ROW_NUMBER() isn't the same as an actually Tally Table.

    I completely agree with this....

    Second, if you don't think the "physical" or "intrinsic" order of the Tally Table is important, please see the execution plan and statistics for the following code.

    ... but not with this.

    What your code proves is that the presence or absence of an index makes a big difference (at least for this query - it helps other queries too, but that's not proven by your code).

    Counter proof 1: Add an index to the #RandomizedTally as well. To keep the table itself scattered, this has to be a NONCLUSTERED index.

    ALTER TABLE #RandomizedTally

    ADD PRIMARY KEY NONCLUSTERED (N) WITH FILLFACTOR = 100;

    After this change, the #RandomizedTally is still slightly more expensive, due to nonclustered indexes having more overhead than clustered indexes - normally more than compensated by the other columns in the clustered index, but in this case there are no other columns. And the only column that is there is so small that the extra overhead is a large percentage.

    Counter proof 2: Have no indexes on either table. The code to generate #RandomizedTally can be unchanged (without the nonclustered index/primary key used above). The code to generate #OrderedTally changes like this:

    SELECT TOP (1000000)

    N = ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT 1)),0) --ISNULL used to make a NOT NULL column

    INTO #OrderedTally

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    OPTION (MAXDOP 1);

    SELECT TOP(100) * FROM #OrderedTally; -- "Ordering" not guaranteed, but worked on my system

    SELECT TOP(100) * FROM #RandomizedTally;

    (The MAXDOP hint is because on my system, paralellism messes up the order. The SELECT is to check that even though the table is a heap, the physical order happens to be ordered. The code I used does not guarantee that, but it did work for me).

    With this change, the two SELECT statements have the exact same performance stats.

    Hi Hugo,

    Good to "see" you again, ol' friend.

    First, let's revisit the original claim that I take exception to…

    See the problem? They imply that a tally table has to be intrinsically ordered -- this is simply untrue. A tally table simply has to be orderable.

    There is no disagreement on my part that both an "unordered" and an "ordered" Tally Table will work. My claim is that, for performance reasons, a Tally Table does, in fact, have to be "intrinsically" ordered. Using your suggestion of not having indexes on either table so that any perceived overhead of one type of index over another is totally eliminated is a great idea and the code below follows that suggestion.

    Further, all I have to do is prove that there is at least one instance where such "ordering" matters to render my claim as "true" and the quoted claim as "false".

    Let us indeed build both tables without indexes and, for a minute, go with the idea that a heap can be implicitly ordered, as you did with your code, although I didn't trust the implicit ordering of ROW_NUMBER() and added an ORDER BY to the code that builds the "sorted" table much like there's an ORDER BY that builds the randomized table.

    Let's also take the display (the "Great Run-Time Equalizer") out of the picture by dumping the output of the queries to a variable.

    Let's also demonstrate that the physical order of the data in the table actually does matter even without any indexing by adding an ORDER BY to both of your queries and expanding the query to include enough rows to be able to actually measure the differences. This also plays into the quoted claim that "A tally table simply has to be orderable ."

    With all of that in mind, here's the test data I used. As you suggested and with the ORDER BY added to the "sorted" table to make the two setups more equivalent, neither has an index. Both tables are truly heaps.

    --===== Create a "Tally" table as a randomized heap

    WITH

    cteGenerateSequence AS

    (

    SELECT TOP (1000000)

    N = ROW_NUMBER() OVER (ORDER BY (SELECT 1))

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    )

    SELECT N = ISNULL(N,0) --ISNULL used to make a NOT NULL column

    INTO #RandomizedTally

    FROM cteGenerateSequence

    ORDER BY NEWID()

    ;

    --===== Create a "Tally" table as "ordered" Clustered Table

    SELECT TOP (1000000)

    N = ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT 1)),0) --ISNULL used to make a NOT NULL column

    INTO #OrderedTally

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    ORDER BY N --<-----<<<< Added this

    ;

    Now, let's use measured code similar to yours but with the ORDER BY's (again, playing into the quoted claim) that I added and the expanded TOP range, and let's see what happens.

    PRINT '===== Using the Randomized Table ====='

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    GO

    DECLARE @Bitbucket INT;

    SET STATISTICS TIME ON

    SELECT TOP(1000000) @Bitbucket = N FROM #RandomizedTally ORDER BY N;

    SET STATISTICS TIME OFF

    GO

    PRINT '===== Using the Sorted Table ====='

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    GO

    DECLARE @Bitbucket INT;

    SET STATISTICS TIME ON

    SELECT TOP(1000000) @Bitbucket = N FROM #OrderedTally ORDER BY N;

    SET STATISTICS TIME OFF

    GO

    The results of that code are as follows (from my older/slower single CPU machine to emphasize the differences)...

    ===== Using the Randomized Table =====

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

    CPU time = 4579 ms, elapsed time = 7059 ms.

    ===== Using the Sorted Table =====

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

    CPU time = 2718 ms, elapsed time = 4375 ms.

    It also doesn't matter if we reverse the order of the code…

    PRINT '===== Using the Sorted Table ====='

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    GO

    DECLARE @Bitbucket INT;

    SET STATISTICS TIME ON;

    SELECT TOP(1000000) @Bitbucket = N FROM #OrderedTally ORDER BY N;

    SET STATISTICS TIME OFF;

    GO

    PRINT '===== Using the Randomized Table ====='

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    GO

    DECLARE @Bitbucket INT;

    SET STATISTICS TIME ON;

    SELECT TOP(1000000) @Bitbucket = N FROM #RandomizedTally ORDER BY N;

    SET STATISTICS TIME OFF;

    GO

    Results:

    ===== Using the Sorted Table =====

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

    CPU time = 2859 ms, elapsed time = 4424 ms.

    ===== Using the Randomized Table =====

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

    CPU time = 4859 ms, elapsed time = 7196 ms.

    … the "ordered" table still wins by a large margin (69% faster) proving that the quoted claim of "intrinsic" ordering not mattering is absolutely false in at least one case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 31 through 45 (of 48 total)

You must be logged in to reply to this topic. Login to reply