The Numbers Table

  • jeffery.baynard (11/24/2008)


    Itzik uses RECURSION, even faster.

    WITH nums AS
    (
        SELECT 1 AS n
        UNION ALL
        SELECT n + 1 FROM nums WHERE n < 10
    )
    SELECT n
    FROM nums;

    As some have pointed out, Itzik doesn't use that method... well, except to show how slow it is. On page 254 of his book "Inside Microsoft SQL Server 2005 - T-SQL Querying", he does have just such an example. But you have to read the accompanying paragraph... and I quote...

    "I'll start with a naive solution that is fairly slow (about 20 seconds, with results discarded). The following solution uses a simple recursive CTE, where the anchor member generates a row with n = 1, and the recursive member adds a rows in each interation with n = prev n +1:"

    I'll add that recursion isn't much better than a While loop and is sometimes worse. Despite the fact that no explicit While loop has been coded, it is, in fact, a hard-coded loop... and I'm not talking about a behind-the-scenes "pseudo-cursor", either.
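
    For anyone curious what the fast alternative actually looks like, here's a rough sketch of the kind of set-based, cross-join method being alluded to (a sketch only... not Itzik's code verbatim, and the CTE names and row counts are just for illustration). The cross joins multiply a handful of constant rows, so there's no loop of any kind:

    WITH
        E1(n) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                  UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), --        10 rows
        E2(n) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),  --       100 rows
        E4(n) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b),  --    10,000 rows
        E6(n) AS (SELECT 1 FROM E4 a CROSS JOIN E2 b)   -- 1,000,000 rows
    SELECT TOP (1000000)
           n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM E6;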

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CheeseMan316 (11/24/2008)


    Thanks for all the responses and thanks yuvalpe for identifying the original source of that query.

    Actually, you need to look at the post just above yuvalpe's... 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'll add that recursion isn't much better than a While loop and is sometimes worse.

    I beg to differ. Recursion (in T-SQL) is always worse than a while loop! It's disgusting how poorly it tends to perform even in simple scenarios. Avoid!

    --
    Adam Machanic
    whoisactive

  • jacroberts (11/24/2008)


    timothyawiseman (11/24/2008)


    jacroberts (11/24/2008)


    Performance isn't always an issue as in my previous example.

    I agree with Jeff that performance should always be a concern. After all, you may eventually need that report urgently updated for the CEO from more recent data, or the server may simply find itself with less and less "maintenance time" that's free of heavy user activity.

    With that said, there are times when the amount of developer time it would take to improve performance is more important than the performance is. For instance, it does not make sense to spend hours optimizing an ad hoc report that will likely never be run again (though it does make sense to save it in case "never" turns out not to be never.)

    I disagree; performance isn't always an issue, though it sometimes is. Anyway, the method I listed did show reasonable performance; in fact, the performance for small number ranges is much better than the other examples listed, as there is no table/index scanning to do: you just request the number range you require and only that range is generated. For example, if you are reporting on a month's data and you need a number for each day, you would only request a table with the range 1 to 31. This would return a table with just 31 rows and would perform faster than a CTE that generates 1 million rows by default or a permanent table with 1 million rows in it. Performance is often an issue, and badly written code that gets used a lot can grind a system to a near halt, but, as in my first example, performance isn't always an issue, and anyway this method gives better performance for small number ranges.

    I've worked on and improved the performance of a lot of different SQL systems and I know when performance is important and when it is not. Generally, when you have some SQL that is called a lot of times, or some SQL that takes a long time to execute, it is important to consider performance. If you have a daily report that is run in batch, with no user waiting at the end of a button for the result, it doesn't matter if the SQL takes 3.2 seconds instead of 3.1 seconds to run. You just need common sense to work that out. Highly optimised code can be difficult to understand and therefore difficult to maintain, which can cause longer development times, maintainability issues and bugs being introduced into a system. Also, a method that is optimised for one type of query will not necessarily be optimised for a different type of query. For example, if you have a numbers table with 10 million rows in it that is really efficient for some large query, you might find that the same method is really inefficient on a query that only requires 30 rows.

    So you can think what you like but performance isn't always a concern. My one example where it isn't a concern has proved your 'always' comment is wrong.

    But how well does it scale? Maybe the report (process) only uses 4000 rows today, but what happens when, due to growth, the report starts using 40,000 rows or 400,000 rows? Will it still perform well, or will it be impacted by the growth of data?

    Performance should be a concern, always. If you decide today that something is good enough, later it may turn out not to be, and you may not have the time to go back and rework it due to other concerns/issues/problems. If you don't have time to do it right the first time, how will you find time to fix it later?

  • Adam Machanic (11/24/2008)


    Ed W. (11/24/2008)


    Ok, so you populated a table with numbers. Everyone can argue about how to do that, but the real question is what do you DO with the table of numbers? Why isn't this just a longer post?

    Fantastic question! Here are a few ideas (did I mention, I'm obsessed with numbers tables?):

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/splitting-a-string-of-unlimited-length.aspx

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/counting-occurrences-of-a-substring-within-a-string.aspx

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/dealing-with-very-large-bitmasks.aspx

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/pattern-based-split-string.aspx

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/reflect-a-tsql-routine.aspx

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/tokenize-udf.aspx

    ... and that's mainly just string manipulation. You can also use them to solve gaps and islands problems, build calendar tables, generate sequenced test data... The list goes on and on.
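
    Just to make one of those concrete, here's a minimal sketch of a numbers-table string splitter (it assumes a dbo.Numbers table with an int column N covering at least 1 through 8000; the names are placeholders, not any particular published routine):

    DECLARE @List  varchar(8000),
            @Delim char(1);
    SET @List  = 'red,green,blue,cyan';
    SET @Delim = ',';

    -- Every N that lands just after a delimiter (or at position 1) marks the start
    -- of an element; SUBSTRING carves the element out up to the next delimiter.
    SELECT Item = SUBSTRING(@List, N, CHARINDEX(@Delim, @List + @Delim, N) - N)
    FROM   dbo.Numbers
    WHERE  N <= LEN(@List)
       AND SUBSTRING(@Delim + @List, N, 1) = @Delim;

    The first link above covers the same idea for strings of unlimited length.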

    Well, if you're gonna spoil the surprise like that... 😀

    http://www.sqlservercentral.com/articles/TSQL/62867/

    http://www.sqlservercentral.com/articles/T-SQL/63003/

    And, very happy that Adam has joined in on the conversation... he's the one that first turned me on to "numbers" tables many years ago. Thanks, Adam.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Adam Machanic (11/24/2008)


    I'll add that recursion isn't much better than a While loop and is sometimes worse.

    I beg to differ. Recursion (in T-SQL) is always worse than a while loop! It's disgusting how poorly it tends to perform even in simple scenarios. Avoid!

    Actually... I wholeheartedly agree... just didn't want to have to prove it with code (again).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And, very happy that Adam has joined in on the conversation... he's the one that first turned me on to "numbers" tables many years ago. Thanks, Adam.

    Glad to be of assistance. And I'll pass on my own thanks to the great Erland Sommarskog for putting me into the numbers state of mind, with his SQL Server 2000 version of the following article:

    http://sommarskog.se/arrays-in-sql-2005.html

    I hope everyone reading this thread has or will read everything that Erland has written on his site. It's a truly fantastic collection of very interesting material.

    --
    Adam Machanic
    whoisactive

  • thisisfutile (11/24/2008)


    Jeff Moden (11/24/2008)


    jacroberts (11/24/2008)


    ... queried the table with 4000 rows in, in less than 1 second.

    That's pretty slow, and performance should always be an issue...

    With all due respect Jeff, it really does matter what work environment you're in. I have to agree with jacroberts in that 'query' performance indeed is not always an issue. In many cases, worker performance is more important than query performance. If my boss knew I was trying to make a query that already runs in "less than a second" run even faster, I'm guessing he'd be disappointed in my use of time.

    Yep... those are usually the environments that call "tuning experts" in after they actually get some data into their database to figure out what's wrong and how to speed things up. Either that or they buy a pot wad of new hardware, spend a fair amount on the migration, etc, etc, only to find out that the code is the real problem and the hardware didn't help as much as expected.

    Coding for performance isn't difficult and it certainly shouldn't take someone hours to do compared to RBAR methods... but it is like practicing piano... if you intentionally hit the wrong notes, you won't get any better at hitting the right ones.

    The worst part about writing code just good enough to get the job done is the future... either the data they said wasn't going to get substantially bigger does, or some poor developer with a manager's "schedule gun" up against his head finds the code that was just good enough, and uses it in a place where it ought not be used for performance and scalability reasons, and BOOM!

    The really bad part is that most managers have absolutely no clue about future performance and scalability because they're not held personally responsible for such failures. In fact, most shops have no "SLA" for performance... they just push code. If it looks good and it works for the demo, good enough, push it. Then it takes 8 times longer than it took to write the original code to fix it, plus you have to retest the code to make sure you didn't break anything else.

    If you don't think saving a second is important, imagine a low-usage 4-CPU server that only takes a paltry 4 hits per second... each one using some function or proc that someone thought a "second" was good enough for. That's 4 CPU seconds for the 4 hits per second... that's 4 CPUs slammed into the wall because someone thought time to market and schedule were more important than performance or scalability...

    ... I won't embarrass my boss (or myself, for that matter) that way even if (s)he insists, because they just don't know. Tell them... educate them... they can't eat ya, and if they're any good at all, they'll get it. If they don't and aren't, then maybe it's time to start looking around for a better boss.

    Anyway, I can understand that a lot of folks feel the same way... that the urgency for performance should be tempered by the urgency for delivery. The real key is that folks writing code should already know how to write high performance code instead of reverting to RBAR methods like While loops. If they don't, they should practice more. If they don't want to practice their trade more, well that's a whole different problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jacroberts (11/24/2008)


    timothyawiseman (11/24/2008)


    jacroberts (11/24/2008)


    Performance isn't always an issue as in my previous example.

    I agree with Jeff that performance should always be a concern. After all, you may eventually need that report urgently updated for the CEO from more recent data, or the server may simply find itself with less and less "maintenance time" that's free of heavy user activity.

    With that said, there are times when the amount of developer time it would take to improve performance is more important than the performance is. For instance, it does not make sense to spend hours optimizing an ad hoc report that will likely never be run again (though it does make sense to save it in case "never" turns out not to be never.)

    I disagree; performance isn't always an issue, though it sometimes is. Anyway, the method I listed did show reasonable performance; in fact, the performance for small number ranges is much better than the other examples listed, as there is no table/index scanning to do: you just request the number range you require and only that range is generated. For example, if you are reporting on a month's data and you need a number for each day, you would only request a table with the range 1 to 31. This would return a table with just 31 rows and would perform faster than a CTE that generates 1 million rows by default or a permanent table with 1 million rows in it. Performance is often an issue, and badly written code that gets used a lot can grind a system to a near halt, but, as in my first example, performance isn't always an issue, and anyway this method gives better performance for small number ranges.

    I've worked on and improved the performance of a lot of different SQL systems and I know when performance is important and when it is not. Generally, when you have some SQL that is called a lot of times, or some SQL that takes a long time to execute, it is important to consider performance. If you have a daily report that is run in batch, with no user waiting at the end of a button for the result, it doesn't matter if the SQL takes 3.2 seconds instead of 3.1 seconds to run. You just need common sense to work that out. Highly optimised code can be difficult to understand and therefore difficult to maintain, which can cause longer development times, maintainability issues and bugs being introduced into a system. Also, a method that is optimised for one type of query will not necessarily be optimised for a different type of query. For example, if you have a numbers table with 10 million rows in it that is really efficient for some large query, you might find that the same method is really inefficient on a query that only requires 30 rows.

    So you can think what you like but performance isn't always a concern. My one example where it isn't a concern has proved your 'always' comment is wrong.

    Lordy, Jac... you wouldn't generate a million rows in a CTE just to get 31... the WHERE clause in the outer Select would limit that. And there's nothing difficult to understand about highly optimized code, which is usually actually easier to understand than some loop.

    And, yeah, I agree... the difference between 3.2 and 3.1 seconds or, in the case of 31 days, the difference between 1 millisecond and 0 milliseconds is absolutely trivial. Until some developer finds your code and runs it in a much larger environment a lot more times than you ever expected. For example, some developer is told to write a split routine for a million-row table with up to 10,000 characters in the split column... and finds your code... if (s)he's lucky, they know they only have to call your function once to split the whole table... if not, and (s)he makes a function that calls your function, which takes a half second of both CPU and duration to gen 10k numbers, that turns the million-row split into a half-million-second nightmare. At 10k numbers, the Tally table still takes 0 CPU seconds and only 46 milliseconds of duration... at least (s)he has a chance with the Tally (Numbers) table.

    The other thing is that if you don't practice writing set based code for the small things, you'll never get good at doing it on the large things. It's like intentionally hitting the wrong notes while practicing the piano... you're building up the wrong muscle memory.
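
    To put a number on the 31-day case, here's a minimal sketch assuming the kind of dbo.Tally table described in the linked articles (a single int column N with a clustered primary key):

    -- One row per day of a given month; with N clustered, this is a tiny
    -- range seek, not a scan of the million-row table.
    DECLARE @Month datetime;
    SET @Month = '20081101';

    SELECT TheDate = DATEADD(dd, t.N - 1, @Month)
    FROM   dbo.Tally t
    WHERE  t.N <= DATEDIFF(dd, @Month, DATEADD(mm, 1, @Month));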

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GOC (11/24/2008)


    I must be missing something here. The thing I don't get about the whole tally of numbers discussion is why so much time and effort seems to always be devoted to demonstrating fancy ways to create a hellishly simple table.

    I guess it's the same reason why people climb mountains... whatayagonna do when you get there? Climb down? :hehe:

    Given that these things have a million uses, why not just create one as a matter of course when setting up your server? That way it's always there when you need it. Ten seconds vs ten minutes as a one-off exercise over the life of the server is an academic argument.

    I agree... set one up on the server as a matter of course. Why bust chops about how it's made or how long it takes? Well, other than climbing that bloody mountain that so many are so proud of, it's practice for everyone. I've always thought it very odd that anyone would use the very same tool (a While loop) that they're trying to get rid of by building the Tally table. So, other than being highly repetitious, I think it shows folks who haven't seen it before a whole new way to do things. Loopless "pseudo-cursors" that use some form of cross join to replace a While loop are absolutely one of the fundamentals of replacing RBAR with set-based code. If nothing else, some folks learn something new and say, "Hmmm.... wonder if that would work on this other problem?" For example... they might use it to generate a million rows of test data to check some of their functions on. They won't use it if it takes 10 or 12 minutes to run every time they want to use it. And they can't just leave it there, because the DBA will come looking for them in a day or two.
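
    For anyone who hasn't seen the cross-join trick before, here's a rough sketch of that kind of loopless "pseudo-cursor" used to knock out a million rows of test data (the temp table and column names are made up purely for illustration):

    -- A million sequential IDs from a cross join of a system view against
    -- itself... no While loop, and it typically finishes in seconds.
    SELECT TOP (1000000)
           SomeID    = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
           SomeValue = ABS(CHECKSUM(NEWID())) % 50000   -- throwaway random value
    INTO   #TestData
    FROM   master.sys.all_columns ac1
    CROSS JOIN master.sys.all_columns ac2;

    DROP TABLE #TestData; -- clean up so the DBA doesn't come looking 😉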

    As clever as some of the methods for creating a million rows are, the real value in this discussion will come from showing how to use them to replace RBAR loops.

    Absolutely correct! I'm thinking that if anyone has any special requests, they should direct them to the author of this fine thread. He may not entertain all requests, but it would probably get his creative juices flowing. The answer is always "no" unless you ask.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jacroberts (11/24/2008)


    timothyawiseman (11/24/2008)


    jacroberts (11/24/2008)


    Performance isn't always an issue as in my previous example.

    I've worked on and improved the performance of a lot of different SQL systems and I know when performance is important and when it is not. Generally, when you have some SQL that is called a lot of times, or some SQL that takes a long time to execute, it is important to consider performance. If you have a daily report that is run in batch, with no user waiting at the end of a button for the result, it doesn't matter if the SQL takes 3.2 seconds instead of 3.1 seconds to run. You just need common sense to work that out. Highly optimised code can be difficult to understand and therefore difficult to maintain, which can cause longer development times, maintainability issues and bugs being introduced into a system. Also, a method that is optimised for one type of query will not necessarily be optimised for a different type of query. For example, if you have a numbers table with 10 million rows in it that is really efficient for some large query, you might find that the same method is really inefficient on a query that only requires 30 rows.

    So you can think what you like but performance isn't always a concern. My one example where it isn't a concern has proved your 'always' comment is wrong.

    The problem is that your example did not show a case where efficiency is not a concern. Once again, I agree that there are times when the developer time required for an efficiency gain is not worth it, but that does not mean efficiency is not a concern, merely that it can sometimes be overridden by higher concerns.

    But in your example you say there is no need to worry about the efficiency of a query that runs with no user waiting, and I disagree. Presumably, that query is running during a "maintenance window", and maintenance windows have a way of shrinking as time goes on, as more and more users with different schedules come on board and more and more jobs need to be run during those windows.

    Even if you truly have all the time in the world, it is simply good practice to always write efficient code.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • timothyawiseman (11/24/2008)

    >>The problem is that your example did not show a case where efficiency is not a concern.

    >>But, in your example you say there is no need to worry about the efficiency of a query that runs with no user waiting, but I disagree.

    If you had read to the end of the sentence I wrote, you would have seen that efficiency is not a concern in this example. So I repeat:

    If you have a daily report that is run in batch, with no user waiting at the end of a button for the result, it doesn't matter if the SQL takes 3.2 seconds instead of 3.1 seconds to run.

  • Jeff Moden (11/24/2008)


    thisisfutile (11/24/2008)


    Jeff Moden (11/24/2008)


    jacroberts (11/24/2008)


    ... queried the table with 4000 rows in, in less than 1 second.

    That's pretty slow, and performance should always be an issue...

    With all due respect Jeff, it really does matter what work environment you're in. I have to agree with jacroberts in that 'query' performance indeed is not always an issue. In many cases, worker performance is more important than query performance. If my boss knew I was trying to make a query that already runs in "less than a second" run even faster, I'm guessing he'd be disappointed in my use of time.

    Yep... those are usually the environments that call "tuning experts" in after they actually get some data into their database to figure out what's wrong and how to speed things up. Either that or they buy a pot wad of new hardware, spend a fair amount on the migration, etc, etc, only to find out that the code is the real problem and the hardware didn't help as much as expected.

    Coding for performance isn't difficult and it certainly shouldn't take someone hours to do compared to RBAR methods... but it is like practicing piano... if you intentionally hit the wrong notes, you won't get any better at hitting the right ones.

    The worst part about writing code just good enough to get the job done is the future... either the data they said wasn't going to get substantially bigger does, or some poor developer with a manager's "schedule gun" up against his head finds the code that was just good enough, and uses it in a place where it ought not be used for performance and scalability reasons, and BOOM!

    The really bad part is that most managers have absolutely no clue about future performance and scalability because they're not held personally responsible for such failures. In fact, most shops have no "SLA" for performance... they just push code. If it looks good and it works for the demo, good enough, push it. Then it takes 8 times longer than it took to write the original code to fix it, plus you have to retest the code to make sure you didn't break anything else.

    If you don't think saving a second is important, imagine a low-usage 4-CPU server that only takes a paltry 4 hits per second... each one using some function or proc that someone thought a "second" was good enough for. That's 4 CPU seconds for the 4 hits per second... that's 4 CPUs slammed into the wall because someone thought time to market and schedule were more important than performance or scalability...

    ... I won't embarrass my boss (or myself, for that matter) that way even if (s)he insists, because they just don't know. Tell them... educate them... they can't eat ya, and if they're any good at all, they'll get it. If they don't and aren't, then maybe it's time to start looking around for a better boss.

    Anyway, I can understand that a lot of folks feel the same way... that the urgency for performance should be tempered by the urgency for delivery. The real key is that folks writing code should already know how to write high performance code instead of reverting to RBAR methods like While loops. If they don't, they should practice more. If they don't want to practice their trade more, well that's a whole different problem.

    Touché, Jeff. Thanks for the professional approach in your response. Too many in forums will reply with "YOU'RE WRONG" and then some expletives, which just pushes the opposing view farther away. You've gained my attention for two reasons. Number one, we have a relatively new database (3.5 years and 6GB in size), and number two, we're looking for new hardware (although it's not because of performance), but your response made me humbly reconsider the stance I've taken. The fact that you pointed out several examples for areas of concern demonstrates your conviction, so as a "fledgling" DBA, I'll no doubt be more sensitive to these same areas as I look at existing code that I've written or as I write new code. Thanks again for the professionalism.

  • If you have a daily report that is run in batch, with no user waiting at the end of a button for the result, it doesn't matter if the SQL takes 3.2 seconds instead of 3.1 seconds to run.

    This is an interesting one to consider. Yes, it's true. If you have no "user" waiting on the other end, it probably doesn't matter if it takes 3.2 instead of 3.1 seconds. Or even 30.2 seconds. The user won't notice. But what if we have not a "user", but lots of "users"? What if we have a batch system that needs to process--and send--a few hundred thousand reports each day? In that case, perhaps it would be nice to tune those reports down by a tenth of a second each. This would reduce overall strain on the server, and get the users at the end of the batch their reports just a bit faster.

    Personally, while I try not to prematurely micro-optimize my code, I also try to design for scale. If you take this approach, at worst you'll spend a bit more time thinking about how best to do a given task, and at best you'll save your company hundreds or thousands of hours later when the workload increases. I think it's a win-win.

    --
    Adam Machanic
    whoisactive

  • Adam Machanic (11/25/2008)


    If you have a daily report that is run in batch, with no user waiting at the end of a button for the result, it doesn't matter if the SQL takes 3.2 seconds instead of 3.1 seconds to run.

    >>But what if we have not a "user", but lots of "users"? What if we have a batch system that needs to process--and send--a few hundred thousand reports each day?

    Anyone can take any example and change it so that performance is important.
