(This is going to sound like a bit of a rant... and it probably is. Sorry... )
Although Grant mentioned it in the end, there's a bit of irony in the article in that I wouldn't hire anyone who wrote such code as that shown in the article (simple as it was).
Folks, getting the correct answer isn't enough... you have to do it with performance and scalability in mind and you must keep both in mind all the time. Add code readability to that, as well.
Let's bump up the stakes a bit... lets use 1,000,000 rows for the same code...
First, Grant's code doesn't work, as is... it dies at 1,000 so it's NOT SCALABLE. He wouldn't make it past the first round because he didn't write the code to be scalable despite what the requirements said (100 rows). Here's the error his code produces...
Server: Msg 245, Level 16, State 1, Line 7
Syntax error converting the varchar value '*' to a column of data type int.
Det's first attempt has a second fault that Grant missed altogether... it has a GO in it that just isn't required and causes the code to fail if you declare any timing variables before it. He also should have done a SET NOCOUNT ON to keep from printing a million (1 row(s) affected) messages which really slowed his code down.
Det's second attempt isn't even worth mentioning because it would produce a million single row result sets, as Grant stated.
Let's look at a couple of other things... Scott made the mistake of formatting the output. For the most part, formatting the output in SQL is a Bozo-No-No. That type of formatting should be done in the app, if there is one.
Except for Det's first attempt, everyone took it at face value that the output should be printed instead of properly returned as a single usable result set. The requirements never said Print anywhere... it said Count. Ya gotta think about being able to use the output from any query as a single usable result set all the time.
John Chapman has the right idea but it'll take a bit more code to reach a million rows. At least he tried a setbased solution! (Michael Valentine Jones has a dandy function that operates similar to John's performance enabled code but I can't find the URL for it).
Serqiy certainly has the right idea, but again, no scalability for the future.
I can't test the CTE solution because I don't have 2k5, but I'll just bet it's fast.
And... everyone failed the "test" because no one included any documentation!
After repairing Grant's code, here's the results from the million row tests (not truly scalable but certainly more than 100 rows )...
Grant's code took 203.626 seconds and pegged my single CPU the whole time.
Chris' code took 193.640 seconds and pegged my single CPU the whole time.
Det's first snippet took 286.156 seconds and pegged my single CPU the whole time.
Scott's code took 285.296 seconds and pegged my single CPU the whole time.
Those are some pretty hefty times for only a million rows. So... how would I do this? Like this, interview or not... would be even faster with MVJ's function...
Test code requirements:
1. Count from 1 to 100. (modified to use 1 million rows)
2. For each number evenly divisible by 3, substitute 'Bizz'.
3. For each number evenly divisible by 5, substitute 'Buzz'.
4. For each number divisible by both substitute 'BizzBuzz'
Rev 00 - 05/28/2007 - Jeff Moden - Initial creation/unit test.
- Deviation: Added duration counter to
measure run duration.
Rev 01 - 05/28/2007 - Jeff Moden - Modified to use 1 million
rows for performance/scalability test.
--===== Declare and start the duration timer
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
--===== Declare and set a variable to control the number of
-- rows created.
DECLARE @DesiredRowCount INT
SET @DesiredRowCount = 1000000
--===== Disable the auto-display of rowcounts for appearance
-- and speed
SET NOCOUNT ON
--===== If the scratchpad table already exists, drop it
-- Note this is just so you can rerun the test.
IF OBJECT_ID('TempDB..#Nums','U') IS NOT NULL
DROP TABLE #Nums
--===== Limit the number of rows to be built
SET ROWCOUNT @DesiredRowCount
--===== Create and populate the table on the fly
SELECT IDENTITY(INT,1,1) AS RowNum,
CAST(NULL AS VARCHAR(10)) AS DesiredResult
FROM Master.dbo.SysColumns sc1 WITH (NOLOCK),
Master.dbo.SysColumns sc2 WITH (NOLOCK)
--===== Restore the number of rows to return to normal
-- Note: Can replace with TOP @variable in SQL 2k5
SET ROWCOUNT 0
--===== Produce the desired results according to the
-- requirements in the code header. Avoids concatenation which
-- would cause the code to run a second slower...
WHEN RowNum % 15 = 0 THEN 'BizzBuzz' --Divisible by 15
WHEN RowNum % 3 = 0 THEN 'Bizz' --Divisible by 3
WHEN RowNum % 5 = 0 THEN 'Buzz' --Divisible by 5
ELSE CAST(RowNum AS VARCHAR(10))
END AS DesiredResult
ORDER BY RowNum
--===== Display the run duration (not part of result set)
... the total run time is 8.610 seconds (NOT a type-o... less than 9 seconds and a good 20 times (2000%) faster than the fastest loop code) and, yes, it pegged my single CPU... but what would you rather have... a pegged CPU for 3+ minutes or 9 seconds? Speed is important and RBAR (pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row") in the form of single row loops and cursors should be avoided at all cost. There are a small handful of exceptions for loops in functions, but for the most part, they are an evil thing.
is pronounced ree-bar and is a Modenism for R
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair
How to post code problemsHow to post performance problemsForum FAQs