I really must thank Steve for his editorial on FizzBuzz. It seemed like a really good topic to do some testing and comparison. Then the comments started rolling in on the topic. This provided more information and ideas to use for this article. As trivial as it may seem, the topic brings up some very pertinent information for us in the Data Bizz. The primary objective of this article is to compare and contrast methods and hopefully provide evidence to use one method over another. The underlying objective being that of performance tuning your code.
A FizzBuzz is a really trivial topic to dispute. The objectives being to weed out those that can from those that can’t at the most basic level. However, with the FizzBuzz, the interviewer has the opportunity to get more insight into the prospective employee. Before delving into the technical aspects, ponder the non-technical aspects first. This sort of test will help to determine if the candidate can quickly assess requirements, is willing to gather more information, organization skills (though very minimally), mettle under pressure, and (at a very high level) character. Keep in mind that all of these things are only a piece of the puzzle. There still remains the technical skill, and “team fit”.
This FizzBuzz requires that one write some code to be generate a list of numbers. The list will have multiples of 3 replaced by Fizz, multiples of 5 replaced by Buzz, and multiples of both 3 and 5 to be replaced by FizzBuzz. Those are the base requirements. The unstated requirements are the requirements that top tier candidates will accommodate instinctively. These requirements are performance, scalability, maintainability, and in a set-based fashion (since this is TSQL and SQL is optimized for Set-Based coding). Sometimes this test is implemented where the requirements state to print out the results. For my purposes, I will take that to simply mean display the results. The methods would be acutely different.
I will explore a few different methods that may be used in achieving the results. Some answers are better than others. The first is a nice long example by Gus (posted in the comments for the SSC editorial).
SET NOCOUNT ON; DECLARE @Table1 TABLE ( ID INT IDENTITY PRIMARY KEY, Number INT); DECLARE @Number INT; SELECT @Number = 1; WHILE @Number <= 100 BEGIN INSERT INTO @Table1 (Number) SELECT @Number WHERE @Number NOT IN (SELECT Number FROM @Table1); SELECT @Number = @Number + 1; END; DECLARE curNumbers CURSOR GLOBAL DYNAMIC FOR SELECT DISTINCT Number FROM @Table1 ORDER BY Number; DECLARE @Number2 INT, @Number2String VARCHAR(MAX), @Msg VARCHAR(MAX); OPEN curNumbers; FETCH FIRST FROM curNumbers INTO @Number2; WHILE @@FETCH_STATUS = 0 BEGIN SELECT @Number2String = CAST(@Number2 AS DECIMAL(6,3))/CAST(3 AS DECIMAL(6,3)); WHILE @Number2String LIKE '%.%' BEGIN SELECT @Number2String = RIGHT(@Number2String, LEN(@Number2String)-1) END; IF CAST(@Number2String AS BIGINT) > 0 BEGIN SELECT @Number2String = CAST(@Number2 AS DECIMAL(6,3))/CAST(5 AS DECIMAL(6,3)); WHILE @Number2String LIKE '%.%' BEGIN SELECT @Number2String = RIGHT(@Number2String, LEN(@Number2String)-1) END; IF CAST(@Number2String AS BIGINT) > 0 BEGIN SELECT @Msg = CAST(@Number2 AS VARCHAR(MAX)); END; ELSE BEGIN SELECT @Msg = 'Buzz' END; END; ELSE BEGIN SELECT @Number2String = CAST(@Number2 AS DECIMAL(6,3))/CAST(5 AS DECIMAL(6,3)); WHILE @Number2String LIKE '%.%' BEGIN SELECT @Number2String = RIGHT(@Number2String, LEN(@Number2String)-1) END; IF CAST(@Number2String AS BIGINT) > 0 BEGIN SELECT @Msg = 'Fizz'; END; ELSE BEGIN SELECT @Msg = 'FizzBuzz'; END; END; PRINT @Msg; FETCH NEXT FROM curNumbers INTO @Number2; END; CLOSE curNumbers; DEALLOCATE curNumbers;
This was created in jest. However, there are some important things to note in the code. First, the use of a table variable is not necessary. With that table variable there is an additional attribute that is unnecessary, even if one decided to use a temp table or table variable. That may be a minor thing, but can be considered to be sloppy coding, inattention to detail, and poor use of resources. Second, the use of a while loop to populate that table. Third, is the use of a cursor to loop through the numbers in the table to do the comparison for the FizzBuzz test. This script was also hard-coded with the record amounts to use. To make it scalable and maintainable, one should parameterize those values. These criticisms are nothing new to Gus. It is important to note that he coded this query in this fashion intentionally, and he pointed out some of the criticisms himself.
Will this code work? Yes it will. It produces the following IO stats and a looping execution plan (1 plan for each iteration).
For 100 records, this takes roughly five seconds to run on my server. 100 records is a very small test and thus larger tests are needed to see about the performance and scalability factor. Scaling up to one million records, this query was still running after six hours.
Next up is the following query:
DECLARE @counter INT DECLARE @OUTPUT VARCHAR(8) SET @counter = 1 WHILE @counter < 101 BEGIN SET @OUTPUT = '' IF @counter % 3 = 0 SET @OUTPUT = 'Fizz' IF @counter % 5 = 0 SET @OUTPUT = @OUTPUT + 'Buzz' IF @OUTPUT = '' SET @OUTPUT = @counter PRINT @OUTPUT SET @counter = @counter + 1 END
This version loops through the numbers and assigns the value to to be printed and then prints it. See what happens when the counter hits 15? We end up assigning a value and then overwriting that value. This version also just prints the result rather than displaying the query results. For 100 records the execution is not bad. There are no IO stats and no Execution plan. However, to run this for the one million records, takes forty-five seconds. If I needed a larger result set, I would also need to be careful of the Int variable. This one would also need to have a change in order to use a variable for the number of records to build.
Next is a pretty nice looking CTE version. There are many renditions of this one.
WITH Numbers(n) AS ( SELECT 1 UNION ALL SELECT 1 + n FROM Numbers WHERE n < 100) SELECT CASE WHEN n%5=0 AND n%3=0 THEN 'FizzBuzz' WHEN n%3 = 0 THEN 'Fizz' WHEN n%5 = 0 THEN 'Buzz' ELSE CAST(n AS VARCHAR(8)) END FROM Numbers OPTION (MAXRECURSION 100);
This query is far more optimal than the previous queries. Though we use a CTE in this query, we are still using Procedural programming. This is a recursive CTE. We are also lacking in the maintainability and scalability of this code. Of a lesser magnitude is the use of the option (maxrecursion 100). Just a little trick that should be noted. The recursive definition has a limiting where clause thus making the maxrecursion statement unnecessary for so few records. Since this performs so well with 100 records, let’s proceed to the one million test. Here is where we run into another trick that must be used. MaxRecursion needs to be specified with a 0. Doing this will permit the query to attain one million. The query now takes 15 seconds to complete. Here are the IO Stats and Execution plan.
Now that I have shared some procedural based methods to accomplish this “simple” task, let’s explore some of the potential set-based methods. The first uses a windowing function to create our number set.
SELECT TOP 100 CASE WHEN ROW_NUMBER() OVER (ORDER BY sc1.name)%15 = 0 THEN 'Dr. Pepper' --Multiples of 3 AND 5 WHEN ROW_NUMBER() OVER (ORDER BY sc1.name)%3 = 0 THEN 'Pepsi' WHEN ROW_NUMBER() OVER (ORDER BY sc1.name)%5 = 0 THEN 'Coke' ELSE CAST(ROW_NUMBER() OVER (ORDER BY sc1.name) AS VARCHAR(8)) END FROM sys.columns sc1, sys.columns sc2
First thing to note is the reduction in code here. The solution is very simple and runs very rapidly. By using a cross join against sys.columns were able to create a result set. This cross join has limitations. Not all sys.columns tables are created equally and thus we would have to add another cross join in order to test one million records. Also, note that the modulo was changed in the FizzBuzz test to use a modulo 15. That was also documented in the short note at the end of the line. Thus the alterations for the query are as follows (to test one million records).
SELECT TOP 1000000 CASE WHEN ROW_NUMBER() OVER (ORDER BY sc1.name)%15 = 0 THEN 'Dr. Pepper' --Multiples of 3 AND 5 WHEN ROW_NUMBER() OVER (ORDER BY sc1.name)%3 = 0 THEN 'Pepsi' WHEN ROW_NUMBER() OVER (ORDER BY sc1.name)%5 = 0 THEN 'Coke' ELSE CAST(ROW_NUMBER() OVER (ORDER BY sc1.name) AS VARCHAR(8)) END FROM sys.columns sc1, sys.columns sc2, sys.columns sc3
This query will return in ~14 seconds. It is a set-based solution. It is not quite yet to that upper echelon though. Some modifications need to be made to make it scalable and maintainable. Also note that I am still employing the use of sys.columns. There are better methods for that. One such is to use master.sys.all_columns. Another is to use a Numbers table. Before going into the use of the numbers table, here are the IO stats and exec plan.
Now for the numbers table. I will jump straight to the one million records testing. The query is not substantially different than the previous query.
SELECT TOP 1000000 CASE WHEN ROW_NUMBER() OVER (ORDER BY sc1.someid)%15 = 0 THEN 'FizzBuzz' --Multiples of 3 AND 5 WHEN ROW_NUMBER() OVER (ORDER BY sc1.someid)%3 = 0 THEN 'Pepsi' WHEN ROW_NUMBER() OVER (ORDER BY sc1.someid)%5 = 0 THEN 'Coke' ELSE CAST(ROW_NUMBER() OVER (ORDER BY sc1.someid) AS VARCHAR(8)) END FROM ADMIN.dbo.Numbers sc1
The runtime for this query is about the same as the previous query. IO Stats and Exec Plan show a slightly different story. Still, note that this query is not quite the scalable or maintainable query I would like to see.
As you can see, Logical reads increased, and the execution plan is slightly better. The cost of this new query is lower and makes it a little more desirable – despite the higher logical reads.
We are now at a point where we have progressed to where we can start playing a bit. The idea now is to fine tune and tweak what I have to see if it can be made better. The remainder of the queries and testing will be in the next article. There is still plenty to cover.