Peter Di (11/3/2012)
Jeff, this article is about calling a stored procedure with multiple parameters, not about retrieving data from a column with comma-separated values. Do you really believe it matters if one call with a few comma-separated values takes 30 milliseconds or 100% more (60 milliseconds)? I agree that there are ways to optimize the parsing methods but, be real, in a year of use you will save 5 seconds of processor time and you will spend 1 hour to develop it.
Very good questions, Peter. Seriously. And, apologies for the long-winded answer, but good questions deserve thoughtful answers.
First, my objections weren't just about time/duration. There was nothing in the code to identify when one or more elements of the passed parameter weren't in the comparison table. The design was missing the necessary validation and feedback. That, of course, could be easily fixed, but that would also eat into the hour you're talking about. And if a missing element went undetected, it could cost much more than time, depending on what the code was actually doing.
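To make the validation point concrete, here's a minimal sketch of the kind of feedback that was missing — written in Python rather than T-SQL, with invented names, just to show the shape of it: split the parameter, then report any elements that don't exist in the comparison list instead of silently ignoring them.

```python
def validate_elements(param_csv, known_values):
    """Split a comma-delimited parameter and separate the elements that
    exist in the comparison list from the ones that don't."""
    # Split, trim whitespace, and drop empty entries.
    elements = [e.strip() for e in param_csv.split(",") if e.strip()]
    known = set(known_values)
    valid = [e for e in elements if e in known]
    unknown = [e for e in elements if e not in known]
    return valid, unknown

# "99" isn't in the comparison list, so it gets reported back
# instead of being silently dropped.
valid, unknown = validate_elements("10,20,99", ["10", "20", "30"])
print(valid, unknown)  # ['10', '20'] ['99']
```

The original code simply joined against the comparison table, so an element like "99" vanished without a trace; this version hands the caller the list of misses so the app can raise an error or warn the user.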
Shifting gears back to the time-related questions.
I agree. One call with a few comma-separated values against a relatively tiny table that takes 60 or even 600 milliseconds isn't going to matter in the grand scheme of an application... until it's no longer one call. If we take the lower number of just 30 milliseconds and put it up against a higher-usage application where it might be used a thousand times per second, we're suddenly talking about 30 seconds of CPU time per second. In broader terms, that's 30 hours of CPU time for every wall-clock hour for this simple task, when it should take nearly zero. If the method is used in many places in the app, you're suddenly talking about a whole lot of CPU time being used for something so simple. Yeah, I know. "Hardware is cheap". We'll talk about that at the end of this.
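For anyone who wants to check that arithmetic rather than take my word for it, here's the back-of-the-envelope version using the figures above:

```python
# 30 ms per call at 1,000 calls per second, extrapolated to an hour.
ms_per_call = 30
calls_per_second = 1_000

# CPU time burned for every wall-clock second of operation.
cpu_seconds_per_second = ms_per_call * calls_per_second / 1_000
print(cpu_seconds_per_second)            # 30.0 CPU-seconds per second

# Extrapolated to a full wall-clock hour.
cpu_seconds_per_hour = cpu_seconds_per_second * 3_600
print(cpu_seconds_per_hour)              # 108000.0 CPU-seconds per hour
print(cpu_seconds_per_hour / 3_600)      # 30.0 CPU-hours per hour
```

In other words, to keep up you'd need 30 CPU cores fully saturated doing nothing but splitting little strings.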
Ok... agreed. Not everyone is going to have that frequency of usage (1,000 times per second) for an app and it might, in fact, only be used once in the app. As you correctly point out (I'm right there with you on this), is it really worth spending an hour to make it take only a couple of milliseconds (hopefully less) instead of 30? The answer is patently "No", but not quite in the way you might be thinking. Developers shouldn't have to spend more than a minute or two on something like this because they should already know that nearly every app is going to need to handle such things and, unless they're a rank newbie, they should have already studied the best methods (a combination of 100% accuracy and excellent performance) and have them ready to copy and paste from their library. I guess that's a personal gripe of mine: a lot of people don't study for the job they're supposed to be doing.
Now, let's get to the more insidious side of this. Low usage, low row counts, and supposedly low frequency should never be used as an excuse for "good enough" code, for several reasons. First, as I pointed out, a Developer should already know what the fast stuff really is and have it at the ready for CPR (Copy, Paste, Replace). Second, there are those developers who haven't taken quite that interest in their job, so when they're hit with a tight schedule, they'll use anything they can find so long as it works on one to ten rows (or calls), never giving consideration to the fact that their work will have to scale.
I just went through this at work, again, with code very similar to that in the article but with the flow of data reversed. Someone wrote some code against a table that everyone "knew" wasn't going to grow by much. The table was supposed to have 28 rows in it and it wasn't likely to ever even double. They were right. After 2 years, it held only 51 rows. The trouble is that they didn't want to maintain the table manually, so as the number of possible selections started to grow, they used the very same method they had used before (to save development time) but against a data table. When I discovered the problem, the data table had grown to 4.7 million rows and they had "thoughtfully" added a DISTINCT to create the lookup list with a UNION against the original table. The code was only used 1,000 times per hour (low frequency, right?) but it had grown (to more than 2 seconds per call) to using a total of 40 CPU minutes and more than an eighth of a terabyte in reads per hour... to return just 51 rows for a drill-down menu that was only used 1,000 times per hour.
Like I said, it had grown to taking slightly more than 2 seconds to return. Let's see… it gets used by our internal people 1,000 times per hour, times 12 hours per day (extended-coverage phone center), every 21.7 working days per month. That's over 144 FTE hours per month that we're paying people to wait for a pulldown menu to load. Considering the nearly linear growth of the data table over the two years the code had existed, that's roughly 1,800 hours of wait time we paid people for. Even if a Developer wasn't prepared to handle this eventuality, that's a hell of a trade-off compared to the 1 hour of development time you were talking about.
It only took me a half hour to discover this problem. It took me about an hour to write the code to fix it. The reason it took that long is that I didn't want anyone to have to change front-end code to support the fix, so it had to be 100% transparent to the GUI and it had to continue to be self-maintaining. Just adding the rows to the original table wasn't going to satisfy that little requirement. Then it had to go through QA testing, where they had to verify not only that the screen was still working correctly but also that the underlying data used to build the menu was being interpreted correctly. That took one person an hour because they're not database people, and in order for them to test what I had done, I had to spend another half hour writing up what I had done and how it worked. That's another 3 hours spent on a problem that could have been avoided by spending an hour to do it right the first time.
For the record, I got each call down to where it belongs at about 200 microseconds and the total number of reads down to just 16 megabytes per hour. There was no rocket science involved in the fix, either. Any developer with just a couple of years of experience could have pulled the same thing off in about an hour of original development time.
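Just to put the size of that win in perspective (treating "an eighth of a terabyte" as a binary terabyte is my own assumption; the exact unit doesn't change the story):

```python
# Rough scale of the fix, from the before/after numbers above.
before_call_sec = 2.0        # slightly more than 2 seconds per call, before
after_call_sec = 200e-6      # about 200 microseconds per call, after
print(round(before_call_sec / after_call_sec))   # 10000 -- four orders of magnitude

before_reads_mb = 0.125 * 1024 * 1024            # an eighth of a (binary) TB, in MB
after_reads_mb = 16                              # 16 megabytes per hour, after
print(before_reads_mb / after_reads_mb)          # 8192.0 -- almost 10,000x fewer reads
```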
So, with apologies to my good friend Kevin Boles, and to answer your question as to whether I believe people should worry about a small comma-delimited list taking "just" 30 milliseconds when it's used relatively infrequently against a relatively small number of rows: my answer is a resounding "YES, THEY SHOULD!", even if it takes them an hour, because you don't know how the method will eventually be used by someone. A whole lot of managers really need to learn this particular lesson when they write a development schedule. Compare the one hour invested to the 1,800 hours of wasted employee time plus the several hours spent coming up with and testing a fix.
Why did I apologize to Kevin for this? Because he's an expert performance-tuning consultant who gets paid big bucks to fix things like this. There goes more time and money towards something that would have taken only an hour to do right the first time (or even the second time)… or just a couple of minutes by someone who expects these types of development requirements and is prepared to handle them as a part of their everyday job.
Will the particular method in the article ever be exposed to so much growth? Maybe not, but you just don't know for sure, because requirements change. Plan for the worst so that if it does happen, you don't have to make repairs just because the scale increased. Bulletproof code just doesn't take that much longer to write and it's worth every penny down the road.