I found something interesting the other day. I was attempting to optimize a very heavy string parsing routine in T-SQL and was having problems. I don’t normally use T-SQL for heavy string parsing, but this was a special case involving a legacy structure I had to work with, so I had no choice. Anyway, said routine was a table function that basically takes a fixed-length text file and turns it into a table (please don’t ask why) so you can join it to other tables in the database. What made this interesting is that the fixed-length text file had over 4,000 rows, so to turn it into a table the function has to slice and dice all 4,000 rows every time it’s executed, which it does with a WHILE loop.
When I finished this function, it ran in about 15 seconds, which was way too slow for our needs. Unbeknownst to me, one of my colleagues was attempting the same thing and wrote his code in a slightly different way, and he got it to run in about 1 second. When I analyzed his code, I found it virtually identical to mine, with one difference: on every iteration I was using the REPLACE function to ‘whittle’ down the string until none of it was left, at which point the WHILE loop would exit and the function would finish executing. His approach didn’t use REPLACE at all; he simply navigated forward through the string until he reached the end.
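To make the difference concrete, here is a minimal sketch of the two loop shapes. To be clear, this is not either of our actual functions: the variable names, the fixed line length of 80, and the inline parsing comments are all made up for illustration.

```sql
-- Hypothetical setup: @FileContents holds the fixed-length file,
-- @LineLen is the (assumed) width of each row.
DECLARE @FileContents VARCHAR(MAX) = '...the fixed-length file...';
DECLARE @LineLen INT = 80;
DECLARE @line VARCHAR(100);

-- Slow shape: whittle the string down with REPLACE on every pass.
DECLARE @remaining VARCHAR(MAX) = @FileContents;
WHILE LEN(@remaining) > 0
BEGIN
    SET @line = SUBSTRING(@remaining, 1, @LineLen);
    -- ...parse @line and INSERT it into the result table...
    -- REPLACE rebuilds the entire remaining string on every iteration:
    SET @remaining = REPLACE(@remaining, @line, '');
END;

-- Fast shape: leave the string alone and just walk an index forward.
DECLARE @pos INT = 1;
WHILE @pos <= LEN(@FileContents)
BEGIN
    SET @line = SUBSTRING(@FileContents, @pos, @LineLen);
    -- ...parse @line and INSERT it into the result table...
    SET @pos = @pos + @LineLen;
END;
```

Both loops visit the same 4,000 rows; the only structural difference is that the first one reconstructs the big string 4,000 times while the second one only ever reads from it.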
At this point one of my very special mental “talents” kicked in: when my mind is stuck on a problem and someone ELSE tells me the solution, it immediately says “oh, I knew that!” and proceeds to give me a detailed explanation of the solution and why it works, leaving me to wonder where all that great information was hiding five seconds ago when I didn’t have the answer and really needed it… you have to wonder why such useless “talents” even exist.
So, of course, once I saw this everything became obvious: my version of the function was doing over 4,000 REPLACE operations and his wasn’t. On top of that, I have to wonder whether the string data types we were both using, such as VARCHAR, are implemented as immutable types in SQL Server. Does anyone know if this is the case? If so, it would further explain the difference in speed: an immutable type doesn’t actually modify the string in place, it allocates a whole new instance for every “change”, which means my function was creating over 4,000 copies of that large string in memory. To put that in perspective, if each of those 4,000 rows were, say, 80 characters, the whole string would be around 320,000 characters, and rebuilding it on every pass works out to hundreds of millions of characters copied, versus a single read through the string for my colleague’s version. Talk about a waste!
I mention this because lots of SQL developers may not have much experience in string parsing, since this is normally not done using SQL. This may be old hat to many application programmers, but it’s not something that DBAs run into as frequently, so I’m hoping that others will find it as useful as I did.
So, the moral of the story is: be careful when using REPLACE in heavy string parsing operations. Like many other things, if you need it you should use it, but the trick is knowing when you REALLY need it. In my case I thought I needed it, but in reality there was a much simpler solution available, and it ended up running 15 times faster. It proves the old adage of “Keep It Simple, Stupid”: given two equivalent solutions, the simpler one is better.
…and that’s my $.02 for today….