• Great article, Stefan. A few months ago I was given a project to find all the symbols in our entire system's data, specifically SQL symbols (%, @, etc.). We have around 40 GB of data, the majority of it character strings. I used this same approach, and its speed surprised me. Since it had to scan through several billion characters, I expected it to take much longer, but the process finished in around 5 hours.

    This was a one-time run, so I did not spend much time optimizing it, but it does make me wonder how much faster it could be made with some tuning.