Why don't database developers like regular expressions?
- Firstly, there are countless flavors of it, from PCRE, Vim, Java to POSIX ERE. Microsoft still have three distinct, basic flavors, in many varieties. You never can be quite sure how your favorite RegExes will work if you change your application, or work in two different versions of one application (as with SSMS).
- Secondly, Regexes aren't intuitive. Even if you understand concepts such as 'greedy' 'possessive' and 'lazy' and you think you've wrestled with the complexities of matching line ends, you are confronted with a metacharacter syntax that may have made sense to the legendary mathematical geeks who devised it, such as Stephen Cole and Ken Thompson, but is hardly destined to appeal to the legendary, ordinary application developer on the legendary Clapham Omnibus.
- Thirdly, when they are used as predicates in filters, such as
WHEREclauses, it is very difficult to reduce the number of searches to the possible candidates. Unless the Regex makes a certainty of the start of the string, it is hopeless.
Instead, in SQL Server, we cling to the
PATINDEX wildcard convention. It is a Sybase invention, a sort of poor-man's RegEx, so Microsoft have never bothered to enhance the syntax. In a sense this is a joy. The
LIKE wildcards that you write now will work in any version of SQL Server. However, they are so deficient in the basic anchors, quantifiers, character classes and alternators that any useful wildcards become very complicated even to look at, let alone understand. Frustratingly, Regexes are there in SQL Server, but out of reach, only for XQuery/XPath expressions.
With MySQL, MongoDB or PostgreSQL, Regular expressions are at the heart of searches, and they aren't causing the sort of performance problems that Microsoft engineers seem to conjure up in their fevered imagination. Microsoft can't really afford to dither any longer with facilitating better string pattern search. As I see it there are two options: either enhance the
PATINDEX wildcard conventions or provide a good Regex search that is compatible with .NET Core. I'd be happy with the former, just to allow easier constraints for complicated string-based datatypes, such as sort codes or postal codes. We must retain backward compatibility anyway, and it would be great to have something that prevents us having to learn two different conventions for pattern search.
Curiously, even PowerShell has a '
-like' comparison operator. It is even more of a poor, shriveled thing than the SQL Server one. The documentation for it consists mostly of stern instructions to use the
-match Regex instead. This seems a bit rich, coming from the company that popularized the wildcard convention for searching the filesystem. Come on, SQL Server team. Wildcards need to be improved!