﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / SQL Server 2008 / T-SQL (SS2K8)  / Better Way to Perform this Query / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Sun, 19 May 2013 19:06:36 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]wolfkillj (1/15/2013)[/b][hr][quote][b]dwain.c (1/15/2013)[/b][hr][quote][b]wolfkillj (1/15/2013)[/b][hr]I've been working on a set-based T-SQL solution to determine the Damerau-Levenshtein Distance between two strings. (DLD distance is just like LD - the number of changes required to transform one string into the other - but DLD allows transpositions (counting as one change) as well as the additions, deletions, and substitutions allowed by LD). My code is not yet working reliably, but it looks very similar to Chris's. If I ever get it working, I'll post it, but it's been about six weeks since my "real" work has allowed me any time to tinker with it. Jason[/quote]Jason - DLD is actually the approach applied in the article I linked to in my prior post.  Forgot to mention it.  You may want to take a look because the author spent a lot of time trying to optimize his solution and it might save you a bit of effort.Don't mean for it to take away the challenge though. :-D[quote][b]dwain.c (1/14/2013)[/b][hr][quote][b]Jeff Moden (1/14/2013)[/b][hr][quote][b]dwain.c (1/14/2013)[/b][hr]Good grief!  Now I suppose I'm going to need to put up or shut up.[/quote]Heh... glad it's not me this time.  :-P[/quote]Maybe I don't need to because it's already been done.[url]http://www.sqlservercentral.com/articles/Fuzzy+Match/92822/[/url]Seems I'd forgotten about this article, even though I believe I posted something to its discussion thread.[/quote][/quote]Actually, that article was my starting point in challenging myself to come up with a true set-based DLD solution. The code posted there is functional but requires a lot of control-of-flow language. I hope to come up with a solution that can be used as an inline table-valued function. I'd say I've got it about 90% solved, but there are still some inconsistencies to resolve.[/quote]That was my intention too. It's not quite there in terms of the spec, but I think it may be tweakable, and it can easily be converted to an iTVF. Bearing in mind that fuzzy-matching is an inexact science, I spent a few minutes working on a modification - functionally crippled in terms of fitting the full Levenshtein distance model but designed instead for speed (fuzzy matching is SLOW), and this effort cropped up out of the alphabet soup:[code="sql"]DECLARE @Reference VARCHAR(100), @Target VARCHAR(100), @WordLength INTSELECT @Reference = 'Maltesers', @Target = 'Maltster' -- 91.36%SELECT @Reference = 'millipede', @Target = 'millimetre' -- 69.00%SELECT @Reference = 'smith', @Target = 'smythe' -- 66.67%SELECT @Reference = 'Smith', @Target = 'Smith' -- 100.00% SELECT @WordLength = MAX(WordLength) FROM (SELECT WordLength = DATALENGTH(@Reference) UNION ALL SELECT DATALENGTH(@Target)) dSET @Reference = LEFT(@Reference + REPLICATE('_',@WordLength),@WordLength) SET @Target = LEFT(@Target + REPLICATE('_',@WordLength),@WordLength) ;WITH Tally AS ( -- restricted to 20 letters max	SELECT TOP(@WordLength) n 	FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) t2 (n))SELECT 	[Score %] = CAST(SUM(LetterScore)*100.0/(@WordLength*@WordLength) AS NUMERIC(5,2))FROM (	SELECT  		seq = t1.n,		ref.Letter,		LetterScore = @WordLength - ISNULL(MIN(tgt.n),@WordLength)	FROM tally t1 	CROSS APPLY (SELECT Letter = SUBSTRING(@Reference,t1.n,1)) ref	OUTER APPLY (		SELECT n = ABS(t1.n - t2.n)		FROM tally t2 		WHERE SUBSTRING(@Target,t2.n,1) = ref.Letter	) tgt	GROUP BY t1.n, ref.Letter) d[/code]The execution plan and brief testing indicate that it's very quick and again it's easily converted to an iTVF. If I were performing fuzzy-matching now, I'd probably settle for a couple of complimentary methods like this one plus maybe token-matching.</description><pubDate>Wed, 16 Jan 2013 01:02:11 GMT</pubDate><dc:creator>ChrisM@Work</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]wolfkillj (1/15/2013)[/b][hr]Actually, that article was my starting point in challenging myself to come up with a true set-based DLD solution. The code posted there is functional but requires a lot of control-of-flow language. I hope to come up with a solution that can be used as an inline table-valued function. I'd say I've got it about 90% solved, but there are still some inconsistencies to resolve.[/quote]I can certainly appreciate that perspective as my curiosity often pulls me into such things as well.  I looked at the LD algorithm on Wiki and since it was recursive I thought to apply a rCTE to it.  Unfortunately I got bogged down and couldn't get one working in the 30 minutes (over lunch like Chris) that I had to devote to it.Be sure to let us know what you come up with.  There may be an article in it.</description><pubDate>Tue, 15 Jan 2013 18:11:32 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]dwain.c (1/15/2013)[/b][hr][quote][b]wolfkillj (1/15/2013)[/b][hr]I've been working on a set-based T-SQL solution to determine the Damerau-Levenshtein Distance between two strings. (DLD distance is just like LD - the number of changes required to transform one string into the other - but DLD allows transpositions (counting as one change) as well as the additions, deletions, and substitutions allowed by LD). My code is not yet working reliably, but it looks very similar to Chris's. If I ever get it working, I'll post it, but it's been about six weeks since my "real" work has allowed me any time to tinker with it. Jason[/quote]Jason - DLD is actually the approach applied in the article I linked to in my prior post.  Forgot to mention it.  You may want to take a look because the author spent a lot of time trying to optimize his solution and it might save you a bit of effort.Don't mean for it to take away the challenge though. :-D[quote][b]dwain.c (1/14/2013)[/b][hr][quote][b]Jeff Moden (1/14/2013)[/b][hr][quote][b]dwain.c (1/14/2013)[/b][hr]Good grief!  Now I suppose I'm going to need to put up or shut up.[/quote]Heh... glad it's not me this time.  :-P[/quote]Maybe I don't need to because it's already been done.[url]http://www.sqlservercentral.com/articles/Fuzzy+Match/92822/[/url]Seems I'd forgotten about this article, even though I believe I posted something to its discussion thread.[/quote][/quote]Actually, that article was my starting point in challenging myself to come up with a true set-based DLD solution. The code posted there is functional but requires a lot of control-of-flow language. I hope to come up with a solution that can be used as an inline table-valued function. I'd say I've got it about 90% solved, but there are still some inconsistencies to resolve.</description><pubDate>Tue, 15 Jan 2013 17:57:33 GMT</pubDate><dc:creator>wolfkillj</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]wolfkillj (1/15/2013)[/b][hr]I've been working on a set-based T-SQL solution to determine the Damerau-Levenshtein Distance between two strings. (DLD distance is just like LD - the number of changes required to transform one string into the other - but DLD allows transpositions (counting as one change) as well as the additions, deletions, and substitutions allowed by LD). My code is not yet working reliably, but it looks very similar to Chris's. If I ever get it working, I'll post it, but it's been about six weeks since my "real" work has allowed me any time to tinker with it. Jason[/quote]Jason - DLD is actually the approach applied in the article I linked to in my prior post.  Forgot to mention it.  You may want to take a look because the author spent a lot of time trying to optimize his solution and it might save you a bit of effort.Don't mean for it to take away the challenge though. :-D[quote][b]dwain.c (1/14/2013)[/b][hr][quote][b]Jeff Moden (1/14/2013)[/b][hr][quote][b]dwain.c (1/14/2013)[/b][hr]Good grief!  Now I suppose I'm going to need to put up or shut up.[/quote]Heh... glad it's not me this time.  :-P[/quote]Maybe I don't need to because it's already been done.[url]http://www.sqlservercentral.com/articles/Fuzzy+Match/92822/[/url]Seems I'd forgotten about this article, even though I believe I posted something to its discussion thread.[/quote]</description><pubDate>Tue, 15 Jan 2013 17:34:45 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]wolfkillj[/b]it's been about six weeks since my "real" work has allowed me any time to tinker with it.[/quote]Don't you just hate when that happens?  I think I'm going for a brute force, no holds barred approach.</description><pubDate>Tue, 15 Jan 2013 12:02:53 GMT</pubDate><dc:creator>Greg Snidow</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]ChrisM@Work (1/15/2013)[/b][hr][quote][b]Alan.B (1/14/2013)[/b][hr][quote][b]Jeff Moden (1/10/2013)[/b][hr][quote][b]Alan.B (1/10/2013)[/b][hr][quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done). [/quote]Heh... if that's how you warm up for the day, then you've got me beat by a mile.[/quote]Perhaps, if I got it right ;). I did not get it correct but I certainly found a more interesting SQL excercize than Fizzbuzz. I'm going to keep trying to come up with a 100% set-based Levenshtein unless someone beats me to it.[/quote]Not yet - at least, not in half an hour over lunch. However, I don't think this is far off and is easily modified:[code="sql"]DECLARE @Reference VARCHAR(100) = 'maltesers',  	@Target VARCHAR(100) = 'maltster',	@WordLength INT	---------------------------------------------------------------------------- demonstration version - see how it works--------------------------------------------------------------------------SELECT @WordLength = MAX(WordLength) FROM (SELECT WordLength = DATALENGTH(@Reference) UNION ALL SELECT DATALENGTH(@Target)) dSET @Reference = LEFT(@Reference + REPLICATE('_',@WordLength),@WordLength) SET @Target = LEFT(@Target + REPLICATE('_',@WordLength),@WordLength) ;WITH Tally AS (SELECT TOP(@WordLength) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 		FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a (n) 		CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b (n)),Calculator1 AS (	SELECT 		t.row, tLetter, 		r.col, rLetter, 		x.result, 		best_result = MAX(x.result) OVER(PARTITION BY r.Col)	FROM ( -- matrix columns from reference word		SELECT [col] = n, rLetter = SUBSTRING(@Reference, Tally.n, 1)  		FROM Tally 	) r	CROSS JOIN ( -- matrix rows from target word		SELECT [row] = n, tLetter = SUBSTRING(@Target, Tally.n, 1) 		FROM Tally 	) t	CROSS APPLY (		SELECT result = (CASE 			WHEN r.col = t.[row] AND rLetter = tLetter THEN 1 -- equal			WHEN rLetter = tLetter THEN (LEN(@Reference)-ABS(r.col-t.[row]))*1.00/				(LEN(@Reference+@Target)/2) -- movement			WHEN r.col = t.[row] AND '_' NOT IN (rLetter,tLetter) THEN 0.1 -- substitution			WHEN r.col = t.[row] THEN 0 -- missing (end of word)			ELSE 0 END) 	) x)SELECT * FROM Calculator1 ORDER BY col,row---------------------------------------------------------------- working version--------------------------------------------------------------;WITH Tally AS (SELECT TOP(@WordLength) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 		FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a (n) 		CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b (n)),Calculator1 AS (	SELECT 		r.col, rLetter, 		best_result = MAX(x.result) OVER(PARTITION BY r.Col)	FROM (SELECT [col] = n, rLetter = SUBSTRING(@Reference, Tally.n, 1) FROM Tally) r	CROSS JOIN (SELECT [row] = n, tLetter = SUBSTRING(@Target, Tally.n, 1) FROM Tally) t	CROSS APPLY (		SELECT result = (CASE 			WHEN r.col = t.[row] AND rLetter = tLetter THEN 1 -- equal			WHEN rLetter = tLetter THEN (LEN(@Reference)-ABS(r.col-t.[row]))*1.00/				(LEN(@Reference+@Target)/2) -- movement			WHEN r.col = t.[row] AND '_' NOT IN (rLetter,tLetter) THEN 0.1 -- substitution			WHEN r.col = t.[row] THEN 0 -- missing (end of word)			ELSE 0 END) 	) x	WHERE x.result &amp;gt; 0 )SELECT LikenessRatio = SUM(Score)/LEN(@Reference) FROM (	SELECT Score = MAX(best_result) 	FROM Calculator1	GROUP BY col, rLetter) d[/code][/quote]Hey Chris and Alan, I haven't fully dissected Chris's code yet, but how does the "Likeness Ratio" produced by this code compare to the Levenshtein Distance? I've been working on a set-based T-SQL solution to determine the Damerau-Levenshtein Distance between two strings. (DLD distance is just like LD - the number of changes required to transform one string into the other - but DLD allows transpositions (counting as one change) as well as the additions, deletions, and substitutions allowed by LD). My code is not yet working reliably, but it looks very similar to Chris's. If I ever get it working, I'll post it, but it's been about six weeks since my "real" work has allowed me any time to tinker with it. Jason</description><pubDate>Tue, 15 Jan 2013 11:52:31 GMT</pubDate><dc:creator>wolfkillj</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Alan.B (1/14/2013)[/b][hr][quote][b]Jeff Moden (1/10/2013)[/b][hr][quote][b]Alan.B (1/10/2013)[/b][hr][quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done). [/quote]Heh... if that's how you warm up for the day, then you've got me beat by a mile.[/quote]Perhaps, if I got it right ;). I did not get it correct but I certainly found a more interesting SQL excercize than Fizzbuzz. I'm going to keep trying to come up with a 100% set-based Levenshtein unless someone beats me to it.[/quote]Not yet - at least, not in half an hour over lunch. However, I don't think this is far off and is easily modified:[code="sql"]DECLARE @Reference VARCHAR(100) = 'maltesers',  	@Target VARCHAR(100) = 'maltster',	@WordLength INT	---------------------------------------------------------------------------- demonstration version - see how it works--------------------------------------------------------------------------SELECT @WordLength = MAX(WordLength) FROM (SELECT WordLength = DATALENGTH(@Reference) UNION ALL SELECT DATALENGTH(@Target)) dSET @Reference = LEFT(@Reference + REPLICATE('_',@WordLength),@WordLength) SET @Target = LEFT(@Target + REPLICATE('_',@WordLength),@WordLength) ;WITH Tally AS (SELECT TOP(@WordLength) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 		FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a (n) 		CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b (n)),Calculator1 AS (	SELECT 		t.row, tLetter, 		r.col, rLetter, 		x.result, 		best_result = MAX(x.result) OVER(PARTITION BY r.Col)	FROM ( -- matrix columns from reference word		SELECT [col] = n, rLetter = SUBSTRING(@Reference, Tally.n, 1)  		FROM Tally 	) r	CROSS JOIN ( -- matrix rows from target word		SELECT [row] = n, tLetter = SUBSTRING(@Target, Tally.n, 1) 		FROM Tally 	) t	CROSS APPLY (		SELECT result = (CASE 			WHEN r.col = t.[row] AND rLetter = tLetter THEN 1 -- equal			WHEN rLetter = tLetter THEN (LEN(@Reference)-ABS(r.col-t.[row]))*1.00/				(LEN(@Reference+@Target)/2) -- movement			WHEN r.col = t.[row] AND '_' NOT IN (rLetter,tLetter) THEN 0.1 -- substitution			WHEN r.col = t.[row] THEN 0 -- missing (end of word)			ELSE 0 END) 	) x)SELECT * FROM Calculator1 ORDER BY col,row---------------------------------------------------------------- working version--------------------------------------------------------------;WITH Tally AS (SELECT TOP(@WordLength) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 		FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a (n) 		CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b (n)),Calculator1 AS (	SELECT 		r.col, rLetter, 		best_result = MAX(x.result) OVER(PARTITION BY r.Col)	FROM (SELECT [col] = n, rLetter = SUBSTRING(@Reference, Tally.n, 1) FROM Tally) r	CROSS JOIN (SELECT [row] = n, tLetter = SUBSTRING(@Target, Tally.n, 1) FROM Tally) t	CROSS APPLY (		SELECT result = (CASE 			WHEN r.col = t.[row] AND rLetter = tLetter THEN 1 -- equal			WHEN rLetter = tLetter THEN (LEN(@Reference)-ABS(r.col-t.[row]))*1.00/				(LEN(@Reference+@Target)/2) -- movement			WHEN r.col = t.[row] AND '_' NOT IN (rLetter,tLetter) THEN 0.1 -- substitution			WHEN r.col = t.[row] THEN 0 -- missing (end of word)			ELSE 0 END) 	) x	WHERE x.result &amp;gt; 0 )SELECT LikenessRatio = SUM(Score)/LEN(@Reference) FROM (	SELECT Score = MAX(best_result) 	FROM Calculator1	GROUP BY col, rLetter) d[/code]</description><pubDate>Tue, 15 Jan 2013 06:29:58 GMT</pubDate><dc:creator>ChrisM@Work</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Jeff Moden (1/14/2013)[/b][hr][quote][b]dwain.c (1/14/2013)[/b][hr]Good grief!  Now I suppose I'm going to need to put up or shut up.[/quote]Heh... glad it's not me this time.  :-P[/quote]Maybe I don't need to because it's already been done.[url]http://www.sqlservercentral.com/articles/Fuzzy+Match/92822/[/url]Seems I'd forgotten about this article, even though I believe I posted something to its discussion thread.</description><pubDate>Mon, 14 Jan 2013 20:33:06 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]dwain.c (1/14/2013)[/b][hr]Good grief!  Now I suppose I'm going to need to put up or shut up.[/quote]Heh... glad it's not me this time.  :-P</description><pubDate>Mon, 14 Jan 2013 20:11:33 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Alan.B (1/14/2013)[/b][hr] That Levenshtein business is a little more complicated than I thought. I am going to keep at it and I'm dying to see what Dwain puts together.  [/quote]Good grief!  Now I suppose I'm going to need to put up or shut up.Unfortunately, for the last couple of days I've been plagued by problems on my laptop (unexpected shutdowns that occur without warning).  So I haven't been able to get much of anything done.</description><pubDate>Mon, 14 Jan 2013 18:20:10 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Alan.B[/b]That Levenshtein business is a little more complicated than I thought. I am going to keep at it and I'm dying to see what Dwain puts together.[/quote]I kind of thought it would be complicated.  Think about this.  My last semester in grad school I took a Python class just for fun, and all the work was done on unix machines.  The professor had this text editor that would open two files, and display the number of differences between the two files.  You could then type some commands, and it would highlight the differences.  Pretty nifty for comparing script files.  I never really thought about how it worked until I saw your post.  Maybe that is like my discovering that the sun comes up in the east, but it was new to me.  I'm eager to see what you come up with.</description><pubDate>Mon, 14 Jan 2013 09:55:56 GMT</pubDate><dc:creator>Greg Snidow</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Jeff Moden (1/10/2013)[/b][hr][quote][b]Alan.B (1/10/2013)[/b][hr][quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done). [/quote]Heh... if that's how you warm up for the day, then you've got me beat by a mile.[/quote]Perhaps, if I got it right ;). I did not get it correct but I certainly found a more interesting SQL excercize than Fizzbuzz. I'm going to keep trying to come up with a 100% set-based Levenshtein unless someone beats me to it.</description><pubDate>Mon, 14 Jan 2013 09:36:16 GMT</pubDate><dc:creator>Alan.B</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Greg Snidow (1/11/2013)[/b][hr][quote][b]Alan.B[/b]I came up with this: [code="sql"]-- strings to compareDECLARE	@s1 varchar(8000)='diner',		@s2 varchar(8000)='dinerr';DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;	DECLARE	@minlen int=CASE WHEN LEN(@s1)&amp;gt;LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;	;WITH 	nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),	matrix  AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n&amp;lt;=@minlen)	SELECT @ld+=COUNT(*) FROM matrix WHERE s1&amp;lt;&amp;gt;s2;LD:SELECT @ld AS LD;[/code][/quote]Alan, I had never heard of this either.  If the Wiki description Dwain posted is to be trusted, I think you might want to re-visit this.  For example, for the two strings you provided, 'diner' and 'dinerr', the output is 1.  This makes sense, because all you have to do is delete the second 'r' in string 2 (or the first 'r' for that matter), and you end up with the same two strings.  Next, let's say I change the 'd' in string 1 to 'a'.  The code returns 2, which I think is correct, since all I have to do is substitute 'a' in string 1 to 'd', then delete one of the 'r' in string 2, and I have two of the same string.  Now, consider this: let's say I insert the 'a' in string 1, but leave the rest intact, leaving me with 'adiner' for string 1, and I leave string 2 as 'dinerr'.  The code returns a value of 5.  This does not make sense, again, if Wiki is to be trusted, as it states the allowable actions are single character insertions, deletions, and substitutions.  So, in order to make 'adiner' = 'dinerr', I only need two actions: delete the 'a' in 'adiner', and delete one of the 'r' in 'dinerr'.  Does that make sense, or am I missing something?  (the latter is entirely possible.) (maybe even likely).[/quote]Sorry for checking back in late... Levenshtrein was (is) new to me too. I did a little research and, yes, you are correct - I did not get it correct. Not only that, I was not able to get it correct (without a loop) after a lot of effort. That Levenshtein business is a little more complicated than I thought. I am going to keep at it and I'm dying to see what Dwain puts together.  Nonetheless - I was successful at creating a purely set-based version of [i]what I thought[/i] the would produce the Levenshtein Distance; and did so using a tally table :). The goal was to improve my tally table skills more than anything. If you feed my function 2 strings of equally length it will provide you with the Hamming Distance (not as big of a deal).</description><pubDate>Mon, 14 Jan 2013 09:11:14 GMT</pubDate><dc:creator>Alan.B</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]dwain.c[/b]Looks like you may have beaten me to it.[/quote]Oh, don't worry about that, Dwain.  Admittedly, I did look at it thinking it was only a minor tweaking of Alan's code, but I think it may be more complicated than that when you really ponder on it.</description><pubDate>Sat, 12 Jan 2013 04:22:19 GMT</pubDate><dc:creator>Greg Snidow</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Greg Snidow (1/11/2013)[/b][hr][quote][b]dwain.c[/b]I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well![/quote]I guess you're really busy Dwain (is that the Jeopardy theme in the background?), since you've not come back with anything yet.[/quote]You betcha!  That crazy Vertex Covers puzzle has me losing sleep! :w00t:Looks like you may have beaten me to it.</description><pubDate>Fri, 11 Jan 2013 19:02:09 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Alan.B[/b]I came up with this: [code="sql"]-- strings to compareDECLARE	@s1 varchar(8000)='diner',		@s2 varchar(8000)='dinerr';DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;	DECLARE	@minlen int=CASE WHEN LEN(@s1)&amp;gt;LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;	;WITH 	nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),	matrix  AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n&amp;lt;=@minlen)	SELECT @ld+=COUNT(*) FROM matrix WHERE s1&amp;lt;&amp;gt;s2;LD:SELECT @ld AS LD;[/code][/quote]Alan, I had never heard of this either.  If the Wiki description Dwain posted is to be trusted, I think you might want to re-visit this.  For example, for the two strings you provided, 'diner' and 'dinerr', the output is 1.  This makes sense, because all you have to do is delete the second 'r' in string 2 (or the first 'r' for that matter), and you end up with the same two strings.  Next, let's say I change the 'd' in string 1 to 'a'.  The code returns 2, which I think is correct, since all I have to do is substitute 'a' in string 1 to 'd', then delete one of the 'r' in string 2, and I have two of the same string.  Now, consider this: let's say I insert the 'a' in string 1, but leave the rest intact, leaving me with 'adiner' for string 1, and I leave string 2 as 'dinerr'.  The code returns a value of 5.  This does not make sense, again, if Wiki is to be trusted, as it states the allowable actions are single character insertions, deletions, and substitutions.  So, in order to make 'adiner' = 'dinerr', I only need two actions: delete the 'a' in 'adiner', and delete one of the 'r' in 'dinerr'.  Does that make sense, or am I missing something?  (the latter is entirely possible.) (maybe even likely).</description><pubDate>Fri, 11 Jan 2013 14:50:12 GMT</pubDate><dc:creator>Greg Snidow</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]dwain.c[/b]I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well![/quote]I guess you're really busy Dwain (is that the Jeopardy theme in the background?), since you've not come back with anything yet.</description><pubDate>Fri, 11 Jan 2013 14:22:28 GMT</pubDate><dc:creator>Greg Snidow</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Jeff Moden (1/10/2013)[/b][hr][quote][b]Alan.B (1/10/2013)[/b][hr][quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done). [/quote]Heh... if that's how you warm up for the day, then you've got me beat by a mile.You can just bet I'm going to do a deep dive on your rendition of this famous problem... especially since you did it with a Tally Table.  Thanks for doing it and thanks for posting it.[/quote]I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well! :-D[url]http://en.wikipedia.org/wiki/Levenshtein_distance[/url]</description><pubDate>Thu, 10 Jan 2013 17:31:20 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Alan.B (1/10/2013)[/b][hr][quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done). [/quote]Heh... if that's how you warm up for the day, then you've got me beat by a mile.You can just bet I'm going to do a deep dive on your rendition of this famous problem... especially since you did it with a Tally Table.  Thanks for doing it and thanks for posting it.</description><pubDate>Thu, 10 Jan 2013 16:19:09 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Alan.B [/b] (just for fun because it's something I've never seen done[/quote]Here, here brother.  That's what its all about.  Well, a lot of it anyway.</description><pubDate>Thu, 10 Jan 2013 15:26:37 GMT</pubDate><dc:creator>Greg Snidow</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Sean Lange (1/10/2013)[/b][hr][quote][b]Alan.B (1/10/2013)[/b][hr]I agree that I should have used a tally table. I've been writing CTE's for counting for awhile and can do so while sleeping. I still fumble around with the tally table and, in this case was in a hury to post my code. I just updated my query to include the tally table as you showed above:  303 reads is now 146 reads. :-)[/quote]You might want to take a look at this article.[url=http://www.sqlservercentral.com/articles/T-SQL/74118/]http://www.sqlservercentral.com/articles/T-SQL/74118/[/url][/quote]Thank you Sean. Yes, I have a few times - it's in my favorites folder and I have been putting what I have learned to use. You won't see any more counting CTE's from me. This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done). I came up with this: [code="sql"]-- strings to compareDECLARE	@s1 varchar(8000)='diner',		@s2 varchar(8000)='dinerr';DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;	DECLARE	@minlen int=CASE WHEN LEN(@s1)&amp;gt;LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;	;WITH 	nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),	matrix  AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n&amp;lt;=@minlen)	SELECT @ld+=COUNT(*) FROM matrix WHERE s1&amp;lt;&amp;gt;s2;LD:SELECT @ld AS LD;[/code]I actually posted this as a script and hope it gets approved. The Tally table is new to me, I did not get it at first but now I totally understand what all the hype is about. Long live the Tally table!Edit: typo.</description><pubDate>Thu, 10 Jan 2013 13:32:06 GMT</pubDate><dc:creator>Alan.B</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Alan.B (1/10/2013)[/b][hr]I agree that I should have used a tally table. I've been writing CTE's for counting for awhile and can do so while sleeping. I still fumble around with the tally table and, in this case was in a hury to post my code. I just updated my query to include the tally table as you showed above:  303 reads is now 146 reads. :-)[/quote]You might want to take a look at this article.[url=http://www.sqlservercentral.com/articles/T-SQL/74118/]http://www.sqlservercentral.com/articles/T-SQL/74118/[/url]</description><pubDate>Thu, 10 Jan 2013 12:19:50 GMT</pubDate><dc:creator>Sean Lange</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Jeff Moden (1/9/2013)[/b][hr][quote][b]Alan.B (1/9/2013)[/b][hr][quote][b]AndrewSQLDBA (1/9/2013)[/b][hr]Its worth noting that the following is sargable. LIKE '%abc%' [/quote]Did you mean "NOT" SARGable because it sure doesn't look SARGable from here. ;-)  And, I agree... nice use of NTILE.[/quote]Thank you very much Jeff. One year ago I had never heard of windows functions, set-based SQL, or a tally table. I think I'm starting to get it and hope to someday be able to write queries like you, Dwain, Lynn and others on SSC. Yes - I meant NOT SARGable.</description><pubDate>Thu, 10 Jan 2013 12:19:20 GMT</pubDate><dc:creator>Alan.B</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]dwain.c (1/9/2013)[/b][hr]Alan - That's a very intriguing use of NTILE!I would recommend, however that you change the way you construct your asciichar table:[code="sql"];WITH asciichar(n, c) AS (SELECT n=64+number, CHAR(64+number)FROM [master].dbo.spt_values TallyWHERE [Type] = 'P' AND Number BETWEEN 1 AND 26)SELECT *FROM asciichar[/code][/quote]Thank you, and thank you. I'm still just learning Windows functions (have been studying them for ~6 months now give or take.) For me it's been easy to find a uses for ROW_NUMBER , RANK and DENSE_RANK but this was a rare case where I found some use for NTILE; this was originally a dynamic SQL query. I agree that I should have used a tally table. I've been writing CTE's for counting for awhile and can do so while sleeping. I still fumble around with the tally table and, in this case was in a hury to post my code. I just updated my query to include the tally table as you showed above:  303 reads is now 146 reads. :-)</description><pubDate>Thu, 10 Jan 2013 11:59:02 GMT</pubDate><dc:creator>Alan.B</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>Yes.It would also give the wrong answer, IMHO.</description><pubDate>Wed, 09 Jan 2013 21:58:27 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Jeff Moden (1/9/2013)[/b][hr][quote][b]Alan.B (1/9/2013)[/b][hr][quote][b]AndrewSQLDBA (1/9/2013)[/b][hr]Its worth noting that the following is sargable. LIKE '%abc%' [/quote]Did you mean "NOT" SARGable because it sure doesn't look SARGable from here. ;-)[/quote]Is that because of the % at the beginning of the string?</description><pubDate>Wed, 09 Jan 2013 21:57:24 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]Alan.B (1/9/2013)[/b][hr][quote][b]AndrewSQLDBA (1/9/2013)[/b][hr]Its worth noting that the following is sargable. LIKE '%abc%' [/quote]Did you mean "NOT" SARGable because it sure doesn't look SARGable from here. ;-)  And, I agree... nice use of NTILE.</description><pubDate>Wed, 09 Jan 2013 21:54:54 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>Alan - That's a very intriguing use of NTILE!I would recommend, however that you change the way you construct your asciichar table:[code="sql"];WITH asciichar(n, c) AS (SELECT n=64+number, CHAR(64+number)FROM [master].dbo.spt_values TallyWHERE [Type] = 'P' AND Number BETWEEN 1 AND 26)SELECT *FROM asciichar[/code]</description><pubDate>Wed, 09 Jan 2013 18:04:57 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>[quote][b]AndrewSQLDBA (1/9/2013)[/b][hr]Hello EveryoneI hope that everyone is having a very good day.I need to write a query to select row counts, but this seems like a lot of scanning of the rows. Is there a better way to write a query that does this? There are approx 5 million rows in the table[code="sql"]DECLARE @abc intDECLARE @def intDECLARE @ghi intDECLARE @jkl intDECLARE @mno intDECLARE @pqr intDECLARE @stu intDECLARE @vw intDECLARE @xzy intSET @abc = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%abc%')SET @def = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%def%')SET @ghi = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%ghi%')SET @jkl = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%jkl%')SET @mno = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%mno%')SET @pqr = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%pqr%')SET @stu = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%stu%')SET @vw = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%vw%')SET @xzy = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%xyz%')[/code][/quote]Its worth noting that the following is sargable. LIKE '%abc%' Question: Are you really trying to find columnNames with abc in their name, def in their name, etc... Or are you trying to get a count of all columns whose name begins with [a-c] for @abc; a count of names beginning with [d-f] for @def, etc, etc... If that's what you are looking for, you can try the query below. It's a way to pass a variable or parameter to NTILE and do what you are doing dynamically (for the most part, some assembly required). [code="sql"]DECLARE @groups int = 9;;WITH asciichar(n,c) AS (	SELECT (65), CHAR(65)	UNION ALL	SELECT n+1, CHAR(n+1) FROM asciichar WHERE n+1&amp;lt;=90),groups AS (	SELECT	NTILE(@groups) OVER (ORDER BY n) AS groupid, * FROM asciichar  ),ntileMatrix AS(	SELECT	groupid,			MIN(c) OVER (PARTITION BY groupid)+'-'+ 			MAX(c) OVER (PARTITION BY groupid) AS [group],			n AS [ASCII],			c AS [CHAR]	FROM groups  ),people AS(	SELECT	LEFT(LastName,1) AS c1, LastName+', '+FirstName AS Name	FROM AdventureWorks2008R2.person.person  )SELECT  nm.[group], p.NameFROM ntileMatrix nm CROSS JOIN people pWHERE [CHAR]=c1[/code]The code above will produce this (truncated for reading):[code="plain"]group Name----- -------------------------------------A-C   Abbas, SyedA-C   Abel, CatherineA-C   Abercrombie, Kim....Y-Z   Zukowski, JakeY-Z   Zwilling, MichaelY-Z   Zwilling, Michael[/code]You can change it to this: [code="sql"]DECLARE @groups int = 9;;WITH asciichar(n,c) AS (	SELECT (65), CHAR(65)	UNION ALL	SELECT n+1, CHAR(n+1) FROM asciichar WHERE n+1&amp;lt;=90),groups AS (	SELECT	NTILE(@groups) OVER (ORDER BY n) AS groupid, * FROM asciichar  ),ntileMatrix AS(	SELECT	groupid,			MIN(c) OVER (PARTITION BY groupid)+'-'+ 			MAX(c) OVER (PARTITION BY groupid) AS [group],			n AS [ASCII],			c AS [CHAR]	FROM groups  ),people AS(	SELECT	LEFT(LastName,1) AS c1, LastName+', '+FirstName AS Name	FROM AdventureWorks2008R2.person.person  ),Totals AS(	SELECT  nm.[group], p.Name	FROM ntileMatrix nm 	CROSS JOIN people p	WHERE [CHAR]=c1 )SELECT [group], COUNT([group]) AS groupCountFROM TotalsGROUP BY [group][/code]To get this: [code="plain"]group groupCount----- -----------A-C   3502D-F   1111G-I   2572J-L   2347M-O   2061P-R   3195S-U   2800V-X   1552Y-Z   832[/code]Again, some assembly required for what you are doing.... What's cool is you can change the parameter or variable to dynamically change your groups like so:[code="sql"]DECLARE @groups int = 3;[/code]and get this: [code="plain"]group groupCount----- -----------A-I   7185J-R   7603S-Z   5184[/code]Edit: Typo's</description><pubDate>Wed, 09 Jan 2013 15:44:37 GMT</pubDate><dc:creator>Alan.B</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>Thank You PeterVery handy to know.Andrew SQLDBA</description><pubDate>Wed, 09 Jan 2013 13:30:57 GMT</pubDate><dc:creator>AndrewSQLDBA</dc:creator></item><item><title>RE: Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>You can try something like this (requires only a single table scan):[code="sql"]SELECT 	@abc = COUNT(CASE WHEN ColumnName LIKE '%abc%' THEN 1 END),	@def = COUNT(CASE WHEN ColumnName LIKE '%def%' THEN 1 END),	@ghi = COUNT(CASE WHEN ColumnName LIKE '%ghi%' THEN 1 END),	@jkl = COUNT(CASE WHEN ColumnName LIKE '%jkl%' THEN 1 END),	@mno = COUNT(CASE WHEN ColumnName LIKE '%mno%' THEN 1 END),	@pqr = COUNT(CASE WHEN ColumnName LIKE '%pqr%' THEN 1 END),	@stu = COUNT(CASE WHEN ColumnName LIKE '%stu%' THEN 1 END),	@vw = COUNT(CASE WHEN ColumnName LIKE '%vw%' THEN 1 END),	@xzy = COUNT(CASE WHEN ColumnName LIKE '%xyz%' THEN 1 END) FROM 	TableName[/code]</description><pubDate>Wed, 09 Jan 2013 11:04:43 GMT</pubDate><dc:creator>Peter Brinkhaus</dc:creator></item><item><title>Better Way to Perform this Query</title><link>http://www.sqlservercentral.com/Forums/Topic1404891-392-1.aspx</link><description>Hello EveryoneI hope that everyone is having a very good day.I need to write a query to select row counts, but this seems like a lot of scanning of the rows. Is there a better way to write a query that does this? There are approx 5 million rows in the table[code="sql"]DECLARE @abc intDECLARE @def intDECLARE @ghi intDECLARE @jkl intDECLARE @mno intDECLARE @pqr intDECLARE @stu intDECLARE @vw intDECLARE @xzy intSET @abc = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%abc%')SET @def = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%def%')SET @ghi = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%ghi%')SET @jkl = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%jkl%')SET @mno = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%mno%')SET @pqr = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%pqr%')SET @stu = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%stu%')SET @vw = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%vw%')SET @xzy = (SELECT COUNT(*) FROM TableName WHERE ColumnName LIKE '%xyz%')[/code]Thank You in advance for all your help, advice and suggestions.Andrew SQLDBA</description><pubDate>Wed, 09 Jan 2013 10:26:50 GMT</pubDate><dc:creator>AndrewSQLDBA</dc:creator></item></channel></rss>