SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Better Way to Perform this Query


Better Way to Perform this Query

Author
Message
Sean Lange
Sean Lange
SSC Guru
SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)

Group: General Forum Members
Points: 62903 Visits: 17959
Alan.B (1/10/2013)

I agree that I should have used a tally table. I've been writing CTE's for counting for awhile and can do so while sleeping. I still fumble around with the tally table and, in this case was in a hury to post my code. I just updated my query to include the tally table as you showed above: 303 reads is now 146 reads. :-)


You might want to take a look at this article.

http://www.sqlservercentral.com/articles/T-SQL/74118/

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Alan.B
Alan.B
SSChampion
SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)

Group: General Forum Members
Points: 13456 Visits: 8002
Sean Lange (1/10/2013)
Alan.B (1/10/2013)

I agree that I should have used a tally table. I've been writing CTE's for counting for awhile and can do so while sleeping. I still fumble around with the tally table and, in this case was in a hury to post my code. I just updated my query to include the tally table as you showed above: 303 reads is now 146 reads. :-)


You might want to take a look at this article.

http://www.sqlservercentral.com/articles/T-SQL/74118/


Thank you Sean. Yes, I have a few times - it's in my favorites folder and I have been putting what I have learned to use. You won't see any more counting CTE's from me. This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done).

I came up with this:


-- strings to compare
DECLARE @s1 varchar(8000)='diner',
@s2 varchar(8000)='dinerr';

DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));

IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;

DECLARE @minlen int=CASE WHEN LEN(@s1)>LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;

;WITH
nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),
matrix AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n<=@minlen)
SELECT @ld+=COUNT(*) FROM matrix WHERE s1<>s2;
LD:
SELECT @ld AS LD;



I actually posted this as a script and hope it gets approved. The Tally table is new to me, I did not get it at first but now I totally understand what all the hype is about. Long live the Tally table!

Edit: typo.

-- Alan Burstein



Best practices for getting help on SQLServerCentral
Need to split a string? Try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL 2012+)
Need a pattern-based splitter? Try PatternSplitCM
Need to remove or replace those unwanted characters? Try PatExclude8K and PatReplace8K.

"I can't stress enough the importance of switching from a 'sequential files' mindset to 'set-based' thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code. " -- Itzek Ben-Gan 2001
Greg Snidow
Greg Snidow
SSCarpal Tunnel
SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)

Group: General Forum Members
Points: 4353 Visits: 2494
Alan.B
(just for fun because it's something I've never seen done


Here, here brother. That's what its all about. Well, a lot of it anyway.

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)

Group: General Forum Members
Points: 216126 Visits: 41986
Alan.B (1/10/2013)
[quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done).


Heh... if that's how you warm up for the day, then you've got me beat by a mile.

You can just bet I'm going to do a deep dive on your rendition of this famous problem... especially since you did it with a Tally Table. Thanks for doing it and thanks for posting it.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
dwain.c
dwain.c
SSCoach
SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)

Group: General Forum Members
Points: 17915 Visits: 6431
Jeff Moden (1/10/2013)
Alan.B (1/10/2013)
[quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done).


Heh... if that's how you warm up for the day, then you've got me beat by a mile.

You can just bet I'm going to do a deep dive on your rendition of this famous problem... especially since you did it with a Tally Table. Thanks for doing it and thanks for posting it.


I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well! :-D

http://en.wikipedia.org/wiki/Levenshtein_distance


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Greg Snidow
Greg Snidow
SSCarpal Tunnel
SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)

Group: General Forum Members
Points: 4353 Visits: 2494
dwain.c
I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well!


I guess you're really busy Dwain (is that the Jeopardy theme in the background?), since you've not come back with anything yet.

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Greg Snidow
Greg Snidow
SSCarpal Tunnel
SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)

Group: General Forum Members
Points: 4353 Visits: 2494
Alan.B
I came up with this:


-- strings to compare
DECLARE @s1 varchar(8000)='diner',
@s2 varchar(8000)='dinerr';

DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));

IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;

DECLARE @minlen int=CASE WHEN LEN(@s1)>LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;

;WITH
nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),
matrix AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n<=@minlen)
SELECT @ld+=COUNT(*) FROM matrix WHERE s1<>s2;
LD:
SELECT @ld AS LD;




Alan, I had never heard of this either. If the Wiki description Dwain posted is to be trusted, I think you might want to re-visit this. For example, for the two strings you provided, 'diner' and 'dinerr', the output is 1. This makes sense, because all you have to do is delete the second 'r' in string 2 (or the first 'r' for that matter), and you end up with the same two strings. Next, let's say I change the 'd' in string 1 to 'a'. The code returns 2, which I think is correct, since all I have to do is substitute 'a' in string 1 to 'd', then delete one of the 'r' in string 2, and I have two of the same string. Now, consider this: let's say I insert the 'a' in string 1, but leave the rest intact, leaving me with 'adiner' for string 1, and I leave string 2 as 'dinerr'. The code returns a value of 5. This does not make sense, again, if Wiki is to be trusted, as it states the allowable actions are single character insertions, deletions, and substitutions. So, in order to make 'adiner' = 'dinerr', I only need two actions: delete the 'a' in 'adiner', and delete one of the 'r' in 'dinerr'. Does that make sense, or am I missing something? (the latter is entirely possible.) (maybe even likely).

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
dwain.c
dwain.c
SSCoach
SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)

Group: General Forum Members
Points: 17915 Visits: 6431
Greg Snidow (1/11/2013)
dwain.c
I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well!


I guess you're really busy Dwain (is that the Jeopardy theme in the background?), since you've not come back with anything yet.


You betcha! That crazy Vertex Covers puzzle has me losing sleep! w00t

Looks like you may have beaten me to it.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Greg Snidow
Greg Snidow
SSCarpal Tunnel
SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)SSCarpal Tunnel (4.4K reputation)

Group: General Forum Members
Points: 4353 Visits: 2494
dwain.c
Looks like you may have beaten me to it.


Oh, don't worry about that, Dwain. Admittedly, I did look at it thinking it was only a minor tweaking of Alan's code, but I think it may be more complicated than that when you really ponder on it.

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Alan.B
Alan.B
SSChampion
SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)

Group: General Forum Members
Points: 13456 Visits: 8002
Greg Snidow (1/11/2013)
Alan.B
I came up with this:


-- strings to compare
DECLARE @s1 varchar(8000)='diner',
@s2 varchar(8000)='dinerr';

DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));

IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;

DECLARE @minlen int=CASE WHEN LEN(@s1)>LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;

;WITH
nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),
matrix AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n<=@minlen)
SELECT @ld+=COUNT(*) FROM matrix WHERE s1<>s2;
LD:
SELECT @ld AS LD;




Alan, I had never heard of this either. If the Wiki description Dwain posted is to be trusted, I think you might want to re-visit this. For example, for the two strings you provided, 'diner' and 'dinerr', the output is 1. This makes sense, because all you have to do is delete the second 'r' in string 2 (or the first 'r' for that matter), and you end up with the same two strings. Next, let's say I change the 'd' in string 1 to 'a'. The code returns 2, which I think is correct, since all I have to do is substitute 'a' in string 1 to 'd', then delete one of the 'r' in string 2, and I have two of the same string. Now, consider this: let's say I insert the 'a' in string 1, but leave the rest intact, leaving me with 'adiner' for string 1, and I leave string 2 as 'dinerr'. The code returns a value of 5. This does not make sense, again, if Wiki is to be trusted, as it states the allowable actions are single character insertions, deletions, and substitutions. So, in order to make 'adiner' = 'dinerr', I only need two actions: delete the 'a' in 'adiner', and delete one of the 'r' in 'dinerr'. Does that make sense, or am I missing something? (the latter is entirely possible.) (maybe even likely).


Sorry for checking back in late...

Levenshtrein was (is) new to me too. I did a little research and, yes, you are correct - I did not get it correct. Not only that, I was not able to get it correct (without a loop) after a lot of effort. That Levenshtein business is a little more complicated than I thought. I am going to keep at it and I'm dying to see what Dwain puts together.

Nonetheless - I was successful at creating a purely set-based version of what I thought the would produce the Levenshtein Distance; and did so using a tally table Smile. The goal was to improve my tally table skills more than anything.

If you feed my function 2 strings of equally length it will provide you with the Hamming Distance (not as big of a deal).

-- Alan Burstein



Best practices for getting help on SQLServerCentral
Need to split a string? Try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL 2012+)
Need a pattern-based splitter? Try PatternSplitCM
Need to remove or replace those unwanted characters? Try PatExclude8K and PatReplace8K.

"I can't stress enough the importance of switching from a 'sequential files' mindset to 'set-based' thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code. " -- Itzek Ben-Gan 2001
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search