SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Better Way to Perform this Query


Better Way to Perform this Query

Author
Message
Sean Lange
Sean Lange
SSC Guru
SSC Guru (147K reputation)SSC Guru (147K reputation)SSC Guru (147K reputation)SSC Guru (147K reputation)SSC Guru (147K reputation)SSC Guru (147K reputation)SSC Guru (147K reputation)SSC Guru (147K reputation)

Group: General Forum Members
Points: 147800 Visits: 18567
Alan.B (1/10/2013)

I agree that I should have used a tally table. I've been writing CTE's for counting for awhile and can do so while sleeping. I still fumble around with the tally table and, in this case was in a hury to post my code. I just updated my query to include the tally table as you showed above: 303 reads is now 146 reads. :-)


You might want to take a look at this article.

http://www.sqlservercentral.com/articles/T-SQL/74118/

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Alan Burstein
Alan Burstein
SSC-Dedicated
SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)

Group: General Forum Members
Points: 32263 Visits: 8574
Sean Lange (1/10/2013)
Alan.B (1/10/2013)

I agree that I should have used a tally table. I've been writing CTE's for counting for awhile and can do so while sleeping. I still fumble around with the tally table and, in this case was in a hury to post my code. I just updated my query to include the tally table as you showed above: 303 reads is now 146 reads. :-)


You might want to take a look at this article.

http://www.sqlservercentral.com/articles/T-SQL/74118/


Thank you Sean. Yes, I have a few times - it's in my favorites folder and I have been putting what I have learned to use. You won't see any more counting CTE's from me. This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done).

I came up with this:


-- strings to compare
DECLARE @s1 varchar(8000)='diner',
@s2 varchar(8000)='dinerr';

DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));

IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;

DECLARE @minlen int=CASE WHEN LEN(@s1)>LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;

;WITH
nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),
matrix AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n<=@minlen)
SELECT @ld+=COUNT(*) FROM matrix WHERE s1<>s2;
LD:
SELECT @ld AS LD;



I actually posted this as a script and hope it gets approved. The Tally table is new to me, I did not get it at first but now I totally understand what all the hype is about. Long live the Tally table!

Edit: typo.

-- Alan Burstein


Helpful links:

Best practices for getting help on SQLServerCentral -- Jeff Moden
How to Post Performance Problems -- Gail Shaw

Nasty fast set-based string manipulation functions:
For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)
To split strings based on patterns try PatternSplitCM
Need to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8K

I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code. -- Itzik Ben-Gan 2001

Greg Snidow
Greg Snidow
SSCrazy Eights
SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)

Group: General Forum Members
Points: 9245 Visits: 2498
Alan.B
(just for fun because it's something I've never seen done


Here, here brother. That's what its all about. Well, a lot of it anyway.

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (505K reputation)SSC Guru (505K reputation)SSC Guru (505K reputation)SSC Guru (505K reputation)SSC Guru (505K reputation)SSC Guru (505K reputation)SSC Guru (505K reputation)SSC Guru (505K reputation)

Group: General Forum Members
Points: 505228 Visits: 44247
Alan.B (1/10/2013)
[quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done).


Heh... if that's how you warm up for the day, then you've got me beat by a mile.

You can just bet I'm going to do a deep dive on your rendition of this famous problem... especially since you did it with a Tally Table. Thanks for doing it and thanks for posting it.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
dwain.c
dwain.c
SSC-Forever
SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)

Group: General Forum Members
Points: 43925 Visits: 6431
Jeff Moden (1/10/2013)
Alan.B (1/10/2013)
[quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done).


Heh... if that's how you warm up for the day, then you've got me beat by a mile.

You can just bet I'm going to do a deep dive on your rendition of this famous problem... especially since you did it with a Tally Table. Thanks for doing it and thanks for posting it.


I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well! :-D

http://en.wikipedia.org/wiki/Levenshtein_distance


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Greg Snidow
Greg Snidow
SSCrazy Eights
SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)

Group: General Forum Members
Points: 9245 Visits: 2498
dwain.c
I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well!


I guess you're really busy Dwain (is that the Jeopardy theme in the background?), since you've not come back with anything yet.

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Greg Snidow
Greg Snidow
SSCrazy Eights
SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)

Group: General Forum Members
Points: 9245 Visits: 2498
Alan.B
I came up with this:


-- strings to compare
DECLARE @s1 varchar(8000)='diner',
@s2 varchar(8000)='dinerr';

DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));

IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;

DECLARE @minlen int=CASE WHEN LEN(@s1)>LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;

;WITH
nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),
matrix AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n<=@minlen)
SELECT @ld+=COUNT(*) FROM matrix WHERE s1<>s2;
LD:
SELECT @ld AS LD;




Alan, I had never heard of this either. If the Wiki description Dwain posted is to be trusted, I think you might want to re-visit this. For example, for the two strings you provided, 'diner' and 'dinerr', the output is 1. This makes sense, because all you have to do is delete the second 'r' in string 2 (or the first 'r' for that matter), and you end up with the same two strings. Next, let's say I change the 'd' in string 1 to 'a'. The code returns 2, which I think is correct, since all I have to do is substitute 'a' in string 1 to 'd', then delete one of the 'r' in string 2, and I have two of the same string. Now, consider this: let's say I insert the 'a' in string 1, but leave the rest intact, leaving me with 'adiner' for string 1, and I leave string 2 as 'dinerr'. The code returns a value of 5. This does not make sense, again, if Wiki is to be trusted, as it states the allowable actions are single character insertions, deletions, and substitutions. So, in order to make 'adiner' = 'dinerr', I only need two actions: delete the 'a' in 'adiner', and delete one of the 'r' in 'dinerr'. Does that make sense, or am I missing something? (the latter is entirely possible.) (maybe even likely).

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
dwain.c
dwain.c
SSC-Forever
SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)

Group: General Forum Members
Points: 43925 Visits: 6431
Greg Snidow (1/11/2013)
dwain.c
I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well!


I guess you're really busy Dwain (is that the Jeopardy theme in the background?), since you've not come back with anything yet.


You betcha! That crazy Vertex Covers puzzle has me losing sleep! w00t

Looks like you may have beaten me to it.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Greg Snidow
Greg Snidow
SSCrazy Eights
SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)

Group: General Forum Members
Points: 9245 Visits: 2498
dwain.c
Looks like you may have beaten me to it.


Oh, don't worry about that, Dwain. Admittedly, I did look at it thinking it was only a minor tweaking of Alan's code, but I think it may be more complicated than that when you really ponder on it.

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Alan Burstein
Alan Burstein
SSC-Dedicated
SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)SSC-Dedicated (32K reputation)

Group: General Forum Members
Points: 32263 Visits: 8574
Greg Snidow (1/11/2013)
Alan.B
I came up with this:


-- strings to compare
DECLARE @s1 varchar(8000)='diner',
@s2 varchar(8000)='dinerr';

DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));

IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;

DECLARE @minlen int=CASE WHEN LEN(@s1)>LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;

;WITH
nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),
matrix AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n<=@minlen)
SELECT @ld+=COUNT(*) FROM matrix WHERE s1<>s2;
LD:
SELECT @ld AS LD;




Alan, I had never heard of this either. If the Wiki description Dwain posted is to be trusted, I think you might want to re-visit this. For example, for the two strings you provided, 'diner' and 'dinerr', the output is 1. This makes sense, because all you have to do is delete the second 'r' in string 2 (or the first 'r' for that matter), and you end up with the same two strings. Next, let's say I change the 'd' in string 1 to 'a'. The code returns 2, which I think is correct, since all I have to do is substitute 'a' in string 1 to 'd', then delete one of the 'r' in string 2, and I have two of the same string. Now, consider this: let's say I insert the 'a' in string 1, but leave the rest intact, leaving me with 'adiner' for string 1, and I leave string 2 as 'dinerr'. The code returns a value of 5. This does not make sense, again, if Wiki is to be trusted, as it states the allowable actions are single character insertions, deletions, and substitutions. So, in order to make 'adiner' = 'dinerr', I only need two actions: delete the 'a' in 'adiner', and delete one of the 'r' in 'dinerr'. Does that make sense, or am I missing something? (the latter is entirely possible.) (maybe even likely).


Sorry for checking back in late...

Levenshtrein was (is) new to me too. I did a little research and, yes, you are correct - I did not get it correct. Not only that, I was not able to get it correct (without a loop) after a lot of effort. That Levenshtein business is a little more complicated than I thought. I am going to keep at it and I'm dying to see what Dwain puts together.

Nonetheless - I was successful at creating a purely set-based version of what I thought the would produce the Levenshtein Distance; and did so using a tally table Smile. The goal was to improve my tally table skills more than anything.

If you feed my function 2 strings of equally length it will provide you with the Hamming Distance (not as big of a deal).

-- Alan Burstein


Helpful links:

Best practices for getting help on SQLServerCentral -- Jeff Moden
How to Post Performance Problems -- Gail Shaw

Nasty fast set-based string manipulation functions:
For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)
To split strings based on patterns try PatternSplitCM
Need to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8K

I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code. -- Itzik Ben-Gan 2001

Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search