Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««1234»»»

Better Way to Perform this Query Expand / Collapse
Author
Message
Posted Thursday, January 10, 2013 12:19 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 3:13 PM
Points: 12,995, Visits: 12,414
Alan.B (1/10/2013)

I agree that I should have used a tally table. I've been writing CTE's for counting for awhile and can do so while sleeping. I still fumble around with the tally table and, in this case was in a hury to post my code. I just updated my query to include the tally table as you showed above: 303 reads is now 146 reads.


You might want to take a look at this article.

http://www.sqlservercentral.com/articles/T-SQL/74118/


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1405597
Posted Thursday, January 10, 2013 1:32 PM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Yesterday @ 3:49 PM
Points: 589, Visits: 2,750
Sean Lange (1/10/2013)
Alan.B (1/10/2013)

I agree that I should have used a tally table. I've been writing CTE's for counting for awhile and can do so while sleeping. I still fumble around with the tally table and, in this case was in a hury to post my code. I just updated my query to include the tally table as you showed above: 303 reads is now 146 reads.


You might want to take a look at this article.

http://www.sqlservercentral.com/articles/T-SQL/74118/


Thank you Sean. Yes, I have a few times - it's in my favorites folder and I have been putting what I have learned to use. You won't see any more counting CTE's from me. This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done).

I came up with this:

-- strings to compare
DECLARE @s1 varchar(8000)='diner',
@s2 varchar(8000)='dinerr';

DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));

IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;

DECLARE @minlen int=CASE WHEN LEN(@s1)>LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;

;WITH
nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),
matrix AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n<=@minlen)
SELECT @ld+=COUNT(*) FROM matrix WHERE s1<>s2;
LD:
SELECT @ld AS LD;

I actually posted this as a script and hope it gets approved. The Tally table is new to me, I did not get it at first but now I totally understand what all the hype is about. Long live the Tally table!

Edit: typo.


-- Alan Burstein



Read this article for best practices on asking questions.
Need to split a string? Try this (Jeff Moden)
Need a pattern-based string spitter? Try this (Dwain Camps)
My blog
Post #1405630
Posted Thursday, January 10, 2013 3:26 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, August 27, 2014 1:06 PM
Points: 1,563, Visits: 2,392
Alan.B
(just for fun because it's something I've never seen done


Here, here brother. That's what its all about. Well, a lot of it anyway.


Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Post #1405682
Posted Thursday, January 10, 2013 4:19 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 11:28 PM
Points: 35,263, Visits: 31,750
Alan.B (1/10/2013)
[quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done).


Heh... if that's how you warm up for the day, then you've got me beat by a mile.

You can just bet I'm going to do a deep dive on your rendition of this famous problem... especially since you did it with a Tally Table. Thanks for doing it and thanks for posting it.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1405692
Posted Thursday, January 10, 2013 5:31 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 9:07 PM
Points: 3,420, Visits: 5,347
Jeff Moden (1/10/2013)
Alan.B (1/10/2013)
[quote]This morning before work for example, after a lot of effort, I finally figured out how to get the Levenshtein Edit Distance between 2 strings without a loop (just for fun because it's something I've never seen done).


Heh... if that's how you warm up for the day, then you've got me beat by a mile.

You can just bet I'm going to do a deep dive on your rendition of this famous problem... especially since you did it with a Tally Table. Thanks for doing it and thanks for posting it.


I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well!

http://en.wikipedia.org/wiki/Levenshtein_distance



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1405705
Posted Friday, January 11, 2013 2:22 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, August 27, 2014 1:06 PM
Points: 1,563, Visits: 2,392
dwain.c
I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well!


I guess you're really busy Dwain (is that the Jeopardy theme in the background?), since you've not come back with anything yet.


Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Post #1406263
Posted Friday, January 11, 2013 2:50 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, August 27, 2014 1:06 PM
Points: 1,563, Visits: 2,392
Alan.B
I came up with this:

-- strings to compare
DECLARE @s1 varchar(8000)='diner',
@s2 varchar(8000)='dinerr';

DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));

IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;

DECLARE @minlen int=CASE WHEN LEN(@s1)>LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;

;WITH
nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),
matrix AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n<=@minlen)
SELECT @ld+=COUNT(*) FROM matrix WHERE s1<>s2;
LD:
SELECT @ld AS LD;



Alan, I had never heard of this either. If the Wiki description Dwain posted is to be trusted, I think you might want to re-visit this. For example, for the two strings you provided, 'diner' and 'dinerr', the output is 1. This makes sense, because all you have to do is delete the second 'r' in string 2 (or the first 'r' for that matter), and you end up with the same two strings. Next, let's say I change the 'd' in string 1 to 'a'. The code returns 2, which I think is correct, since all I have to do is substitute 'a' in string 1 to 'd', then delete one of the 'r' in string 2, and I have two of the same string. Now, consider this: let's say I insert the 'a' in string 1, but leave the rest intact, leaving me with 'adiner' for string 1, and I leave string 2 as 'dinerr'. The code returns a value of 5. This does not make sense, again, if Wiki is to be trusted, as it states the allowable actions are single character insertions, deletions, and substitutions. So, in order to make 'adiner' = 'dinerr', I only need two actions: delete the 'a' in 'adiner', and delete one of the 'r' in 'dinerr'. Does that make sense, or am I missing something? (the latter is entirely possible.) (maybe even likely).


Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Post #1406273
Posted Friday, January 11, 2013 7:02 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 9:07 PM
Points: 3,420, Visits: 5,347
Greg Snidow (1/11/2013)
dwain.c
I for one have never heard of this famous "Levenshtein Edit Distance" problem, but you can rest assured that now I'm going to have to take a look at it as well!


I guess you're really busy Dwain (is that the Jeopardy theme in the background?), since you've not come back with anything yet.


You betcha! That crazy Vertex Covers puzzle has me losing sleep!

Looks like you may have beaten me to it.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1406294
Posted Saturday, January 12, 2013 4:22 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, August 27, 2014 1:06 PM
Points: 1,563, Visits: 2,392
dwain.c
Looks like you may have beaten me to it.


Oh, don't worry about that, Dwain. Admittedly, I did look at it thinking it was only a minor tweaking of Alan's code, but I think it may be more complicated than that when you really ponder on it.


Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
Post #1406328
Posted Monday, January 14, 2013 9:11 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Yesterday @ 3:49 PM
Points: 589, Visits: 2,750
Greg Snidow (1/11/2013)
Alan.B
I came up with this:

-- strings to compare
DECLARE @s1 varchar(8000)='diner',
@s2 varchar(8000)='dinerr';

DECLARE @ld int=ABS(LEN(@s1)-LEN(@s2));

IF ((@s1=@s2) OR ((ISNULL(LEN(@s1)*LEN(@s2),0)=0))) BEGIN GOTO LD END;

DECLARE @minlen int=CASE WHEN LEN(@s1)>LEN(@s2) THEN LEN(@s2) ELSE LEN(@s1) END;

;WITH
nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM [master].dbo.spt_values Tally),
matrix AS (SELECT SUBSTRING(@s1,n,1) s1, SUBSTRING(@s2,n,1) s2 FROM nums WHERE n<=@minlen)
SELECT @ld+=COUNT(*) FROM matrix WHERE s1<>s2;
LD:
SELECT @ld AS LD;



Alan, I had never heard of this either. If the Wiki description Dwain posted is to be trusted, I think you might want to re-visit this. For example, for the two strings you provided, 'diner' and 'dinerr', the output is 1. This makes sense, because all you have to do is delete the second 'r' in string 2 (or the first 'r' for that matter), and you end up with the same two strings. Next, let's say I change the 'd' in string 1 to 'a'. The code returns 2, which I think is correct, since all I have to do is substitute 'a' in string 1 to 'd', then delete one of the 'r' in string 2, and I have two of the same string. Now, consider this: let's say I insert the 'a' in string 1, but leave the rest intact, leaving me with 'adiner' for string 1, and I leave string 2 as 'dinerr'. The code returns a value of 5. This does not make sense, again, if Wiki is to be trusted, as it states the allowable actions are single character insertions, deletions, and substitutions. So, in order to make 'adiner' = 'dinerr', I only need two actions: delete the 'a' in 'adiner', and delete one of the 'r' in 'dinerr'. Does that make sense, or am I missing something? (the latter is entirely possible.) (maybe even likely).


Sorry for checking back in late...

Levenshtrein was (is) new to me too. I did a little research and, yes, you are correct - I did not get it correct. Not only that, I was not able to get it correct (without a loop) after a lot of effort. That Levenshtein business is a little more complicated than I thought. I am going to keep at it and I'm dying to see what Dwain puts together.

Nonetheless - I was successful at creating a purely set-based version of what I thought the would produce the Levenshtein Distance; and did so using a tally table :). The goal was to improve my tally table skills more than anything.

If you feed my function 2 strings of equally length it will provide you with the Hamming Distance (not as big of a deal).


-- Alan Burstein



Read this article for best practices on asking questions.
Need to split a string? Try this (Jeff Moden)
Need a pattern-based string spitter? Try this (Dwain Camps)
My blog
Post #1406793
« Prev Topic | Next Topic »

Add to briefcase ««1234»»»

Permissions Expand / Collapse