Tally Table Uses - Part II
Sioban Krzywicki
Comments posted to this topic are about the item Tally Table Uses - Part II

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
Andrew G
Thanks, interesting way of looking at the problem.

Also you could change the WHERE clause to

WHERE N <= DATALENGTH(CountryName)

instead of using the SUBSTRING comparison.

The example given in Books Online uses a WHILE loop for a similar problem. It would be nice to know the real-world speed difference between these two approaches for this particular dataset.

SET TEXTSIZE 0
SET NOCOUNT ON
-- Create the variables for the current character string position
-- and for the character string.
DECLARE @position int, @string char(15)
-- Initialize the variables.
SET @position = 1
SET @string = 'Du monde entier'
WHILE @position <= DATALENGTH(@string)
   BEGIN
   SELECT ASCII(SUBSTRING(@string, @position, 1)),
      CHAR(ASCII(SUBSTRING(@string, @position, 1)))
   SET @position = @position + 1
   END
SET NOCOUNT OFF
GO
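
For comparison, here is a minimal set-based sketch of the same character walk, using the WHERE clause suggested above. It is not the article's query. The inline Tally CTE built from sys.all_columns and the column aliases are assumptions for illustration; a permanent Tally table with an int column N would serve the same purpose.

```sql
-- Same input as the Books Online loop above.
DECLARE @string char(15);
SET @string = 'Du monde entier';

-- Assumption: no permanent Tally table exists, so a 255-row stand-in
-- is generated inline by numbering rows from a system catalog view.
WITH Tally (N) AS
(
    SELECT TOP (255) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns
)
SELECT  t.N                                AS CharPosition,
        ASCII(SUBSTRING(@string, t.N, 1))  AS AsciiCode,
        SUBSTRING(@string, t.N, 1)         AS CharValue
FROM    Tally AS t
WHERE   t.N <= DATALENGTH(@string)   -- one row per byte of the string
ORDER BY t.N;
```

Note that DATALENGTH counts bytes, so for nvarchar data the limit would need to be DATALENGTH(...) / 2 or LEN(...) instead. This returns one result set with a row per character, where the loop returns one result set per character.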


Stuart Davis
Nice example. I have encountered this problem many times, and I normally start with something even simpler: selecting the column alongside len(column). If the character length is higher than the number of characters I can see, I know I have a hidden or non-ASCII character issue. It usually happens when I've imported data from elsewhere, and char 160 is normally the culprit!
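
A rough sketch of that quick check follows. The @Country table variable and its sample rows are made up for illustration, and the behaviour assumes a single-byte code page such as 1252, where CHAR(160) is the non-breaking space.

```sql
-- Hypothetical sample data standing in for an imported column.
DECLARE @Country TABLE (CountryName varchar(50));
INSERT INTO @Country (CountryName) VALUES ('France');
INSERT INTO @Country (CountryName) VALUES ('Spain' + CHAR(160)); -- hidden trailing character

-- Show the value next to its length, plus where char 160 first occurs.
SELECT  CountryName,
        LEN(CountryName)                   AS CharCount,
        CHARINDEX(CHAR(160), CountryName)  AS FirstChar160Position
FROM    @Country
WHERE   CHARINDEX(CHAR(160), CountryName) > 0;
```

LEN ignores ordinary trailing spaces but not char 160, which is why the count comes out higher than the characters that are visible.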



Mike Dougherty-384281
We store HTML fragments in varchar columns. I learned that smart quotes and em dashes need special handling to display properly, so instead I applied special handling to remove them. I did not know the position or the character code of the offending characters across thousands of rows of content, so I wrote the following to identify the rows to be fixed and also what fix was needed and where:
(assumes the markup table has a "content" varchar column and Tally has an int column N)

select top 1000
     m.[PKId]
    ,[position]  = n.[N]
    ,[character] = substring(m.[content], n.[N], 1)
    ,[ascii]     = ascii(substring(m.[content], n.[N], 1))
from
    markup m
join
    -- note: N <= 255 limits the scan to the first 255 characters of each row
    (select [N] from Tally where [N] <= 255) n
on
    -- keep a position only when its character code is in the "suspect" list
    ascii(substring(m.[content], n.[N], 1)) in
    (
        select [ascii] = [N]
        from Tally
        where [N] not between ascii('A') and ascii('Z')
          and [N] not between ascii('a') and ascii('z')
          and [N] not between ascii('0') and ascii('9')
          and (
                  -- allow @ . - _ ; '%' is a LIKE wildcard, so flag it explicitly
                  '@.-_' not like '%' + char([N]) + '%'
               or char([N]) = '%'
              )
    )
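
If only the affected rows and the first offending position are needed, a negated character class may be enough. This is a different technique from the Tally join above, offered only as a sketch: the @markup table variable and its rows are hypothetical, and the binary collation is an assumption made so that the A-Z, a-z and 0-9 ranges match by code point.

```sql
-- Hypothetical sample rows standing in for the markup table.
DECLARE @markup TABLE (PKId int, content varchar(100));
INSERT INTO @markup (PKId, content) VALUES (1, 'user@example.com');
INSERT INTO @markup (PKId, content) VALUES (2, 'bad' + CHAR(147) + 'quote');

-- [^...] matches any single character NOT in the allowed set.
-- The trailing '-' inside the brackets is a literal hyphen.
SELECT  m.PKId,
        PATINDEX('%[^A-Za-z0-9@._-]%',
                 m.content COLLATE Latin1_General_BIN) AS FirstBadPosition
FROM    @markup AS m
WHERE   m.content COLLATE Latin1_General_BIN LIKE '%[^A-Za-z0-9@._-]%';
```

Unlike the Tally version, this reports only the first offending position per row, so it works better as a filter than as a full diagnostic.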


kenglish-729097
Interesting application of Tally, but when I am looking for issues like this I just cast into varbinary and compare the resulting hex strings. This seems much simpler than rotating all the text character by character.
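
A small sketch of what the varbinary comparison might look like. The variable names and sample values are invented for illustration, and the byte values mentioned assume a single-byte code page such as 1252.

```sql
-- Two values that look identical on screen.
DECLARE @clean varchar(20), @dirty varchar(20);
SET @clean = 'Spain';
SET @dirty = 'Spain' + CHAR(160);   -- hidden non-breaking space

-- Cast both to varbinary and compare the hex strings side by side.
SELECT  CAST(@clean AS varbinary(20)) AS CleanHex,
        CAST(@dirty AS varbinary(20)) AS DirtyHex;
```

For varchar data each pair of hex digits is one character, so under that assumption the second value should end in an extra A0 (decimal 160). The offset of the differing byte pair gives the character position.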
Sioban Krzywicki
kenglish-729097 (8/3/2010)
Interesting application of Tally, but when I am looking for issues like this I just cast into varbinary and compare the resulting hex strings. This seems much simpler than rotating all the text character by character.


What does that tell you other than that they don't match? How do you use the hex strings to see what is different? I'll try this myself, but I'd love to hear the explanation as well.
Frank Bazan
What a cool application of the tally table. When I have string manipulation issues, I've always taken the educated guess / trial and error approach much like other posters, but this is a great way to visualise what you're working with.

Thanks for the article

Kindest Regards,

Frank Bazan
Sioban Krzywicki
Frank Bazan (8/3/2010)
What a cool application of the tally table. When I have string manipulation issues, I've always taken the educated guess / trial and error approach much like other posters, but this is a great way to visualise what you're working with.

Thanks for the article


Thanks. Being able to see the granular detail all at once is a big part of why I like this approach.
David Lester
Great article Stefan. A few months ago I was given a project to find all the symbols in our entire system's data, specifically SQL symbols (%, @, etc.). We have around 40 GB of data, and the majority of it is character strings. I used this same approach, and its speed surprised me. I was expecting to have to parse through several billion characters. The process ran for around 5 hours, which was much better than I expected.
This was a one-time run, so I did not spend a lot of time optimizing it, but it does make me wonder how much faster it could be made with tuning.
Charles Kincaid
Great article. If you have to parse strings in SQL then having the tally table is great.

I would have approached the problem with a different tool set.
SELECT '|' + CountryName + '|' FROM Country


This would have shown me that there were hidden characters. Pasting a snippet into my favorite text editor and hovering over one of the bad apples would have gotten me to writing the replace statement.
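
As a sketch of where that leads, assuming the hidden character turns out to be char 160 (the thread does not say which character the editor would reveal), the delimiter check and the replace statement might look like this. The @Country table variable stands in for the Country table.

```sql
-- Hypothetical stand-in for the Country table.
DECLARE @Country TABLE (CountryName varchar(50));
INSERT INTO @Country (CountryName) VALUES ('Spain' + CHAR(160));

-- Before: the delimiters expose the gap left by the hidden character.
SELECT '|' + CountryName + '|' AS Delimited FROM @Country;

-- Assumed culprit: char 160. Strip it wherever it occurs.
UPDATE @Country
SET    CountryName = REPLACE(CountryName, CHAR(160), '')
WHERE  CountryName LIKE '%' + CHAR(160) + '%';

-- After: the closing delimiter should sit directly against the name.
SELECT '|' + CountryName + '|' AS Delimited FROM @Country;
```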

Yet since you have to show the value of Tally, I think that you did it very well. I'm looking forward to the next installment. I'm still trying to sell certain management on the value of Tally and Dates.

ATB
Charles Kincaid