Tally Table Uses - Part II

  • Comments posted to this topic are about the item Tally Table Uses - Part II

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Thanks, interesting way of looking at the problem.

    Also you could change the WHERE clause to

    WHERE N <= DATALENGTH(CountryName) instead of using the SUBSTRING comparison.

    The example given in Books Online uses a WHILE loop for a similar problem.. It would be nice to know the real world speed differences in these 2 approaches for this particular dataset.

    SET TEXTSIZE 0

    SET NOCOUNT ON

    -- Create the variables for the current character string position

    -- and for the character string.

    DECLARE @position int, @string char(15)

    -- Initialize the variables.

    SET @position = 1

    SET @string = 'Du monde entier'

    WHILE @position <= DATALENGTH(@string)

    BEGIN

    SELECT ASCII(SUBSTRING(@string, @position, 1)),

    CHAR(ASCII(SUBSTRING(@string, @position, 1)))

    SET @position = @position + 1

    END

    SET NOCOUNT OFF

    GO

  • Nice example. Although I have encountered this problem many times and normally start by doing something even simpler, just selecting out the column and then len(column) - if the character length is higher than the characters I see I know I have a hidden/non-ascii character issue and it's usually when I've imported data from elsewhere and normally char 160 is the culprit!

  • We store html fragments in varchar fields. I learned that smartquotes and emdash need special handling to display properly, so instead I applied special handling to remove them. I did not know the position or the character code in thousands of rows of content. I wrote the following to identify rows to be fixed and also what and where a fix was needed:

    (assumes markup table has "content" varchar column; Tally has int field N)

    select top 1000

    m.[PKId]

    ,[position]= n.[N]

    ,[character]= substring(m.[markup], n.[N],1)

    ,[ascii]= ascii( substring(m.[markup], n.[N],1) )

    from

    markup m

    join

    (select [N] from Tally where [N] <= 255 ) n

    on

    ascii( substring(m.[content], n.[N],1) ) in

    (

    select

    [ascii]= [N]

    from

    Tally

    where

    [n] not between ascii('A') and ascii('Z')

    and

    [n] not between ascii('a') and ascii('z')

    and

    [n] not between ascii('0') and ascii('9')

    and

    (

    '@.-_' not like '%' + char([n]) + '%'

    or

    char([n]) = '%'

    )

    )

  • Interesting application of Tally, but when I am looking for issues like this I just cast into varbinary and compare the resulting hex strings. This seems much simpler than rotating all the text character by character.

  • kenglish-729097 (8/3/2010)


    Interesting application of Tally, but when I am looking for issues like this I just cast into varbinary and compare the resulting hex strings. This seems much simpler than rotating all the text character by character.

    What does that tell you other than that they don't match? How do you use the hex strings to see what is different? I'll try this myself, but I'd love to hear the explanation as well.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • What a cool application of the tally table. When I have string manipulation issues, I've always taken the educated guess / trial and error approach much like other posters, but this is a great way to visualise what you're working with.

    Thanks for the article

    Kindest Regards,

    Frank Bazan

  • Frank Bazan (8/3/2010)


    What a cool application of the tally table. When I have string manipulation issues, I've always taken the educated guess / trial and error approach much like other posters, but this is a great way to visualise what you're working with.

    Thanks for the article

    Thanks. Being able to see the granular detail all at once is a big part of why I like this approach.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Great article Stefan. A few months ago I was given a project to find all the symbols in our entire system's data. Specifically to find SQL symbols (%,@ etc). We have around 40 Gb of data, and the majority of it is character strings. I used this same approach, and it suprised me at its speed. I was expecting to have to parse through several billion characters. The process ran for around 5 hours, which was much better than I expected.

    This was a one time run, so I did not spend a lot of time optimizing it, but it does make me wonder just how fast this could be tweaked to.

  • Great article. If you have to parse strings in SQL then having the tally table is great.

    I would have appraoched the problem with a different tool set.

    SELECT '|' + CountryName + '|' FROM Country

    This would have shown me that there were hidden characters. Pasting a snipet into my favorite text editor and hovering over one of the bad apples whould have gotten me to writing the replace statement.

    Yet since you have to show the value of Tally I think that you did it very well. I'm looking forward to the next installment. I'm still trying to sell certain managament of the value of Tally and Dates.

    ATBCharles Kincaid

  • Charles Kincaid (8/3/2010)


    Great article. If you have to parse strings in SQL then having the tally table is great.

    I would have appraoched the problem with a different tool set.

    SELECT '|' + CountryName + '|' FROM Country

    This would have shown me that there were hidden characters. Pasting a snipet into my favorite text editor and hovering over one of the bad apples whould have gotten me to writing the replace statement.

    Yet since you have to show the value of Tally I think that you did it very well. I'm looking forward to the next installment. I'm still trying to sell certain managament of the value of Tally and Dates.

    You're looking forward to the next installment? I'd better come up with one then! : -) I'm sure I'll find more uses in what I'm doing currently and I'll write them up when I do.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Excel is not a friendly place to "clean things up". I have frequently found Excel to insert unfriendly characters into data, and often to truncate data strings.

  • Charles Kincaid (8/3/2010)


    Great article. If you have to parse strings in SQL then having the tally table is great.

    I would have appraoched the problem with a different tool set.

    SELECT '|' + CountryName + '|' FROM Country

    This would have shown me that there were hidden characters. Pasting a snipet into my favorite text editor and hovering over one of the bad apples whould have gotten me to writing the replace statement.

    The above is a reasonably simple way of doing it; also, cut and pasting from a cell in the grid into the editor in SSMS/Query Analyzer does the same thing; you can see that there's something else there.

    From the article, however, I spot what I consider to be the real issue:

    "I found the list for countries on Wikipedia. I copied the first 4 columns of the table, dropped them into Excel to clean them up and imported the result into a SQL Server table. I created the table and used the wizard to pull the data in."

    While the tally table exercise was entertaining, what I see as the real issue is that the data that was imported into SQL Server was never actually inspected. In cases like this, I generally have two "standard" ways of doing things:

    1) Generate a text file for a BULK INSERT/BCP... then pull it up in a hex editor (HxD or your favorite) to see what's what. This instantly shows you everything, right down to end of lines, end of pages, and so on.

    2) Use EXCEL to generate INSERT statements, i.e.

    =CONCATENATE("INSERT INTO table VALUES ('",A1,"')")

    Again, you'd see the extra characters instantly between the tick marks in your CREATE statement.

    I try to always look at the source data first; looking at the end result generally takes long.

    That said, an interesting use of the tally table character splitter technique.

  • Thanks Stefan, another great article on uses for a tally table.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Thanks for the article. I have actually used this technique myself and find it very useful. I do a little more filtering to get the unwanted characters. Here is something similar to what i use.

    DECLARE @t TABLE(col VARCHAR(MAX));

    INSERT INTO @t VALUES ('zzzzzzzzz' + CHAR(13));

    INSERT INTO @t VALUES ('zzzzzzzzz' + CHAR(11) + CHAR(13));

    INSERT INTO @t VALUES ('zzzz');

    INSERT INTO @t VALUES ('zzzz-');

    INSERT INTO @t VALUES ('zz.zz');

    INSERT INTO @t VALUES ('zz?zz');

    SELECT

    col,

    n AS Pos,

    SUBSTRING(col,n,1) AS [ASCII_Char],

    ASCII(SUBSTRING(col,n,1)) AS [ASCII_Cd]

    FROM @t t

    INNER JOIN dbo.[Numbers] n ON n.n <= LEN(col)

    WHERE SUBSTRING(col,n,1) LIKE '[^A-Za-z0-9/-.?]' ESCAPE '/' --use escape to build your exception list

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply