uses of string splitters

  • Hi, I'm reading sql stuff and came across creating functions to split strings into columns/tables. Since the only time I've had to split up a row into columns is when ingesting a .csv or .txt file, which I've done with SSIS's built in delimiter functions, I'm still wondering what kinds of situations require splitting strings?

    Can anyone give me their experience and scenario?

  • KoldCoffee (5/29/2014)


    Hi, I'm reading sql stuff and came across creating functions to split strings into columns/tables. Since the only time I've had to split up a row into columns is when ingesting a .csv or .txt file, which I've done with SSIS's built in delimiter functions, I'm still wondering what kinds of situations require splitting strings?

    Can anyone give me their experience and scenario?

    Before we go into specifics please tell me you have read the article found in my signature about splitting strings?

    Aside from that yes splitting csv data is a common usage for that. There are other times when data is not normalized that it is useful.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean, I've taken a look at Jeff's article and it does chain to another prerequisite read on Tally Tables to get the most out of the 'how'. But I'll be far more motivated to learn it if I know why or when I'll need to reach for this tool.

    Can you give me scenarios where people use splitters?

  • I have had to use string splitters to help with removing certain characters from things like phone numbers.

    Another common use is to split delimited strings.

    Depending on the string there are many different methods to split the strings. The moden 8k splitter is generally fast for most cases. CLR splitters are even faster. Occasionally (edge cases) I have found that a while loop type of splitter is much faster than the 8k splitter. The point there is to be prepared to implement the right splitter for the scenario to help maintain performance.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • KoldCoffee (5/29/2014)


    Sean, I've taken a look at Jeff's article and it does chain to another prerequisite read on Tally Tables to get the most out of the 'how'. But I'll be far more motivated to learn it if I know why or when I'll need to reach for this tool.

    Can you give me scenarios where people use splitters?

    Splitters aside, if you are unfamiliar with tally tables I would say that article is a requirement. The concepts you can learn from the tally table goes WAY beyond a splitter. It can (and hopefully will) change the way people look at data.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • KoldCoffee (5/29/2014)


    Sean, I've taken a look at Jeff's article and it does chain to another prerequisite read on Tally Tables to get the most out of the 'how'. But I'll be far more motivated to learn it if I know why or when I'll need to reach for this tool.

    Can you give me scenarios where people use splitters?

    I wrote one from scratch to deal with strings that had all sorts of junk in them including HTML delimiters, but had predictable and consistent attributes that I could use to step my way through the string, e.g. "Prod 1 Size", "Prod 1 Flavor", "Prod 2 Size", "Prod 2 Flavor", etc.

    Not going to post it here mostly because I'd be embarrassed for other people to see it, but it got the job done. 🙂

  • SQL is delicious (5/29/2014)


    KoldCoffee (5/29/2014)


    Sean, I've taken a look at Jeff's article and it does chain to another prerequisite read on Tally Tables to get the most out of the 'how'. But I'll be far more motivated to learn it if I know why or when I'll need to reach for this tool.

    Can you give me scenarios where people use splitters?

    I wrote one from scratch to deal with strings that had all sorts of junk in them including HTML delimiters, but had predictable and consistent attributes that I could use to step my way through the string, e.g. "Prod 1 Size", "Prod 1 Flavor", "Prod 2 Size", "Prod 2 Flavor", etc.

    Not going to post it here mostly because I'd be embarrassed for other people to see it, but it got the job done. 🙂

    It is the "stepping through the string" that so frequently causes these home grown splitters to be so slow. There is no need to be embarrassed about code you wrote that works. Let's face it, the MOST important aspect is that it works correctly. A close second is doing it fast. I would take slow accurate code over fast inaccurate code any day. The trick is learning how to get both.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (5/29/2014)


    SQL is delicious (5/29/2014)


    KoldCoffee (5/29/2014)


    Sean, I've taken a look at Jeff's article and it does chain to another prerequisite read on Tally Tables to get the most out of the 'how'. But I'll be far more motivated to learn it if I know why or when I'll need to reach for this tool.

    Can you give me scenarios where people use splitters?

    I wrote one from scratch to deal with strings that had all sorts of junk in them including HTML delimiters, but had predictable and consistent attributes that I could use to step my way through the string, e.g. "Prod 1 Size", "Prod 1 Flavor", "Prod 2 Size", "Prod 2 Flavor", etc.

    Not going to post it here mostly because I'd be embarrassed for other people to see it, but it got the job done. 🙂

    It is the "stepping through the string" that so frequently causes these home grown splitters to be so slow. There is no need to be embarrassed about code you wrote that works. Let's face it, the MOST important aspect is that it works correctly. A close second is doing it fast. I would take slow accurate code over fast inaccurate code any day. The trick is learning how to get both.

    Mine wasn't slow, but I never tested it with huge data sets either. I'd bet it wouldn't scale well though I really have no idea.

    And I agree it's better to be slow and right than fast and wrong, though it's best to be fast and right!!! Still working on that one.

  • SQL is delicious (5/29/2014)


    Sean Lange (5/29/2014)


    SQL is delicious (5/29/2014)


    KoldCoffee (5/29/2014)


    Sean, I've taken a look at Jeff's article and it does chain to another prerequisite read on Tally Tables to get the most out of the 'how'. But I'll be far more motivated to learn it if I know why or when I'll need to reach for this tool.

    Can you give me scenarios where people use splitters?

    I wrote one from scratch to deal with strings that had all sorts of junk in them including HTML delimiters, but had predictable and consistent attributes that I could use to step my way through the string, e.g. "Prod 1 Size", "Prod 1 Flavor", "Prod 2 Size", "Prod 2 Flavor", etc.

    Not going to post it here mostly because I'd be embarrassed for other people to see it, but it got the job done. 🙂

    It is the "stepping through the string" that so frequently causes these home grown splitters to be so slow. There is no need to be embarrassed about code you wrote that works. Let's face it, the MOST important aspect is that it works correctly. A close second is doing it fast. I would take slow accurate code over fast inaccurate code any day. The trick is learning how to get both.

    Mine wasn't slow, but I never tested it with huge data sets either. I'd bet it wouldn't scale well though I really have no idea.

    And I agree it's better to be slow and right than fast and wrong, though it's best to be fast and right!!! Still working on that one.

    The last one just takes time and practice. 😉

    Most splitters out there are not slow on small data sets. It is when you get to larger data sets that it becomes noticeable.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • ooh, I see more posts. --looks like it's primarily used to scrub data--

  • in the book SQL Server Common Table Expressions Joes 2 Pros by Steve Stedman et al., I read about the use of splitters to count the number of times a certain word appears in a string. I thought that was a novel use of the splitter and would be interested in more examples like that...

  • KoldCoffee (5/29/2014)


    in the book SQL Server Common Table Expressions Joes 2 Pros by Steve Stedman et al., I read about the use of splitters to count the number of times a certain word appears in a string. I thought that was a novel use of the splitter and would be interested in more examples like that...

    I don't think I would use a splitter for that. Seems like overkill that might not get accurate results anyway.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • interesting. Still looks like a main use for splitters is to split data strings into columns when built in functionality of ssis or import/export wizard isn't enough.

  • KoldCoffee (5/29/2014)


    interesting. Still looks like a main use for splitters is to literally split it up into columns when built in functionality of ssis or import/export wizard isn't enough.

    Or when you can't use SSIS. Consider if you had a table that contained a comma separated list and you wanted to break it apart for a query. This is all within the database and in a query so SSIS is out of the question. Along comes the splitter.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (5/29/2014)


    SQL is delicious (5/29/2014)


    Sean Lange (5/29/2014)


    SQL is delicious (5/29/2014)


    KoldCoffee (5/29/2014)


    Sean, I've taken a look at Jeff's article and it does chain to another prerequisite read on Tally Tables to get the most out of the 'how'. But I'll be far more motivated to learn it if I know why or when I'll need to reach for this tool.

    Can you give me scenarios where people use splitters?

    I wrote one from scratch to deal with strings that had all sorts of junk in them including HTML delimiters, but had predictable and consistent attributes that I could use to step my way through the string, e.g. "Prod 1 Size", "Prod 1 Flavor", "Prod 2 Size", "Prod 2 Flavor", etc.

    Not going to post it here mostly because I'd be embarrassed for other people to see it, but it got the job done. 🙂

    It is the "stepping through the string" that so frequently causes these home grown splitters to be so slow. There is no need to be embarrassed about code you wrote that works. Let's face it, the MOST important aspect is that it works correctly. A close second is doing it fast. I would take slow accurate code over fast inaccurate code any day. The trick is learning how to get both.

    Mine wasn't slow, but I never tested it with huge data sets either. I'd bet it wouldn't scale well though I really have no idea.

    And I agree it's better to be slow and right than fast and wrong, though it's best to be fast and right!!! Still working on that one.

    The last one just takes time and practice. 😉

    Most splitters out there are not slow on small data sets. It is when you get to larger data sets that it becomes noticeable.

    Absolutely right.

    The specific splitter I'm referring to was written for a friend's online business; she had multipacks in two different sizes and in quantities of 3, 6, and 12 that were fully customizable where customers chose a flavor and strength for each item in the multipack.

    Normally I wouldn't consider writing a splitter for something like this, but her online shopping cart's admin software considered each multipack size/quantity to be its own distinct product so if she wanted to see what products were most popular by flavor/strength, she couldn't because all it showed here were individual purchases, 3-pack multis, 6-pack multis, and 12-pack multis. Basically, that sales data for multis couldn't be broken down any further by the cart software's reporting tools. The only way to see what was in each multipack was to drill down into individual orders.

    I figured this was a good solution for a database. I exported the data from the cart software's admin site and the multipack order strings looked something like this:

    "12 FL OZ MULTIPACK:

    <H1>BOTTLE 1 FLAVOR:</H1>

    ROOT BEER

    <H1>BOTTLE 1 STRENGTH:</H1>

    12.75

    <H1>BOTTLE 2 FLAVOR:</H1>" etc.

    I'm oversimplifying it since I don't remember exactly what was in it, but it was a disaster.

    I replaced the HTML tags with pipe delimiters:

    "12 FL OZ MULTIPACK:|BOTTLE 1 FLAVOR:|ROOT BEER|BOTTLE 1 STRENGTH|12.75|BOTTLE 2 FLAVOR|" etc.

    Built a temp table to hold a counter and a list of all the attributes so I could use them as variables and iterate through a loop as many times as I needed to (3, 6, or 12 times) then I coded it to grab everything between this: "|BOTTLE 1 FLAVOR:|" and "|BOTTLE 1 STRENGTH|" for bottle 1 flavor, everything between this: "|BOTTLE 1 STRENGTH|" and "|BOTTLE 2 FLAVOR|" for bottle 1 strength, etc. etc. using a fairly hideous combination of CHARINDEX, SUBSTRING, and LEN and put each bottle on its own row. Not the most elegant solution. More of a sledgehammer than a scalpel.

    But it did work and I felt totally proud of it. 🙂

Viewing 15 posts - 1 through 15 (of 59 total)

You must be logged in to reply to this topic. Login to reply