Case sensitive pattern matching

  • I have a set of data where a column contains titles which have been formatted as follows:

    "FirstWordSecondWordThirdWord...." etc.

    That is, all the words have been concatenated but can be visually separated by their capital first letters.

    For reporting purposes, I need to break this column into the separate words so that it looks like:

    "First Word Second Word Third Word...." etc.

    Any thoughts as to how this can be achieved?

  • For this type of heavy string manipulation I would rather use CLR. Is that an option at your shop?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (12/20/2013)


    For this type of heavy string manipulation I would rather use CLR. Is that an option at your shop?

    Unfortunately not. If it can't be done using standard scalar functions for string manipulation then my best option will be to write (or have someone write) a custom function to do it.

  • andyscott (12/20/2013)


    Sean Lange (12/20/2013)


    For this type of heavy string manipulation I would rather use CLR. Is that an option at your shop?

    Unfortunately not. If it can't be done using standard scalar functions for string manipulation then my best option will be to write (or have someone write) a custom function to do it.

    UGH!!! This type of string manipulation in a scalar function is going to horrendous for performance. I have an idea of how we might be able to use a tally table for this. Let me see if I can find some time to work on this.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • ok i have this example in my snippets for turning CamelCase into a single string with spaces:

    now, if each word becomes a column unto itself, that's tougher...

    that would have to be added on to split the words to individual rows, and pivoted back into spererate columns

    /*

    --Results

    First Word Second Word Third Word

    I Really Hope This Works.

    If Not Please Let Me Know

    */

    DECLARE @table TABLE

    (Input VARCHAR(MAX))

    INSERT INTO @Table

    SELECT 'FirstWordSecondWordThirdWord' UNION ALL

    SELECT 'IReallyHopeThisWorks.' UNION ALL

    SELECT 'IfNotPleaseLetMeKnow'

    SELECT

    STUFF((SELECT

    CASE

    WHEN PATINDEX('%[A-Z]%',SUBSTRING(Input,N,1) COLLATE Latin1_General_BIN) != 0

    THEN ' ' + SUBSTRING(Input,N,1)

    ELSE SUBSTRING(Input,N,1)

    END

    FROM (SELECT TOP 8000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM sys.columns) Tally WHERE N <=LEN(Input) FOR XML PATH('')),1,1,'')

    FROM @table

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (12/20/2013)


    ok i have this example in my snippets for turning CamelCase into a single string with spaces:

    now, if each word becomes a column unto itself, that's tougher...

    Thanks for sharing Lowell. This is along the lines of what I had in my head.

    Just for grins I tweaked your code just slightly to use a cte tally instead of hitting an actual table. I have become such a huge fan of the no read tally table because it is so crazy fast. I have actually created a view on my system for that.

    DECLARE @table TABLE

    (Input VARCHAR(MAX))

    INSERT INTO @Table

    SELECT 'FirstWordSecondWordThirdWord' UNION ALL

    SELECT 'IReallyHopeThisWorks.' UNION ALL

    SELECT 'IfNotPleaseLetMeKnow';

    WITH

    E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    )

    SELECT

    STUFF((SELECT

    CASE

    WHEN PATINDEX('%[A-Z]%',SUBSTRING(Input,N,1) COLLATE Latin1_General_BIN) != 0

    THEN ' ' + SUBSTRING(Input,N,1)

    ELSE SUBSTRING(Input,N,1)

    END

    FROM cteTally WHERE N <=LEN(Input) FOR XML PATH('')),1,1,'')

    FROM @table

    Once I flipped this to the no read style of tally table the inserts are the slowest part of the whole batch. 😀

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (12/20/2013)


    Lowell (12/20/2013)


    ok i have this example in my snippets for turning CamelCase into a single string with spaces:

    now, if each word becomes a column unto itself, that's tougher...

    Thanks for sharing Lowell. This is along the lines of what I had in my head.

    Just for grins I tweaked your code just slightly to use a cte tally instead of hitting an actual table. I have become such a huge fan of the no read tally table because it is so crazy fast. I have actually created a view on my system for that.

    DECLARE @table TABLE

    (Input VARCHAR(MAX))

    INSERT INTO @Table

    SELECT 'FirstWordSecondWordThirdWord' UNION ALL

    SELECT 'IReallyHopeThisWorks.' UNION ALL

    SELECT 'IfNotPleaseLetMeKnow';

    WITH

    E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    )

    SELECT

    STUFF((SELECT

    CASE

    WHEN PATINDEX('%[A-Z]%',SUBSTRING(Input,N,1) COLLATE Latin1_General_BIN) != 0

    THEN ' ' + SUBSTRING(Input,N,1)

    ELSE SUBSTRING(Input,N,1)

    END

    FROM cteTally WHERE N <=LEN(Input) FOR XML PATH('')),1,1,'')

    FROM @table

    Once I flipped this to the no read style of tally table the inserts are the slowest part of the whole batch. 😀

    Nice code, Sean!

    Why do you use STUFF? There is no need of it. You can replace STUFF with LTRIM() and escape the first space.

    Thanks,

    IgorMi

    Igor Micev,My blog: www.igormicev.com

  • the extra STUFF is how most of us remove the delimiter;

    while in this case, you could use LTRIM becasue we used a space, for any other delimiter(comma delimiter, pipes, whatever) then you have to switch back to STUFF to remove the prceeding /first delimiter from the xml anyway..

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • And that code is all Lowell, I merely made a extremely minor modification to it. All kudos belong to Lowell.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (12/20/2013)


    And that code is all Lowell, I merely made a extremely minor modification to it. All kudos belong to Lowell.

    well, i dunno about the kudos, but thank you, Sean!

    I know i copied it from an exisiting example, so i'm just a hoarder in this case.

    I google-fu'd for the original post, and it was actually created back in 2009 by Christopher Stobbs in this thread:

    ]http://www.sqlservercentral.com/Forums/FindPost679633.aspx,

    but i've posted it a few times for similar questions as well.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (12/20/2013)


    the extra STUFF is how most of us remove the delimiter;

    while in this case, you could use LTRIM becasue we used a space, for any other delimiter(comma delimiter, pipes, whatever) then you have to switch back to STUFF to remove the prceeding /first delimiter from the xml anyway..

    Both solutions are excellent, i just made a little remark.

    😉

    Igor Micev,My blog: www.igormicev.com

  • Lowell (12/20/2013)


    Sean Lange (12/20/2013)


    And that code is all Lowell, I merely made a extremely minor modification to it. All kudos belong to Lowell.

    well, i dunno about the kudos, but thank you, Sean!

    I know i copied it from an exisiting example, so i'm just a hoarder in this case.

    I google-fu'd for the original post, and it was actually created back in 2009 by Christopher Stobbs in this thread:

    ]http://www.sqlservercentral.com/Forums/FindPost679633.aspx,

    but i've posted it a few times for similar questions as well.

    Most importantly I didn't want it to seem that it was my code because I did not write any of it. Just trying to give credit where applicable.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply