Split sentence into words

  • Hi,

    I have a column named Address in a table.

    I have values such as

    12-15 Hardley Street

    2A-C Hardley Street

    A-2c Hardley Street

    A-B Hardley Street

    I am required to keep the hyphen(-) intact in the first three rows.

    i.e. If there is a number in the word that contains the hyphen(-), I should keep it, else I should replace with a space.

    So the column should be replaced as below

    12-15 Hardley Street

    2A-C Hardley Street

    A-2c Hardley Street

    A B Hardley Street

    I would also like to add that it is not neccessary that the first word will contain the hyphen.

    The data can be as

    Hardley Street 2A-C

    or

    2A-C

    Any help pls

  • cmrhema (10/19/2010)


    Hi,

    I have a column named Address in a table.

    I have values such as

    12-15 Hardley Street

    2A-C Hardley Street

    A-2c Hardley Street

    A-B Hardley Street

    I am required to keep the hyphen(-) intact in the first three rows.

    i.e. If there is a number in the word that contains the hyphen(-), I should keep it, else I should replace with a space.

    So the column should be replaced as below

    12-15 Hardley Street

    2A-C Hardley Street

    A-2c Hardley Street

    A B Hardley Street

    I would also like to add that it is not neccessary that the first word will contain the hyphen.

    The data can be as

    Hardley Street 2A-C

    or

    2A-C

    Any help pls

    This does the trick...I guess...based on your requirements and data presented...

    ;

    WITH mycte

    AS ( SELECT '12-15 Hardley Street' [address]

    UNION ALL

    SELECT '2A-C Hardley Street'

    UNION ALL

    SELECT 'A-2c Hardley Street'

    UNION ALL

    SELECT 'A-B Hardley Street' )

    SELECT CASE WHEN ROW_NUMBER() OVER ( ORDER BY [address] ) > 3

    THEN REPLACE([address], '-', ' ')

    ELSE address

    END

    FROM mycte

    12-15 Hardley Street

    2A-C Hardley Street

    A-2c Hardley Street

    A B Hardley Street

    EDIT: this is not fully tested, if the address was "A-B Hard-ley street" and the more than the 3rd row, this would replace both dashes. TEST

    -- Cory

  • Thanks Cory for the reply.

    There has been one more pre-requisite, I am supposed to do in an update query.

    declare @data nvarchar(200)

    set @data='12-12 ORANGE-RED'

    select @data=

    case left(@data,charindex(' ',@data))

    when '' then

    CASE PATINDEX ('%[0-9]%',@data)

    when 0 then replace(@data,'-',' ')

    else @data

    END

    else

    CASE PATINDEX ('%[0-9]%',left(@data,charindex(' ',@data)))

    when 0 then replace(left(@data,charindex(' ',@data)),'-',' ')+'-- LEFT MOST WORD REPLACE'

    else

    CASE charindex (' ',substring(@data,charindex(' ',@data)+1,len(@data)))

    WHEN 0 THEN

    CASE PATINDEX ('%[0-9]%',substring(@data,charindex(' ',@data)+1,len(@data)))

    when 0 then left(@data,charindex(' ',@data))+ replace(substring(@data,charindex(' ',@data)+1,len(@data)),'-',' ') +'--RIGHT MOST REPLACE'

    else @data + '--struggling here'

    END

    END

    END

    end

    where @data like '%-%'

    select @data

    I have tried the above I will be updating the table as

    update tblname set @columnName=

    --lines of coding

    where @columnName like '%-%'

    I am not able to solve this, and to add to my woes, the data will be inconsistent.

    I assumed the data to be of two words, such as

    2A-C Hardley Street

    A-2c Hardley Street

    A B Hardley Street

    But the client said that he will be having data as

    Hardley Street 22-23 BO'NESS

    A-2c Hardley Street

    Hardley Street 12B Stratford-upon-avon

    I cannot create a function, the reason being we are going to update a table and format it. If I create a function that I should call it for every other row, which is time consuming.

    Kindly do not suggest SSIS as that has already been ruled out.

    Any ideas will be very helpful.

  • I am not sure what in the requirements have changed. Can you please provide data in a consumable format, and the desired output for that data.

    Also, you state SSIS has been ruled out - if I may ask, for what reasons was it ruled out?

    -- Cory

  • The requirements are that I should write only an update query

    and the string will consist of many words.

    I am not sure about the reasons why SSIS has been ruled out, but guess some thing to do with the architecture

  • Please present some data so that a tested solution can be presented.

    I'll even help (again) get you started.

    SELECT '12-15 Hardley Street'

    UNION ALL

    SELECT '2A-C Hardley Street'

    UNION ALL

    SELECT 'A-2c Hardley Street'

    UNION ALL

    SELECT 'A-B Hardley Street'

    Also, present how you want it to look when done. Please note, your requirements appear to be "remove the dash after the 3rd instance of a row", but you gave no key to sort on. Please provide that as well.

    -- Cory

  • Thanks Cory for all your answers.

    I will start again .

    I have a table which consists of the fields

    Addressline1,Addressline2,Addressline3, FormattedAddress,Town,City etc.

    The table (tblCustomerAddress) consists of 3 million rows approximately.

    What we were supposed to do is

    Replace Rd with Road,

    Remove hyphens,double spaces, c/o etc..

    we have created a procedure called sp_FormatStreet

    the first part of the procedure deals with removing the characters

    we write as

    update tblCustomerAddress set FormattedStreet =

    -- code for removing

    update tblCustomerAddress set FormattedStreet =

    -- code for replacing rd with road

    While doing the above we noticed that while replacing the -(hyphen) with space, some of the legitimate data gets replaced in an incorrect way.

    eg

    12-14 Hardley Street, is a legitimate data

    12 B Garden Street Stratford-upon-Avon, is a legitimate data

    17 Houston Street 48-52 Cheapsake, is a legitimate data

    2B-C Ladley Road, is a legitimate data

    all the above gets converted as below

    1214 Hardley Street

    12 B Garden Street StratforduponAvon

    17 Houston Street 4852 Cheapsake

    2BC Ladley Road

    So we decided that we will keep those hyphens intact where we have alphanumeric or numeric values on the either side of the hyphen

    So the rows should be like

    12-14 Hardley Street

    12 B Garden Street Stratford upon Avon

    17 Houston Street 48-52 Cheapsake

    2B-C Ladley Road

    Hence I wanted in a update statement.

    I hope I have explained myself.

    Regards

    cmrhema

  • break the street name into words (use split function with space as separator) , replace hyphens in words that contain no numbers and then concat the words back together (in the concatenate I also re-encode the XML characters- the encoding in this forum requires removing extra spaces in: ... ,'& lt;','<'),'& gt;','>'),'& amp;','&') ):

    declare @Tmp as table (streetName varchar(1000))

    insert into @Tmp values ('A-B TEST')-- test simple case

    insert into @Tmp values ('A-B &TEST') -- test XML encoding

    insert into @Tmp values ('12-14 Hardley Street, is a legitimate data')

    insert into @Tmp values ('12 B Garden Street Stratford-upon-Avon, is a legitimate data')

    insert into @Tmp values ('17 Houston Street 48-52 Cheapsake, is a legitimate data')

    insert into @Tmp values ('2B-C Ladley Road, is a legitimate data')

    select * from @Tmp

    update @Tmp

    set streetname = replace(replace(replace((select case when patindex('%[0-9]%',Data) > 0 then Data else replace(Data,'-',' ') end as [data()]

    from dbo.fnsplit(' ',streetName) for XML path('')),'& lt;','<'),'& gt;','>'),'& amp;','&')

    select * from @Tmp

    with this code you will need a fnSplit function and a Tally table (mine is zero based) - http://www.sqlservercentral.com/articles/T-SQL/62867

    CREATE FUNCTION [dbo].[fnsplit](

    @Delimiter varchar(max)

    ,@String varchar(max)

    )

    RETURNS TABLE AS

    RETURN

    SELECT

    SUBSTRING(@String, Numbers.N+1, CHARINDEX(@Delimiter, @String + @Delimiter, Numbers.N+1) - Numbers.N-1) AS Data

    , Numbers.N as Pos

    ,( Numbers.N) - LEN(REPLACE(LEFT(@String, Numbers.N), @Delimiter, N'')) + 1 AS ItemNum

    FROM dbo.Tally Numbers with (nolock)

    WHERE Numbers.N <= LEN(@String)

    AND SUBSTRING(@Delimiter + @String, Numbers.N+1, datalength(@Delimiter)) = @Delimiter;

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply