Convert alphanumeric to BigInt

  • Hi,

    I got a table with datas that has alphanumeric values like 1230-544,15C5487,132DE78.

    Now i need to extract only integers fomr these datas and convert to bigint. The other characters have no impact in my query. Is there way to produce a data like that?

    Thanks in advance.

    Mouli

    "I Love Walking In The Rain So No One Can See Me Crying ! " ~ Charlie Chaplin

  • Your post is not very clear. Do you want to get only the records that have only numbers or do you want to get all the records but remove the non numeric characters from your column? Also do you have a list of possible characters, or could that column have all possible characters?

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Have a look at this

    declare @STR varchar(100)

    select @STR = '1230-544,15C5487,132DE78'

    ;With Breakdown as

    (

    Select

    SubString(@Str,1,1)[Chr],

    1[Idx]

    Union All

    Select

    SubString(@Str,Idx+1,1),

    Idx+1

    from Breakdown

    where (Idx+1)<=Len(@Str)

    )

    select chr from Breakdown where isnumeric(chr) = 1

    Edit:- This will return ',' and '-'. You need to filter them out.

    "Keep Trying"

  • Chirag (3/23/2009)


    Have a look at this

    declare @STR varchar(100)

    select @STR = '1230-544,15C5487,132DE78'

    ;With Breakdown as

    (

    Select

    SubString(@Str,1,1)[Chr],

    1[Idx]

    Union All

    Select

    SubString(@Str,Idx+1,1),

    Idx+1

    from Breakdown

    where (Idx+1)<=Len(@Str)

    )

    select chr from Breakdown where isnumeric(chr) = 1

    Edit:- This will return ',' and '-'. You need to filter them out.

    Hey there, ol' friend... you do realize that recursive CTE's are slower than even cursors, right?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry... went to edit and ended up double posting, instead... please see below in my next post.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Mouli,

    I believe this will solve your problem... no UDF overhead... no need for CROSS-APPLY. Just use the whole thing as another derived table. I suppose it could be a UDF or view, as well.

    --===== Create and populate a test table with the data given in the post.

    -- This is NOT a part of the solution

    CREATE TABLE #YourTable

    (

    SomeString VARCHAR(20)

    )

    INSERT INTO #YourTable

    SELECT '1230-544' UNION ALL

    SELECT '15C5487' UNION ALL

    SELECT '132DE78'

    --===== This solves the problem.

    ;WITH

    cteSplit AS

    (--==== This not only splits out the individual characters, it only splits

    -- out the digits from 0 to 9

    SELECT SomeString,

    ROW_NUMBER() OVER (ORDER BY yt.SomeString) AS CharacterNumber,

    SUBSTRING(yt.SomeString,t.N,1) AS Character

    FROM #YourTable yt

    CROSS JOIN dbo.Tally t

    WHERE t.N <= LEN(yt.SomeString)

    AND SUBSTRING(yt.SomeString,t.N,1) LIKE '[0-9]'

    )--==== This put's it all back together using a very high speed XML method

    SELECT t1.SomeString,

    CAST((SELECT '' + t2.Character

    FROM cteSplit t2

    WHERE t1.SomeString = t2.SomeString

    ORDER BY t2.CharacterNumber

    FOR XML PATH(''))

    AS BIGINT) AS NumbersOnly

    FROM cteSplit t1

    GROUP BY t1.SomeString

    If you don't already have a Tally table at hand, now's the time to build this incredibly useful tool. Read the article at the following link not only for how to build one, but to understand how it works, as well.

    http://www.sqlservercentral.com/articles/TSQL/62867/

    If, for some reason, folks won't let you build one in the database, please post back... there's more than one way to get around that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    I have used your script both in UDF and as a query. But the problem is when the table records exceeds 10000, there is a performance issue. It takes more than two mins.

    This is my table:

    DECLARE @intCount INT

    SET @intCount = 10000

    WHILE(@intCount>=1)

    BEGIN

    INSERT INTO #Temp (Data) SELECT '~`!@#$%^&*()_2+-={}[]:";<>,.?/|\",.~`'

    SET @intCount = @intCount - 1

    END

    Also, i have tried out a function that i have created that has loop

    CREATE FUNCTION dbo.ufn_ExtractNumbersFromText

    (

    @vchInput VARCHAR(200)

    )

    RETURNS VARCHAR(200)

    AS

    BEGIN

    DECLARE @vchOutput VARCHAR(200)

    SET @vchInput = LTRIM(RTRIM(@vchInput))

    DECLARE @i INT

    DECLARE @intCount INT

    DECLARE @vchTemp VARCHAR(1)

    SET @i = 1

    SET @vchOutput = ''

    SET @intCount = LEN(@vchInput)

    WHILE(@intCount >= 1)

    BEGIN

    SET @vchTemp = SUBSTRING(@vchInput,@i,1)

    SET @vchOutput= @vchOutput + CASE WHEN @vchTemp LIKE '[0-9]'

    THEN @vchTemp ELSE '' END

    SET @i = @i + 1

    SET @intCount = @intCount - 1

    END

    RETURN @vchOutput

    END

    And this takes nearly 6 Secs For 20000.

    Anyways the script you have sent can be modified or are there any other hidden advantage compared to my function.

    "I Love Walking In The Rain So No One Can See Me Crying ! " ~ Charlie Chaplin

  • Oops,

    Forget to add the script of how the function is used on the table to get the required data.

    SELECT * FROM #Temp

    UPDATE #Temp SET Output = dbo.ufn_ExtractNumbersFromText(3,Data)

    SELECT * FROM #Temp

    Thanks

    "I Love Walking In The Rain So No One Can See Me Crying ! " ~ Charlie Chaplin

  • Jeff

    I mistook this (1230-544,15C5487,132DE78) for a single value rather than 3 values that they are. I assume the op wanted to process one value.

    Using a number table would be the best way of doing this.

    Assuming you have a number table called number this is the query

    select substring('1230-544',number,1)

    from number where number <= len('1230-544')

    and substring('1230-544',number,1) like '[%0-9%]'

    "Keep Trying"

  • Chirag (3/23/2009)


    Jeff

    I mistook this (1230-544,15C5487,132DE78) for a single value rather than 3 values that they are. I assume the op wanted to process one value.

    Using a number table would be the best way of doing this.

    Assuming you have a number table called number this is the query

    select substring('1230-544',number,1)

    from number where number <= len('1230-544')

    and substring('1230-544',number,1) like '[%0-9%]'

    Understood and thanks for the feeback, Chirag... just wanted you to know that recursion should be avoided even more than a well written cursor is.

    Also, your query doesn't put it all back together like the op wanted and you don't need the % signs in the like because you're looking at a single character.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jchandramouli (3/23/2009)


    Jeff,

    I have used your script both in UDF and as a query. But the problem is when the table records exceeds 10000, there is a performance issue. It takes more than two mins.

    I check it when I get home from work tonight. Thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And, yes... there's a few more tricks we can try.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • One more thing.... can you post the code for the UDF that uses my method... and why are you using a UDF for this? It was meant to resolve the entire table at once. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wow.... you're right. I've never seen the XML concatenation method run so slow. Now that we have the test data you want, I believe we can do a little better than the While Loop solution you have.

    Here's your test data generator (I did a little "Mo-dee-can tweekin' on it) and your test table (best I can make out from your code)...

    --===== Create the test table and populate it

    CREATE TABLE #Temp (Data VARCHAR(100), OutPut BIGINT)

    INSERT INTO #Temp (Data)

    SELECT TOP 20000 '~`!@#$%^&*()_2+-={}[]:";<>,.?/|\",.~`'

    FROM Master.sys.SysColumns sc1

    CROSS JOIN Master.sys.SysColumns sc2

    Here's my idea of what the function should look like....

    CREATE FUNCTION dbo.DigitsOnly

    (@SomeString VARCHAR(8000))

    RETURNS BIGINT

    AS

    BEGIN

    DECLARE @CleanString VARCHAR(8000)

    SET @CleanString = ''

    SELECT @CleanString = @CleanString + SUBSTRING(@SomeString,t.N,1)

    FROM dbo.Tally t

    WHERE t.N <= LEN(@SomeString)

    AND SUBSTRING(@SomeString,t.N,1) LIKE '[0-9]'

    RETURN CAST(@CleanString AS BIGINT)

    END

    Notice that it uses a Tally table... the following article explains not only how to build one, but also how it replaces a While Loop... it's got a heck of a lot more uses than just what show up in that article...

    http://www.sqlservercentral.com/articles/TSQL/62867/

    ... And, here's the code to use the function...

    SELECT * FROM #Temp

    UPDATE #Temp SET Output = dbo.DigitsOnly(Data)

    SELECT * FROM #Temp

    Including the two selects, the While loop version takes between 9 and 10 seconds on my box. The Tally table version takes between 4 and 5.

    Here's another test table you can run... same everything except different data...

    drop table #Temp

    go

    --===== Create the test table and populate it

    CREATE TABLE #Temp (Data VARCHAR(100), OutPut BIGINT)

    INSERT INTO #Temp (Data)

    SELECT TOP 20000 '~`!@#$%^&*()_2+-={}[]:";<>,.?/|\",.~`'+RIGHT(NEWID(),12)

    FROM Master.sys.SysColumns sc1

    CROSS JOIN Master.sys.SysColumns sc2

    Both runs take about the same time as the prevously did with the Tally table version being almost twice as fast.

    Hope all that helps.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/24/2009)


    Chirag (3/23/2009)


    Jeff

    select substring('1230-544',number,1)

    from number where number <= len('1230-544')

    and substring('1230-544',number,1) like '[%0-9%]'

    Also, your query doesn't put it all back together like the op wanted and you don't need the % signs in the like because you're looking at a single character.

    You are right.

    "Keep Trying"

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply