is there a "Clean" funcion in SQL?

  • OkaY,

    Here is the issue.  I have a column of invoice numbers which I want to be only numeric.  Unfortunately it's about 500,000 records and there are some alpha characters in different places.

    For Example :

    12345A

    1A23456789B

    C525D598

    AB12589

     

    So as you can see, they are all over the place!!

    Is there any way I can say take out everything that is not numeric and vice versa?

  • Yes you can but I'd expect an operation like this to create duplicates. What r u trying to achieve by this?

  • This is based upon a table called "AP" with an InvoiceNum field and an update to a CleanInvoiceNum field.  See if this will help you. 

    (Generally, Invoice checking like this is not for duplicates.  Rather, it is for transposing numbers in an Invoice.  Hence, one invoice may get paid twice if the numbers of letters where switched.  It is a common routine for accounting systems.  Chances are good you will get someone else posting with even more efficient code.) 

     

    SET NOCOUNT ON

    DECLARE @CurrentID int,

     @MaxID int,

     @OriginInvoiceNumber varchar(25),

     @TempInvoiceNumber varchar(1),

     @InvoiceNumber varchar(25),

     @CurrentPosition int,

     @Length int     -- Table

    SELECT @CurrentID = (SELECT MIN( RowID) FROM AP),

     @MaxID = (SELECT MAX( RowID) FROM AP)

    WHILE @CurrentID <= @MaxID

    BEGIN

     SELECT @InvoiceNumber = ' '

     SELECT @OriginInvoiceNumber = ' '    -- Table

     SELECT @OriginInvoiceNumber = ISNULL( (SELECT InvoiceNum FROM AP WHERE RowID = @CurrentID), '0')

     IF @OriginInvoiceNumber <> '0'

     BEGIN

      SELECT @CurrentPosition = 1

      SELECT @Length = (SELECT LEN( RTRIM( LTRIM( @OriginInvoiceNumber))))

      WHILE @CurrentPosition <= @Length 

      BEGIN

       SELECT @TempInvoiceNumber = (SELECT SUBSTRING( @OriginInvoiceNumber, @CurrentPosition, 1))

        IF @TempInvoiceNumber BETWEEN CHAR(48) AND char(57)

        BEGIN

         SELECT @InvoiceNumber = (SELECT @InvoiceNumber + @TempInvoiceNumber)

        END

       SELECT @CurrentPosition = @CurrentPosition + 1

      END

     END

     IF ISNUMERIC( @InvoiceNumber) = 0

     BEGIN

      SELECT @InvoiceNumber = '0'

     END

     IF ISNUMERIC( @InvoiceNumber) = 1

     BEGIN     -- Table

       BEGIN TRANSACTION CleanInvoiceNum

        UPDATE AP SET

         CleanInvoiceNum = CONVERT( numeric(38,0), RTRIM( LTRIM( @InvoiceNumber)))

        WHERE RowID = @CurrentID

       COMMIT TRANSACTION CleanInvoiceNum

     END

           -- Table

     SELECT @CurrentID = (SELECT MIN( RowID) FROM AP WHERE RowID > @CurrentID)

    END

     

    -------------------------------------------------------------------------

    If you would rather handle this with a function, you can try the following: 

    CREATE FUNCTION dbo.RemoveChars(@Input varchar(1000))

    RETURNS VARCHAR(100)

    BEGIN

     DECLARE @pos int

     SET @Pos = PATINDEX( '%[^0-9]%', @Input) -- returns the first occurence of a pattern in a specified expression

     WHILE @Pos > 0     -- in this instance, characters not to match [^]

     BEGIN

      SET @Input = STUFF( @Input, @pos, 1, '') -- deletes a specified length and inserts another set of characters

           -- in this instance, inserts empty spaces

      SET @Pos = PATINDEX( '%[^0-9]%', @Input)

     END

     RETURN @Input

    END

    GO

    SELECT 'a1sdsad124325143gffdfd4dgsf', CONVERT( bigint, dbo.RemoveChars( 'a1sdsad124325143gffdfd4dgsf'))

    DROP FUNCTION dbo.RemoveChars

    I wasn't born stupid - I had to study.

  • How about this for faster solution :

    GO

    CREATE TABLE [Numbers] (

    [PkNumber] [int] IDENTITY (1, 1) NOT NULL ,

    CONSTRAINT [Pk_Number] PRIMARY KEY CLUSTERED

    (

    [PkNumber]

    ) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    Declare @i as int

    set @i = 0

    --I use this table for string operations as well, but in this case we could stop at 64.

    while @i < 8000

    begin

    Insert into dbo.Numbers Default values

    set @i = @i + 1

    end

    GO

    CREATE FUNCTION dbo.RemoveChars (@Input as varchar(8000))

    RETURNS VARCHAR(8000)

    WITH SCHEMABINDING

    AS

    BEGIN

    Declare @Return as VARCHAR(8000)

    SET @Return = ''

    Select @Return = @Return + Substring(@Input, PkNumber, 1) from dbo.Numbers where ASCII(Substring(@Input, PkNumber, 1)) BETWEEN 48 and 57 AND PkNumber <= LEN(@Input)

    RETURN @Return

    END

    GO

    Select dbo.RemoveChars('l3l45kjhsf87y3')

    GO

    DROP Function RemoveChars

  • Interesting Remi.  Someone else, (I cannot remember who - probably Frank) posted a better direction for my function in its original form and now you have given me another approach.  I love this place!!! 

    Thanks  -  I will enjoy looking at this...

    I wasn't born stupid - I had to study.

  • I just love that numbers table... so little things you can't do with it .

  • if you can handle the bit typework you might consider using the replace sql-function.

    check BOL.

    -- use select for POC-test

    -- later alter to update-statement

    select myalfanumber, replace(replace(upper(myalfanumber),'A','') ,'B',''),.... as mynumber

    from mytable

    where PATINDEX( '%[^0-9]%', myalfanumber) > 0

    and alter your table's column in a single passtrough

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • FWIW:

    CREATE FUNCTION dbo.RemoveChars(@Input varchar(1000))

     RETURNS VARCHAR(20)

     BEGIN

      DECLARE @pos INT

      SET @Pos = PATINDEX('%[^0-9]%',@Input)

      WHILE @Pos > 0

       BEGIN

        SET @Input = STUFF(@Input,@pos,1,'')

        SET @Pos = PATINDEX('%[^0-9]%',@Input)

       END

      RETURN @Input

    END

    GO

    SELECT dbo.RemoveChars('a1sdsad124325143gffdfd4dgsf')

    DROP FUNCTION dbo.RemoveChars

                        

    --------------------

    11243251434

    (1 row(s) affected)

    or in a slightly modified version:

    CREATE FUNCTION dbo.RemoveChar2(@Input VARCHAR(8000))

    RETURNS VARCHAR(8000)

    AS

    BEGIN

     DECLARE @pos INT

     WHILE PATINDEX('%[^0-9]%',@input) > 0

     BEGIN

      SET @pos = PATINDEX('%[^0-9]%',@input)

      SET @input =

      STUFF(@input,@pos,PATINDEX('%[0-9]%',STUFF(@input,1,@pos,'')+'0'),'')

      END

     RETURN @input

    END

    GO

    SELECT dbo.RemoveChar2('a1sdsad124325143gffdfd4dgsf')

    DROP FUNCTION dbo.RemoveChar2

    GO

               

    ------------

    11243251434

    (1 row(s) affected)

     

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Hey Frank.. you don't like to use a numbers table for something like this?

  • Frankly, I'm not sure on this. It looks interesting, but I suspect running through every position in a string slower than just checking with PATINDEX. Especially when there are more characters to rule out than those that will remain. I would really be interested in performance testing here. Maybe I have time for this over the weekend.

    Did you do some real-world testing?

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • I have no table worthy of that term... The biggest table I have only has 20k rows... nothing near the million..

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply