April 28, 2005 at 11:23 am
OkaY,
Here is the issue. I have a column of invoice numbers which I want to be only numeric. Unfortunately it's about 500,000 records and there are some alpha characters in different places.
For Example :
12345A
1A23456789B
C525D598
AB12589
So as you can see, they are all over the place!!
Is there any way I can say take out everything that is not numeric and vice versa?
April 28, 2005 at 11:27 am
Yes you can but I'd expect an operation like this to create duplicates. What r u trying to achieve by this?
April 28, 2005 at 12:30 pm
This is based upon a table called "AP" with an InvoiceNum field and an update to a CleanInvoiceNum field. See if this will help you.
(Generally, Invoice checking like this is not for duplicates. Rather, it is for transposing numbers in an Invoice. Hence, one invoice may get paid twice if the numbers of letters where switched. It is a common routine for accounting systems. Chances are good you will get someone else posting with even more efficient code.)
SET NOCOUNT ON
DECLARE @CurrentID int,
@MaxID int,
@OriginInvoiceNumber varchar(25),
@TempInvoiceNumber varchar(1),
@InvoiceNumber varchar(25),
@CurrentPosition int,
@Length int -- Table
SELECT @CurrentID = (SELECT MIN( RowID) FROM AP),
@MaxID = (SELECT MAX( RowID) FROM AP)
WHILE @CurrentID <= @MaxID
BEGIN
SELECT @InvoiceNumber = ' '
SELECT @OriginInvoiceNumber = ' ' -- Table
SELECT @OriginInvoiceNumber = ISNULL( (SELECT InvoiceNum FROM AP WHERE RowID = @CurrentID), '0')
IF @OriginInvoiceNumber <> '0'
BEGIN
SELECT @CurrentPosition = 1
SELECT @Length = (SELECT LEN( RTRIM( LTRIM( @OriginInvoiceNumber))))
WHILE @CurrentPosition <= @Length
BEGIN
SELECT @TempInvoiceNumber = (SELECT SUBSTRING( @OriginInvoiceNumber, @CurrentPosition, 1))
IF @TempInvoiceNumber BETWEEN CHAR(48) AND char(57)
BEGIN
SELECT @InvoiceNumber = (SELECT @InvoiceNumber + @TempInvoiceNumber)
END
SELECT @CurrentPosition = @CurrentPosition + 1
END
END
IF ISNUMERIC( @InvoiceNumber) = 0
BEGIN
SELECT @InvoiceNumber = '0'
END
IF ISNUMERIC( @InvoiceNumber) = 1
BEGIN -- Table
BEGIN TRANSACTION CleanInvoiceNum
UPDATE AP SET
CleanInvoiceNum = CONVERT( numeric(38,0), RTRIM( LTRIM( @InvoiceNumber)))
WHERE RowID = @CurrentID
COMMIT TRANSACTION CleanInvoiceNum
END
-- Table
SELECT @CurrentID = (SELECT MIN( RowID) FROM AP WHERE RowID > @CurrentID)
END
-------------------------------------------------------------------------
If you would rather handle this with a function, you can try the following:
CREATE FUNCTION dbo.RemoveChars(@Input varchar(1000))
RETURNS VARCHAR(100)
BEGIN
DECLARE @pos int
SET @Pos = PATINDEX( '%[^0-9]%', @Input) -- returns the first occurence of a pattern in a specified expression
WHILE @Pos > 0 -- in this instance, characters not to match [^]
BEGIN
SET @Input = STUFF( @Input, @pos, 1, '') -- deletes a specified length and inserts another set of characters
-- in this instance, inserts empty spaces
SET @Pos = PATINDEX( '%[^0-9]%', @Input)
END
RETURN @Input
END
GO
SELECT 'a1sdsad124325143gffdfd4dgsf', CONVERT( bigint, dbo.RemoveChars( 'a1sdsad124325143gffdfd4dgsf'))
DROP FUNCTION dbo.RemoveChars
I wasn't born stupid - I had to study.
April 28, 2005 at 12:43 pm
How about this for faster solution :
GO
CREATE TABLE [Numbers] (
[PkNumber] [int] IDENTITY (1, 1) NOT NULL ,
CONSTRAINT [Pk_Number] PRIMARY KEY CLUSTERED
(
[PkNumber]
) ON [PRIMARY]
) ON [PRIMARY]
GO
Declare @i as int
set @i = 0
--I use this table for string operations as well, but in this case we could stop at 64.
while @i < 8000
begin
Insert into dbo.Numbers Default values
set @i = @i + 1
end
GO
CREATE FUNCTION dbo.RemoveChars (@Input as varchar(8000))
RETURNS VARCHAR(8000)
WITH SCHEMABINDING
AS
BEGIN
Declare @Return as VARCHAR(8000)
SET @Return = ''
Select @Return = @Return + Substring(@Input, PkNumber, 1) from dbo.Numbers where ASCII(Substring(@Input, PkNumber, 1)) BETWEEN 48 and 57 AND PkNumber <= LEN(@Input)
RETURN @Return
END
GO
Select dbo.RemoveChars('l3l45kjhsf87y3')
GO
DROP Function RemoveChars
April 28, 2005 at 2:05 pm
Interesting Remi. Someone else, (I cannot remember who - probably Frank) posted a better direction for my function in its original form and now you have given me another approach. I love this place!!!
Thanks - I will enjoy looking at this...
I wasn't born stupid - I had to study.
April 28, 2005 at 2:16 pm
I just love that numbers table... so little things you can't do with it .
April 29, 2005 at 12:56 am
if you can handle the bit typework you might consider using the replace sql-function.
check BOL.
-- use select for POC-test
-- later alter to update-statement
select myalfanumber, replace(replace(upper(myalfanumber),'A','') ,'B',''),.... as mynumber
from mytable
where PATINDEX( '%[^0-9]%', myalfanumber) > 0
and alter your table's column in a single passtrough
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
April 29, 2005 at 3:50 am
FWIW:
CREATE FUNCTION dbo.RemoveChars(@Input varchar(1000))
RETURNS VARCHAR(20)
BEGIN
DECLARE @pos INT
SET @Pos = PATINDEX('%[^0-9]%',@Input)
WHILE @Pos > 0
BEGIN
SET @Input = STUFF(@Input,@pos,1,'')
SET @Pos = PATINDEX('%[^0-9]%',@Input)
END
RETURN @Input
END
GO
SELECT dbo.RemoveChars('a1sdsad124325143gffdfd4dgsf')
DROP FUNCTION dbo.RemoveChars
--------------------
11243251434
(1 row(s) affected)
or in a slightly modified version:
CREATE FUNCTION dbo.RemoveChar2(@Input VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @pos INT
WHILE PATINDEX('%[^0-9]%',@input) > 0
BEGIN
SET @pos = PATINDEX('%[^0-9]%',@input)
SET @input =
STUFF(@input,@pos,PATINDEX('%[0-9]%',STUFF(@input,1,@pos,'')+'0'),'')
END
RETURN @input
END
GO
SELECT dbo.RemoveChar2('a1sdsad124325143gffdfd4dgsf')
DROP FUNCTION dbo.RemoveChar2
GO
------------
11243251434
(1 row(s) affected)
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
April 29, 2005 at 6:32 am
Hey Frank.. you don't like to use a numbers table for something like this?
April 29, 2005 at 6:50 am
Frankly, I'm not sure on this. It looks interesting, but I suspect running through every position in a string slower than just checking with PATINDEX. Especially when there are more characters to rule out than those that will remain. I would really be interested in performance testing here. Maybe I have time for this over the weekend.
Did you do some real-world testing?
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
April 29, 2005 at 6:57 am
I have no table worthy of that term... The biggest table I have only has 20k rows... nothing near the million..
Viewing 11 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy