For your amusement

  • USE [collect2000]

    GO

    /****** Object: UserDefinedFunction [dbo].[tvf_ScrubData] Script Date: 05/21/2013 12:54:28 ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[tvf_ScrubData]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))

    DROP FUNCTION [dbo].[tvf_ScrubData]

    GO

    USE [collect2000]

    GO

    /****** Object: UserDefinedFunction [dbo].[tvf_ScrubData] Script Date: 05/21/2013 12:54:28 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE FUNCTION [dbo].[tvf_ScrubData]

    (

    -- Add the parameters for the function here

    @data varchar(8000),

    @type varchar(20)

    )

    RETURNS TABLE

    AS

    RETURN

    (

    select result from

    (select LTRIM(rtrim(@data)) as data) a

    outer apply

    (

    SELECT case

    when @type = 'posint' then

    case when data like '%[^0123456789]%' then null else data end

    when @type = 'int' then

    case when substring(data,1,1) like '%[^-0123456789]%' then null else case when data like '%[^0123456789]%' then null else data end end

    when @type = 'decimal' then

    case when substring(data,1,1)like '%[^-0123456789.]%' then

    null

    else

    case when data = '' then null

    when data is null then null

    when data like '%[^-0123456789.]%' then null

    else

    case when charindex('.',data) > 0 and charindex('.',data,charindex('.',data)) > 0 then null

    else data

    end

    end

    end

    when @type = 'posdecimal' then

    case when data = '' then null

    when data is null then null

    when data like '%[^-0123456789.]%' then null

    else

    case when charindex('.',data) > 0 and charindex('.',data,charindex('.',data)) > 0 then null

    else data

    end

    end

    when @type = 'date' then

    case when data = '' then null

    when data is null then null

    when len(data)<> 10 then null

    when data like '[0123456789][0123456789][/.-][0123456789][0123456789][/.-][12][0123456789][0123456789][0123456789]' then data

    else null

    end

    when @type = 'time' then

    case when data = '' then null

    when data is null then null

    when data like '[0123456789][0123456789][:.][0123456789][0123456789]' or data like '[0123456789][0123456789][:.][0123456789][0123456789][:.][0123456789][0123456789]' or data like '[0123456789][0123456789][:.][0123456789][0123456789][:.][0123456789][0123456789][.][0123456789][0123456789][0123456789][0123456789]' then data

    else null

    end

    when @type = 'datetime' then

    case when data = '' then null

    when data is null then null

    when len(data)<> 10 then null

    when data like '[0123456789][0123456789][/.-][0123456789][0123456789][/.-][12][0123456789][0123456789][0123456789] [0123456789][0123456789][:.][0123456789][0123456789][:.][0123456789][0123456789][.][0123456789][0123456789][0123456789][0123456789]' then data

    else null

    end

    end as result ) b

    )

    GO

  • What is this supposed to do?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • It's an inline tvf that scrubs data by type using LIKE.

    int, decimal (signed and unsigned), date, time, date time.

    For those situations where data just gets stuck in a varchar field and then you have to deal with it later.

    It was fun to make it be an inline TVF instead of a function.

    I was bored.

    Plus, if you pass in the type, then all your scrubamajigs are in the same place.

  • Can you post some examples of using it? I may be having a dense moment but I am not getting it. If I saw it in use it would help me. :hehe:

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Here's some sample data to throw at it.

    select * from

    (

    select '01/01/2013' as data

    union all select '01-02-2012'

    union all select '01-03-2010'

    union all select '123'

    union all select '-123'

    union all select '10:30'

    union all select '01:28:21'

    union all select '02:12:14.345'

    union all select '1.123'

    union all select '-1.23'

    union all select '1-.23'

    union all select '1.23.1' ) a outer apply tvf_ScrubData(a.data,'int')

  • I can't believe I'm still doing this:

    USE [collect2000]

    GO

    /****** Object: UserDefinedFunction [dbo].[tvf_ScrubData] Script Date: 05/21/2013 17:29:25 ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[tvf_ScrubData]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))

    DROP FUNCTION [dbo].[tvf_ScrubData]

    GO

    USE [collect2000]

    GO

    /****** Object: UserDefinedFunction [dbo].[tvf_ScrubData] Script Date: 05/21/2013 17:29:25 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    -- =============================================

    -- Author:Name

    -- Create date:

    -- Description:

    -- =============================================

    CREATE FUNCTION [dbo].[tvf_ScrubData]

    (

    -- Add the parameters for the function here

    @data varchar(8000),

    @type varchar(20)

    )

    RETURNS TABLE

    AS

    RETURN

    (

    select result from

    (select LTRIM(rtrim(@data)) as data ,

    '[0123456789][012][/.-][0123][0123456789][/.-][12][0123456789][0123456789][0123456789]' dateFmt,

    '%[^0123456789]%' posIntFmt,

    '[^-0123456789]' negIntFmt,

    '%[^0123456789]%' intFmt,

    '[^-0123456789.]' negDecFmt,

    '%[^0123456789.]%' decFmt,

    '[0123456789][0123456789][:.][0123456789][0123456789]' timeFmt1,

    '[0123456789][0123456789][:.][0123456789][0123456789][:.][0123456789][0123456789]' timeFmt2,

    '[0123456789][0123456789][:.][0123456789][0123456789][:.][0123456789][0123456789][.][0123456789][0123456789][0123456789]' timeFmt3

    ) a

    outer apply

    ( select

    rtrim(substring(data,1,CHARINDEX(' ',data))) dataDate,

    ltrim(substring(data,CHARINDEX(' ',data),99)) dataTime,

    SUBSTRING(data,1,1) firstChar,

    SUBSTRING(data,2,LEN(data)-1) rest ,

    Case when CHARINDEX('.',data) = 0 or (charindex('.',data) > 0 and charindex('.',data,charindex('.',data)+1) = 0) then 1 else 0 end oneDecimal

    ) c

    cross apply

    (

    SELECT case

    when @type = 'posint' then

    case when data not like posIntFmt then data end

    when @type = 'int' then

    case when firstChar not like negIntFmt and substring(data,2,99) not like intFmt then data end

    when @type = 'decimal' then

    case when firstChar not like negDecFmt and substring(data,2,99) not like decFmt and oneDecimal = 1 then data end

    when @type = 'posdecimal' then

    case when IsNull(data,'')<> '' and data not like decFmt and oneDecimal = 1 then data end

    when @type = 'date' then

    case when IsNull(data,'')<> '' and len(data)< 11 then

    case

    when len(data)=9 and CHARINDEX('/',data) =2 and '0'+data like dateFmt then '0'+data

    when len(data)=9 and CHARINDEX('/',data) =3 and substring(data,1,3)+'0'+SUBSTRING(data,4,6) like dateFmt then substring(data,1,3)+'0'+SUBSTRING(data,4,6)

    when len(data)=8 and '0'+ substring(data,1,2)+'0'+SUBSTRING(data,3,6) like dateFmt then '0'+ substring(data,1,2)+'0'+SUBSTRING(data,3,6)

    when data like dateFmt then data

    end

    end

    when @type = 'time' then

    case when IsNull(data,'')<> '' and (data like timeFmt1 or data like timeFmt2 or data like timeFmt3) then data end

    when @type = 'datetime' then

    case when IsNull(dataDate,'')<> '' and len(dataDate)< 11 then

    case

    when len(dataDate)=9 and CHARINDEX('/',dataDate) =2 and '0'+dataDate like dateFmt then '0'+dataDate

    when len(dataDate)=9 and CHARINDEX('/',dataDate) =3 and substring(dataDate,1,3)+'0'+SUBSTRING(dataDate,4,6) like dateFmt then substring(dataDate,1,3)+'0'+SUBSTRING(dataDate,4,6)

    when len(dataDate)=8 and '0'+ substring(dataDate,1,2)+'0'+SUBSTRING(dataDate,3,6) like dateFmt then '0'+ substring(dataDate,1,2)+'0'+SUBSTRING(dataDate,3,6)

    when dataDate like dateFmt then dataDate

    end

    end

    + ' '+

    case when IsNull(dataTime,'')<> '' and (dataTime like timeFmt1 or dataTime like timeFmt2 or dataTime like timeFmt3) then dataTime end

    end

    as result) b where result is not null

    )

    GO

    and to test

    select * from

    (

    select '01/01/2013' as data

    union all select '01-02-2012'

    union all select '01-03-2010'

    union all select '1-4-2012'

    union all select '123'

    union all select '-123'

    union all select '10:30'

    union all select '01:28:21'

    union all select '02:12:14.345'

    union all select '1.123'

    union all select '-1.23'

    union all select '1-.23'

    union all select '1.23.1' ) a cross apply tvf_ScrubData(a.data,'decimal')

  • Which is to say, what CAN'T you do in an inline tvf if you really really want to?

  • lnardozi 61862 (5/21/2013)


    Which is to say, what CAN'T you do in an inline tvf if you really really want to?

    'datetime' probably needs a bit of work!

  • 'datetime' probably needs a bit of work!

    datetime in v1 was definitely crap. v2 should be better though. That wasn't really my point though - my point is that all the functions used to scrub data could be expressed as an inline TVF. Really it would be better to have a single TVF for each type so the data would be returned in the type you were converting to so you wouldn't have to CAST it. I guess my point is really that LIKE is similar to regex, if you want it to be, while still being set based. It is WAY simpler to use a CLR tvf, but it doesn't inline.

  • lnardozi 61862 (5/22/2013)


    'datetime' probably needs a bit of work!

    datetime in v1 was definitely crap. v2 should be better though. That wasn't really my point though - my point is that all the functions used to scrub data could be expressed as an inline TVF. Really it would be better to have a single TVF for each type so the data would be returned in the type you were converting to so you wouldn't have to CAST it. I guess my point is really that LIKE is similar to regex, if you want it to be, while still being set based. It is WAY simpler to use a CLR tvf, but it doesn't inline.

    I like the idea. How would you apply it in validating an entire multicolumn row instead of a single source column at a time?

  • lnardozi 61862 (5/21/2013)


    Which is to say, what CAN'T you do in an inline tvf if you really really want to?

    Effective and high performance running totals in a pre-2012 environment, for one. Update a table for another. Use NEWID() directly. The list goes on. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Effective and high performance running totals in a pre-2012 environment, for one. Update a table for another. Use NEWID() directly. The list goes on. 😀

    The gauntlet, she has been thrown :-D. If only one of them were even REMOTELY possible... but what the hey, they said you couldn't build an (non-text) index on a varchar(max) column either. Hey, this is where the FUN comes in!

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply