Removing Blank Fields

  • It doesn't when I run Paul's Code:

    DECLARE @Example

    TABLE (

    field1 VARCHAR(30) NULL,

    field2 VARCHAR(30) NULL,

    field3 VARCHAR(30) NULL,

    field4 VARCHAR(30) NULL,

    field5 VARCHAR(30) NULL,

    field6 VARCHAR(30) NULL

    );

    INSERT @Example

    (field1, field2, field3, field4, field5, field6)

    SELECT NULL, SPACE(0), SPACE(1), SPACE(2), SPACE(3), 'data' UNION ALL

    SELECT NULL, '', ' ', ' ', '0', '0'

    SELECT field1 = NULLIF(field1, SPACE(0)),

    field2 = NULLIF(field2, SPACE(0)),

    field3 = NULLIF(field3, SPACE(0)),

    field4 = NULLIF(field4, SPACE(0)),

    field5 = NULLIF(field5, SPACE(0)),

    field6 = NULLIF(field6, SPACE(0))

    FROM @Example;

    Also, very interesting that SPACE(0) resolves as equal to SPACE(1,2,3,etc.). That can definitely help me clean up some of my code, thanks for that Paul. Anything specific that should be pointed out on why that works that way?

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Spoke too soon, yes it does, my fault:

    DECLARE @Example

    TABLE (

    field1 VARCHAR(30) NULL,

    field2 VARCHAR(30) NULL,

    field3 VARCHAR(30) NULL,

    field4 VARCHAR(30) NULL,

    field5 VARCHAR(30) NULL,

    field6 VARCHAR(30) NULL,

    Field7 int NULL

    );

    INSERT @Example

    (field1, field2, field3, field4, field5, field6, field7)

    SELECT NULL, SPACE(0), SPACE(1), SPACE(2), SPACE(3), 'data', 0 UNION ALL

    SELECT NULL, '', ' ', ' ', '0', '0', 0

    SELECT field1 = NULLIF(field1, SPACE(0)),

    field2 = NULLIF(field2, SPACE(0)),

    field3 = NULLIF(field3, SPACE(0)),

    field4 = NULLIF(field4, SPACE(0)),

    field5 = NULLIF(field5, SPACE(0)),

    field6 = NULLIF(field6, SPACE(0)),

    field7 = NULLIF(field7, SPACE(0))

    FROM @Example;

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • You shouldn't really be comparing an int (0) to char data (space(0)).

    The reason behind these evaluating as true (and the reason behind space(0) comparing equaly to space(1,2,3, ect..) can be expressed as 0 = '' = ' ' = ' ' (eg. these are all equal values in SQL, since it automatically trims when doing a comparison, and 0 = '')

  • Yep, avoid implicit conversions like that wherever possible:

    DECLARE @Example

    TABLE (

    field1 VARCHAR(30) NULL,

    field2 VARCHAR(30) NULL,

    field3 VARCHAR(30) NULL,

    field4 VARCHAR(30) NULL,

    field5 VARCHAR(30) NULL,

    field6 VARCHAR(30) NULL,

    Field7 INTEGER NULL

    );

    INSERT @Example

    (field1, field2, field3, field4, field5, field6, field7)

    SELECT NULL, SPACE(0), SPACE(1), SPACE(2), SPACE(3), 'data', 0 UNION ALL

    SELECT NULL, '', ' ', ' ', '0', '0', 0

    SELECT field1 = NULLIF(field1, SPACE(0)),

    field2 = NULLIF(field2, SPACE(0)),

    field3 = NULLIF(field3, SPACE(0)),

    field4 = NULLIF(field4, SPACE(0)),

    field5 = NULLIF(field5, SPACE(0)),

    field6 = NULLIF(field6, SPACE(0)),

    field7 = NULLIF(field7, 0)

    FROM @Example;

  • Paul_Harvey (3/11/2010)


    Thanks everyone for your swift responses.

    After a little testing I have taken the advice of Jeff Moden (with the vital example given by Paul White!). I had not realised how effective the NULLIF statement is, and the difference in performance is staggering. The data flow is part of a SSIS package, and so instead of using a table as a datasource I am now using a SQL Query containing NULLIFs. I then removed the Execute SQL Task which removed the spaces. I haven't done any formal performance timing of the new and old methods, but I would conservatively estimate that the total running time of the package is now one tenth of what it was previously. The Execute SQL task was taking as long as some of the Fuzzy Group tasks, which are clearly more intensive.

    Thanks very much to all who chipped in.

    Paul_Harvey (3/11/2010)


    Thanks everyone for your swift responses.

    After a little testing I have taken the advice of Jeff Moden (with the vital example given by Paul White!). I had not realised how effective the NULLIF statement is, and the difference in performance is staggering. The data flow is part of a SSIS package, and so instead of using a table as a datasource I am now using a SQL Query containing NULLIFs. I then removed the Execute SQL Task which removed the spaces. I haven't done any formal performance timing of the new and old methods, but I would conservatively estimate that the total running time of the package is now one tenth of what it was previously. The Execute SQL task was taking as long as some of the Fuzzy Group tasks, which are clearly more intensive.

    Thanks very much to all who chipped in.

    I'm a little late to the conversation and you may already know this, but you could set the NULL values in SSIS for each column before loading the staging table as another alternative. You could create a Derived Column task with following code in the expression column replacing the existing spaces (I made an assumption for sample purposes on your column length and used the windows 1252 code page for the DT_STR SSIS datatype):

    LEN(TRIM([field1])) == 0) ? NULL(DT_STR,50, 1252) : [field1]

    LEN(TRIM([field2])) == 0) ? NULL(DT_STR,50, 1252) : [field2]

    LEN(TRIM([field3])) == 0) ? NULL(DT_STR,50, 1252) : [field3]

    LEN(TRIM([field4])) == 0) ? NULL(DT_STR,50, 1252) : [field4]

    LEN(TRIM([field5])) == 0) ? NULL(DT_STR,50, 1252) : [field5]

    LEN(TRIM([field6])) == 0) ? NULL(DT_STR,50, 1252) : [field6]

    LEN(TRIM([field7])) == 0) ? NULL(DT_STR,50, 1252) : [field7]

    Also, because I'm not entirely sure of your whole SSIS solution and data sources, if you are using a flat file connection you have another alternative. You can set the RetainNulls property on the connection to True. This will keep zero length columns as NULL.

    I know this was a T-SQL(2K5) forum, but you mentioned you were calling this query in a SSIS package, so I thought I would provide an example of how I have handled this in a SSIS task.

    I hope I didn't hijack the post to much.

  • John Dempsey


    I know this was a T-SQL(2K5) forum, but you mentioned you were calling this query in a SSIS package, so I thought I would provide an example of how I have handled this in a SSIS task.

    I hope I didn't hijack the post to much.

    I don't think you hijacked it at all. There's nobody here with pitchforks scanning the hills for anybody daring to mention the Terms SSIS, SSAS, SSRS, DTS etc.

    That said, the only reason I really responded here was to tell you that I love your avatar. Good stuff.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Garadin (4/5/2010)


    There's nobody here with pitchforks scanning the hills for anybody daring to mention the Terms SSIS, SSAS, SSRS, DTS etc.

    Absolutely no pitchforks here - I'm a big fan of SSIS for this sort of thing.

    Mentioning DTS might have me reaching for the farm implements, however 😀

  • That said, the only reason I really responded here was to tell you that I love your avatar. Good stuff.

    Thanks for the compliment on the avatar. It was made in honor of the pork chop throwing that sometimes goes on here ;-).

  • Thanks for your input John. I have changed my select statement to use the NULLIF only for strings, so am no longer effected by the problem.

    I will bear your suggestion in mind for future projects though!

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply