Removing Blank Fields

  • Hi All,

    Hopefully an easy one for somebody in "the know".

    I have a table with some 75 fields, that acts as a staging table for incoming data feeds.

    I want to eliminate blank fields and replace them with NULLs. Currently there is a piece of t-sql process that works as follows:

    Update table

    set field1 = Null where field1 = '' or field1 = ' '

    Update table

    set field2 = Null where field2 = '' or field2 = ' '

    Update table

    set field3 = Null where field3 = '' or field3 = ' '

    Update table

    set field4 = Null where field4 = '' or field4 = ' '

    Update table

    set field5 = Null where field5 = '' or field5 = ' '

    Update table

    set field6 = Null where field6 = '' or field6 = ' '

    Update table

    set field7 = Null where field7 = '' or field7 = ' '

    etc etc

    I am quite sure that this is a most inefficient way of doing it. This part of the overall data import process takes a very long time. Is there a Case statement or something similar that would imrove performance?

    Paul

  • Paul_Harvey (3/10/2010)


    Hi All,

    Hopefully an easy one for somebody in "the know".

    I have a table with some 75 fields, that acts as a staging table for incoming data feeds.

    I want to eliminate blank fields and replace them with NULLs. Currently there is a piece of t-sql process that works as follows:

    Update table

    set field1 = Null where field1 = '' or field1 = ' '

    Update table

    set field2 = Null where field2 = '' or field2 = ' '

    Update table

    set field3 = Null where field3 = '' or field3 = ' '

    Update table

    set field4 = Null where field4 = '' or field4 = ' '

    Update table

    set field5 = Null where field5 = '' or field5 = ' '

    Update table

    set field6 = Null where field6 = '' or field6 = ' '

    Update table

    set field7 = Null where field7 = '' or field7 = ' '

    etc etc

    I am quite sure that this is a most inefficient way of doing it. This part of the overall data import process takes a very long time. Is there a Case statement or something similar that would imrove performance?

    Paul

    update dbo.table set

    field1 = case when field1 is null or field1 = '' or field1 = ' ' then null else field1 end,

    field2 = case when field2 is null or field2 = '' or field2 = ' ' then null else field2 end,

    ...,

    field75 = case when field75 is null or field75 = '' or field75 = ' ' then null else field75 end

    Not sure this will be any faster. I'd look at temporarily creating indexes on each of the columns before the updates and dropping the indexes after the update. It really depends on the number of records and the distribution of data in each column.

  • Thanks for the advice Lynn. I will try the first suggestion.

  • Let us know. I was going to post what Lynn did, but too slow. I think that's the best way to do it.

  • Paul_Harvey (3/10/2010)


    Hi All,

    Hopefully an easy one for somebody in "the know".

    I have a table with some 75 fields, that acts as a staging table for incoming data feeds.

    I want to eliminate blank fields and replace them with NULLs. Currently there is a piece of t-sql process that works as follows:

    Update table

    set field1 = Null where field1 = '' or field1 = ' '

    Update table

    set field2 = Null where field2 = '' or field2 = ' '

    Update table

    set field3 = Null where field3 = '' or field3 = ' '

    Update table

    set field4 = Null where field4 = '' or field4 = ' '

    Update table

    set field5 = Null where field5 = '' or field5 = ' '

    Update table

    set field6 = Null where field6 = '' or field6 = ' '

    Update table

    set field7 = Null where field7 = '' or field7 = ' '

    etc etc

    I am quite sure that this is a most inefficient way of doing it. This part of the overall data import process takes a very long time. Is there a Case statement or something similar that would imrove performance?

    Paul

    If you're going to transfer the data from the staging table to a final table, don't waste your time changing the blanks to NULLs in the staging table. Lookup NULLIF and do it when you copy the data to the final table. It's very fast.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • To illustrate Jeff's point:

    DECLARE @Example

    TABLE (

    field1 VARCHAR(30) NULL,

    field2 VARCHAR(30) NULL,

    field3 VARCHAR(30) NULL,

    field4 VARCHAR(30) NULL,

    field5 VARCHAR(30) NULL,

    field6 VARCHAR(30) NULL

    );

    INSERT @Example

    (field1, field2, field3, field4, field5, field6)

    VALUES (NULL, SPACE(0), SPACE(1), SPACE(2), SPACE(3), 'data');

    SELECT field1 = NULLIF(field1, SPACE(0)),

    field2 = NULLIF(field2, SPACE(0)),

    field3 = NULLIF(field3, SPACE(0)),

    field4 = NULLIF(field4, SPACE(0)),

    field5 = NULLIF(field5, SPACE(0)),

    field6 = NULLIF(field6, SPACE(0))

    FROM @Example;

  • Well done!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/10/2010)


    Well done!

    Thanks. I like to be helpful 😉

  • Thanks everyone for your swift responses.

    After a little testing I have taken the advice of Jeff Moden (with the vital example given by Paul White!). I had not realised how effective the NULLIF statement is, and the difference in performance is staggering. The data flow is part of a SSIS package, and so instead of using a table as a datasource I am now using a SQL Query containing NULLIFs. I then removed the Execute SQL Task which removed the spaces. I haven't done any formal performance timing of the new and old methods, but I would conservatively estimate that the total running time of the package is now one tenth of what it was previously. The Execute SQL task was taking as long as some of the Fuzzy Group tasks, which are clearly more intensive.

    Thanks very much to all who chipped in.

  • Very cool. Thanks for the feedback, Paul.

    As a sidebar, I sometimes don't give an example because I really want people to take the time to look it up in Books Online and read about it. It's kind of like learning a new word each day. There's some very important information about which datatype is returned, etc.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good point Jeff. My approach though, is to always try to provide code if I have time. My thinking is that although my script might be copy-and-pasted, anyone interested in learning will always 'hit F1'. They sure will the first time it fails to work as expected. That, or we get another question!

  • Oh I totally understand that point Jeff. And I agree with it. The lazy side of me is just always grateful for being spoon-fed of course.

    Nonetheless this exercise has taught me about NULLIF and also SPACE(n).

  • Heh... I agree... spoonfeeding is good. One picture IS worth a thousand words. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • and I actually DID press F1 and read a little more. Because it worked I took more interest. I suspect if it took me too long to work it out myslef I may have lost interest.. particularly as it was such a "wow" moment when I saw the performance advantage!

  • Hi there,

    I am now going to demonstrate my lack of understanding on this NULLIF function but....

    Today I discovered to my surprise (see!?!) that NULLIF(field,space(0)) not only removes spaces from my data, but also Zeros. Unfortunately 0 is a valid value in my table, and if difference from NULL.

    Is there any way to limit NULLIF to removeing spaces?

    Paul

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply