SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Null vs ''


Null vs ''

Author
Message
Ells
Ells
SSCrazy
SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)

Group: General Forum Members
Points: 2512 Visits: 931
Working through an ETL project at the moment and started to question some of my previous design thoughts.

I have never previously concerned myself with the data and performance as all has been reasonable but with future projects looking to be larger volumes of data I have a few concerns.

Currently I receieve a lot of low grade data so currently working on address fields. 7 Address fields and a post code field. Up to now I have just shipped out the data I have been fed. So if it is a blank space or two balnk space in the field I just send that.

If you were receiving this data and comparing it to look for matches would you prefer:

1. to receive it as it is
or
2. all data items to be NULL if the data item is just blank space(s)
or
3. Something else


Please also explain why you would prefer one option more thabn another.

My own thought is to send a NULL if it is blank as that is easy to capture, BUT to counter that when joining a NULL field with another that can give a NULL so that can cause issues as well.

Many thanks.
M
Cool
steveb.
steveb.
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10766 Visits: 7195
I would prefer to have NULLS rather than '' as it is obvious that the value is blank.
John Mitchell-245523
John Mitchell-245523
SSC-Dedicated
SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)

Group: General Forum Members
Points: 35211 Visits: 16673
If you have the option to clean the data in the database itself, rather than just the extracts, I'd recommend you do that. Use NULL where the value is unknown, and blank where it doesn't exist. Then export the data as it is, so that it can easily be imported into a different system.

John
Ells
Ells
SSCrazy
SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)

Group: General Forum Members
Points: 2512 Visits: 931
I Import the data to a staging table that is an exact coipy of the original data (warts and all). It then goes through a transformation into output tables ready to go.

I do limited cleaning of data but havebeen thinking a lot about these fields as I know they will be used a lot for comparisons for matching so the better they are the easier it should be.

Is there a better way than just using a

case when field ='' then NULL
else field
END as field

?
Thanks
M
Andy Hyslop
Andy Hyslop
SSCrazy
SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)

Group: General Forum Members
Points: 2193 Visits: 3046
case when field ='' then NULL
else field
END as field




NULLIF(YourColumn, ' ')




==========================================================================================================================
A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
John Mitchell-245523
John Mitchell-245523
SSC-Dedicated
SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)

Group: General Forum Members
Points: 35211 Visits: 16673
It might be more efficient to do the update in your staging or output table instead of as part of the transformation. You'll need to test to find out which way is best.

Aside from that, my advice is not to do this. NULL means unknown, so to use it to represent a known absence of a value is something of an abuse.

John
Ells
Ells
SSCrazy
SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)

Group: General Forum Members
Points: 2512 Visits: 931
Thank you :-)

I knew there must be a straightforward way

M
Ells
Ells
SSCrazy
SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)

Group: General Forum Members
Points: 2512 Visits: 931
John Mitchell-245523 (1/3/2013)
It might be more efficient to do the update in your staging or output table instead of as part of the transformation. You'll need to test to find out which way is best.

Aside from that, my advice is not to do this. NULL means unknown, so to use it to represent a known absence of a value is something of an abuse.

John


I always use the staging as a copy of the original and never ever do anything to the data. That is a concept I hold dear to me. This way I always know exactly what I received from the source data.

In this instance I have discussed address fields so pretty much I am safe that there is not a road called '' - so safe to change to NULL. Would not consider changing the data if there was any meaning to the contents. I am also interested in doing the same with peoples names as '' has no meaning in first, middle or last name. Other data fields I am not to woriied about as a space may have some bizarre meaning and is not used for comparisons and checks.

Thanks
Mark
John Mitchell-245523
John Mitchell-245523
SSC-Dedicated
SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)

Group: General Forum Members
Points: 35211 Visits: 16673
Mark

Fair enough about the staging table - it was just a suggestion. Might still be worth considering on the output table, though.

If I may say so, the thing about second meanings isn't the point. A NULL should represent an unknown value and nothing else. Take the example about names. If you have MiddleName is blank, that means that person doesn't have a middle name. If you have MiddleName is NULL, that means you don't know what that person's middle name is, or whether they have one. See the difference? If you turn all blanks into NULLs, you lose the ability to make that distinction.

John
Ells
Ells
SSCrazy
SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)

Group: General Forum Members
Points: 2512 Visits: 931
With hindsight I should have wirded the question differently. I know the recipients of this data will treat '' and NULL in the same way. Other ETL projects where the data is used for reports I would leave 'as is' but this one I am pondering the change.

Cheers
M
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search