Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Null vs '' Expand / Collapse
Author
Message
Posted Thursday, January 3, 2013 4:33 AM
Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Today @ 2:38 AM
Points: 705, Visits: 770
Working through an ETL project at the moment and started to question some of my previous design thoughts.

I have never previously concerned myself with the data and performance as all has been reasonable but with future projects looking to be larger volumes of data I have a few concerns.

Currently I receieve a lot of low grade data so currently working on address fields. 7 Address fields and a post code field. Up to now I have just shipped out the data I have been fed. So if it is a blank space or two balnk space in the field I just send that.

If you were receiving this data and comparing it to look for matches would you prefer:

1. to receive it as it is
or
2. all data items to be NULL if the data item is just blank space(s)
or
3. Something else


Please also explain why you would prefer one option more thabn another.

My own thought is to send a NULL if it is blank as that is easy to capture, BUT to counter that when joining a NULL field with another that can give a NULL so that can cause issues as well.

Many thanks.
M
Post #1402292
Posted Thursday, January 3, 2013 5:27 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, May 30, 2014 6:27 PM
Points: 2,808, Visits: 7,175
I would prefer to have NULLS rather than '' as it is obvious that the value is blank.
Post #1402307
Posted Thursday, January 3, 2013 5:48 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 7:58 AM
Points: 5,330, Visits: 9,776
If you have the option to clean the data in the database itself, rather than just the extracts, I'd recommend you do that. Use NULL where the value is unknown, and blank where it doesn't exist. Then export the data as it is, so that it can easily be imported into a different system.

John
Post #1402312
Posted Thursday, January 3, 2013 5:57 AM
Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Today @ 2:38 AM
Points: 705, Visits: 770
I Import the data to a staging table that is an exact coipy of the original data (warts and all). It then goes through a transformation into output tables ready to go.

I do limited cleaning of data but havebeen thinking a lot about these fields as I know they will be used a lot for comparisons for matching so the better they are the easier it should be.

Is there a better way than just using a

case when field ='' then NULL
else field
END as field

?
Thanks
M
Post #1402317
Posted Thursday, January 3, 2013 6:03 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Friday, August 22, 2014 9:02 AM
Points: 689, Visits: 2,779
case when field ='' then NULL
else field
END as field



NULLIF(YourColumn, ' ')




==========================================================================================================================
A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
Post #1402319
Posted Thursday, January 3, 2013 6:11 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 7:58 AM
Points: 5,330, Visits: 9,776
It might be more efficient to do the update in your staging or output table instead of as part of the transformation. You'll need to test to find out which way is best.

Aside from that, my advice is not to do this. NULL means unknown, so to use it to represent a known absence of a value is something of an abuse.

John
Post #1402323
Posted Thursday, January 3, 2013 6:14 AM
Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Today @ 2:38 AM
Points: 705, Visits: 770
Thank you

I knew there must be a straightforward way

M
Post #1402326
Posted Thursday, January 3, 2013 6:23 AM
Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Today @ 2:38 AM
Points: 705, Visits: 770
John Mitchell-245523 (1/3/2013)
It might be more efficient to do the update in your staging or output table instead of as part of the transformation. You'll need to test to find out which way is best.

Aside from that, my advice is not to do this. NULL means unknown, so to use it to represent a known absence of a value is something of an abuse.

John


I always use the staging as a copy of the original and never ever do anything to the data. That is a concept I hold dear to me. This way I always know exactly what I received from the source data.

In this instance I have discussed address fields so pretty much I am safe that there is not a road called '' - so safe to change to NULL. Would not consider changing the data if there was any meaning to the contents. I am also interested in doing the same with peoples names as '' has no meaning in first, middle or last name. Other data fields I am not to woriied about as a space may have some bizarre meaning and is not used for comparisons and checks.

Thanks
Mark
Post #1402333
Posted Thursday, January 3, 2013 6:30 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 7:58 AM
Points: 5,330, Visits: 9,776
Mark

Fair enough about the staging table - it was just a suggestion. Might still be worth considering on the output table, though.

If I may say so, the thing about second meanings isn't the point. A NULL should represent an unknown value and nothing else. Take the example about names. If you have MiddleName is blank, that means that person doesn't have a middle name. If you have MiddleName is NULL, that means you don't know what that person's middle name is, or whether they have one. See the difference? If you turn all blanks into NULLs, you lose the ability to make that distinction.

John
Post #1402335
Posted Thursday, January 3, 2013 6:42 AM
Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Today @ 2:38 AM
Points: 705, Visits: 770
With hindsight I should have wirded the question differently. I know the recipients of this data will treat '' and NULL in the same way. Other ETL projects where the data is used for reports I would leave 'as is' but this one I am pondering the change.

Cheers
M
Post #1402340
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse