Working through an ETL project at the moment and started to question some of my previous design thoughts.
I have never previously concerned myself with the data and performance as all has been reasonable but with future projects looking to be larger volumes of data I have a few concerns.
Currently I receieve a lot of low grade data so currently working on address fields. 7 Address fields and a post code field. Up to now I have just shipped out the data I have been fed. So if it is a blank space or two balnk space in the field I just send that.
If you were receiving this data and comparing it to look for matches would you prefer:
1. to receive it as it is
2. all data items to be NULL if the data item is just blank space(s)
3. Something else
Please also explain why you would prefer one option more thabn another.
My own thought is to send a NULL if it is blank as that is easy to capture, BUT to counter that when joining a NULL field with another that can give a NULL so that can cause issues as well.