|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Today @ 1:10 PM
Points: 2,673,
Visits: 2,418
|
|
| Where can I find out more about the MD5 checksum?
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Today @ 2:10 AM
Points: 81,
Visits: 848
|
|
Hi Joel,
Thanks for this idea - but can you confirm that processing large data sets in batches like this is faster overall for the entire dataset? My (admittedly hazy) memory of this approach is that the overall time taken is around the same?
THanks for the input!
Adam
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Today @ 2:10 AM
Points: 81,
Visits: 848
|
|
Hi Magnus,
Of course, you are absolutely right that your approach is faster, but, as you point out, it reauires complete control of the entire process. Also (which I did not make clear enough in the article, so please excuse me), the appraoch that I am suggesting is very much a development technique. Our data warehouse is evolving constantly, and dimensions and attributes are being altered all the time (this is one of those projects where "Agile" means less design and planning than is healthy...) so the design is not stable enough to persist dimesnion attributes - yet.
Once we reach a more stable setup, then we will certainly adapt the process to persist unchanging dimension and fact data, and adopt the most appropriate techniques which suit the project.
Thanks for the very succinct and clear description of how to go about doing this!
Adam
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Today @ 2:10 AM
Points: 81,
Visits: 848
|
|
Hi There Casinc835,
Thanks for the input!
After 60 -odd days and a 10% increase in volumes (and admittedly a lot of design changes) we are not seeing any notable increases in processing times, so it seems pretty linear.
Regards,
Adam
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Today @ 2:10 AM
Points: 81,
Visits: 848
|
|
Hi Centexbi,
This sounds really interesting - I will give it a try as soon as I can.
Thanks for the idea!
Adam
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Today @ 2:10 AM
Points: 81,
Visits: 848
|
|
Hi Magarity,
Absolutely the way to go once the process is stable and populated on an incremental basis.
Tell me, I gather from the discussions on the subject that there is only a 1 in 2^40 chance of a duplicate MD5 checksum (as opposed to the SQL Server CHECKSUM function) - have you ever experienced problems with checksums?
Thanks,
Adam
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Today @ 2:10 AM
Points: 81,
Visits: 848
|
|
For Dbowlin:
There is a good MD5 checksum approach here:
http://www.tek-tips.com/viewthread.cfm?qid=1268144&page=1
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, September 10, 2012 5:38 AM
Points: 8,
Visits: 116
|
|
Hi,
thanks for the article and example.
what I don't understand and this may be as you don't have complete control of the ETL or you're restricted to on the design, is why do you truncate the Dimensions? could you not just relate this to a Business Key and do appropriate SCD 1 / SCD 2 transformations?
Even with the Fact table, you could just insert for new records and update for existing records.
Regards, Amrit
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Today @ 2:10 AM
Points: 81,
Visits: 848
|
|
Hi Amrit,
This is explained in my reply to Magnus, above - the design is in a considerable state of flux, and we wnated complete teardown on every process run, to guarantee coherence and to avoid having "old" data cluttering up the DW.
Sorry that this wasn't clear in the article.
Otherwise yes - standard techniques could (and probably will) be used.
Regards,
Adam
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, September 10, 2012 5:38 AM
Points: 8,
Visits: 116
|
|
Ok, great!
Sorry if I misunderstood.
|
|
|
|