A faster way to prepare dimensional databases



Daniel Bowlin
Where can I find out more about the MD5 checksum?
Adam Aspin
Hi Joel,

Thanks for this idea - but can you confirm that processing large data sets in batches like this is faster overall for the entire dataset? My (admittedly hazy) memory of this approach is that the overall time taken is around the same.

Thanks for the input!

Adam
Adam Aspin
Hi Magnus,

Of course, you are absolutely right that your approach is faster, but, as you point out, it requires complete control of the entire process. Also (which I did not make clear enough in the article, so please excuse me), the approach that I am suggesting is very much a development technique. Our data warehouse is evolving constantly, and dimensions and attributes are being altered all the time (this is one of those projects where "Agile" means less design and planning than is healthy...) so the design is not stable enough to persist dimension attributes - yet.

Once we reach a more stable setup, then we will certainly adapt the process to persist unchanging dimension and fact data, and adopt the most appropriate techniques which suit the project.

Thanks for the very succinct and clear description of how to go about doing this!

Adam
Adam Aspin
Hi there Casinc835,

Thanks for the input!

After 60-odd days and a 10% increase in volumes (and admittedly a lot of design changes) we are not seeing any notable increase in processing times, so it seems pretty linear.

Regards,

Adam
Adam Aspin
Hi Centexbi,

This sounds really interesting - I will give it a try as soon as I can.

Thanks for the idea!

Adam
Adam Aspin
Hi Magarity,

Absolutely the way to go once the process is stable and populated on an incremental basis.

Tell me, I gather from the discussions on the subject that there is only a 1 in 2^40 chance of a duplicate MD5 checksum (as opposed to the SQL Server CHECKSUM function) - have you ever experienced problems with checksums?

Thanks,

Adam
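
As a rough sanity check on the collision question, here is a minimal sketch of the standard birthday-bound estimate. It assumes an ideal, uniformly distributed hash, which is an assumption of the sketch rather than a measured property of either function. The bit widths are the output sizes (MD5 produces a 128-bit digest; the SQL Server CHECKSUM function returns a 32-bit int), and the row count of one million is purely hypothetical.

```python
import math


def collision_probability(n_rows: int, hash_bits: int) -> float:
    """Birthday-bound estimate of at least one collision among n_rows hashes.

    Assumes an ideal hash that spreads values uniformly over 2**hash_bits
    possible outputs (an idealisation, not a guarantee for real functions).
    """
    space = 2 ** hash_bits
    exponent = n_rows * (n_rows - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(-exponent)


# Hypothetical dimension of one million rows
p_checksum = collision_probability(1_000_000, 32)   # 32-bit output
p_md5 = collision_probability(1_000_000, 128)       # 128-bit output

print(f"32-bit hash:  {p_checksum:.6f}")
print(f"128-bit hash: {p_md5:.3e}")
```

Under these assumptions the 32-bit case comes out close to certainty of at least one collision, while the 128-bit case is vanishingly small. Accidental collisions are the only concern modelled here; deliberately crafted MD5 collisions are a separate matter.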
Adam Aspin
For Dbowlin:

There is a good MD5 checksum approach here:

http://www.tek-tips.com/viewthread.cfm?qid=1268144&page=1
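
For readers who just want the general idea, here is a minimal sketch of using an MD5 hash over a row's attributes to detect changes. It is not a reproduction of the linked thread; the function name `row_hash`, the `|` separator, and the sample values are all hypothetical. In SQL Server itself the built-in counterpart would be `HASHBYTES('MD5', ...)`.

```python
import hashlib


def row_hash(values, sep="|"):
    """Return an MD5 hex digest over a row's attribute values.

    None is mapped to an empty string, and a separator is placed between
    values so that ("ab", "c") and ("a", "bc") do not hash identically.
    """
    text = sep.join("" if v is None else str(v) for v in values)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical dimension row: business key, name, city
stored = row_hash(["C001", "Acme", "London"])
unchanged = row_hash(["C001", "Acme", "London"])
changed = row_hash(["C001", "Acme", "Paris"])

print(stored == unchanged)  # same attributes give the same hash
print(stored == changed)    # a changed attribute gives a different hash
```

Storing the hash alongside each row lets a load compare one column instead of every attribute. The separator must be a character that cannot appear in the data, otherwise the ambiguity it is meant to prevent can return.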
amritpal.parmar
Hi,

Thanks for the article and example.

What I don't understand (and this may be because you don't have complete control of the ETL, or because you are restricted on the design) is why you truncate the dimensions. Could you not just relate this to a business key and do the appropriate SCD 1 / SCD 2 transformations?

Even with the fact table, you could just insert new records and update existing ones.

Regards,
Amrit
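
The insert-or-update idea above can be sketched roughly as follows for a Type 1 (overwrite) dimension. This is only an illustration under stated assumptions: it uses an in-memory SQLite database rather than SQL Server, and the table `dim_customer`, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def upsert_dimension(conn, rows):
    """Type 1 SCD load: overwrite attributes of existing members,
    insert members whose business key has not been seen before."""
    for business_key, name in rows:
        cur = conn.execute(
            "UPDATE dim_customer SET name = ? WHERE business_key = ?",
            (name, business_key),
        )
        if cur.rowcount == 0:  # no existing member matched the business key
            conn.execute(
                "INSERT INTO dim_customer (business_key, name) VALUES (?, ?)",
                (business_key, name),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_sk INTEGER PRIMARY KEY AUTOINCREMENT, "
    "business_key TEXT UNIQUE NOT NULL, "
    "name TEXT NOT NULL)"
)

# First load, then a second load with one changed and one new member
upsert_dimension(conn, [("C001", "Acme"), ("C002", "Globex")])
upsert_dimension(conn, [("C002", "Globex Ltd"), ("C003", "Initech")])

result = conn.execute(
    "SELECT customer_sk, business_key, name FROM dim_customer ORDER BY customer_sk"
).fetchall()
print(result)
```

Because existing rows are updated in place, surrogate keys stay stable between loads, which is the main difference from truncating and reloading the dimension on every run.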
Adam Aspin
Hi Amrit,

This is explained in my reply to Magnus, above - the design is in a considerable state of flux, and we wanted a complete teardown on every process run, to guarantee coherence and to avoid having "old" data cluttering up the DW.

Sorry that this wasn't clear in the article.

Otherwise yes - standard techniques could (and probably will) be used.

Regards,

Adam
amritpal.parmar
Ok, great!

Sorry if I misunderstood.