SQLServerCentral is supported by Red Gate Software Ltd.

A faster way to prepare dimensional databases
Posted Wednesday, February 24, 2010 4:29 PM
SSCrazy


Group: General Forum Members
Last Login: Tuesday, October 7, 2014 2:36 PM
Points: 2,818, Visits: 2,565
Where can I find out more about the MD5 checksum?
Post #872378
Posted Thursday, February 25, 2010 1:44 AM
SSC Journeyman


Group: General Forum Members
Last Login: Friday, October 10, 2014 12:58 AM
Points: 81, Visits: 904
Hi Joel,

Thanks for this idea - but can you confirm that processing large data sets in batches like this is faster overall for the entire dataset? My (admittedly hazy) recollection of this approach is that the overall time taken is about the same.

Thanks for the input!

Adam
Post #872521
Posted Thursday, February 25, 2010 1:51 AM
SSC Journeyman


Group: General Forum Members
Last Login: Friday, October 10, 2014 12:58 AM
Points: 81, Visits: 904
Hi Magnus,

Of course, you are absolutely right that your approach is faster, but, as you point out, it requires complete control of the entire process. Also (which I did not make clear enough in the article, so please excuse me), the approach that I am suggesting is very much a development technique. Our data warehouse is evolving constantly, and dimensions and attributes are being altered all the time (this is one of those projects where "Agile" means less design and planning than is healthy...) so the design is not stable enough to persist dimension attributes - yet.

Once we reach a more stable setup, then we will certainly adapt the process to persist unchanging dimension and fact data, and adopt the most appropriate techniques which suit the project.

Thanks for the very succinct and clear description of how to go about doing this!

Adam
Post #872526
Posted Thursday, February 25, 2010 1:54 AM
SSC Journeyman


Group: General Forum Members
Last Login: Friday, October 10, 2014 12:58 AM
Points: 81, Visits: 904
Hi There Casinc835,

Thanks for the input!

After 60-odd days and a 10% increase in volumes (and, admittedly, a lot of design changes) we are not seeing any notable increase in processing times, so it seems pretty linear.

Regards,

Adam
Post #872528
Posted Thursday, February 25, 2010 1:55 AM
SSC Journeyman


Group: General Forum Members
Last Login: Friday, October 10, 2014 12:58 AM
Points: 81, Visits: 904
Hi Centexbi,

This sounds really interesting - I will give it a try as soon as I can.

Thanks for the idea!

Adam
Post #872529
Posted Thursday, February 25, 2010 1:59 AM
SSC Journeyman


Group: General Forum Members
Last Login: Friday, October 10, 2014 12:58 AM
Points: 81, Visits: 904
Hi Magarity,

Absolutely the way to go once the process is stable and populated on an incremental basis.

Tell me, I gather from the discussions on the subject that the chance of a duplicate MD5 checksum is vanishingly small, since MD5 is a 128-bit hash (as opposed to SQL Server's 32-bit CHECKSUM function) - have you ever experienced problems with checksums?

Thanks,

Adam
Post #872531
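As a rough illustration of the collision odds being discussed (a back-of-the-envelope sketch, not anything from the article): the usual birthday-bound approximation shows why a 128-bit MD5 hash is dramatically safer for change detection than SQL Server's 32-bit CHECKSUM. The function name and row counts below are illustrative assumptions.

```python
import math

def collision_probability(n_rows: int, hash_bits: int) -> float:
    """Birthday-bound estimate: P(any two of n random k-bit hashes collide)."""
    # p ~= 1 - exp(-n^2 / 2^(k+1)); expm1 keeps precision when p is tiny.
    return -math.expm1(-(n_rows ** 2) / 2 ** (hash_bits + 1))

# For ten million dimension rows:
p_checksum = collision_probability(10_000_000, 32)   # 32-bit CHECKSUM: near-certain collision
p_md5 = collision_probability(10_000_000, 128)       # 128-bit MD5: vanishingly small
```

On these assumptions a 32-bit checksum is effectively guaranteed to collide somewhere in a ten-million-row dimension, while MD5's collision probability stays negligible for any realistic warehouse size.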
Posted Thursday, February 25, 2010 2:03 AM
SSC Journeyman


Group: General Forum Members
Last Login: Friday, October 10, 2014 12:58 AM
Points: 81, Visits: 904
For Dbowlin:

There is a good MD5 checksum approach here:

http://www.tek-tips.com/viewthread.cfm?qid=1268144&page=1

Post #872532
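For readers who just want the gist of the MD5 row-checksum idea without following the link, here is a minimal sketch in Python (in SQL Server itself this would typically be done with HASHBYTES). The column values and the "|" delimiter are illustrative assumptions, not the linked article's code.

```python
import hashlib

def row_md5(*attributes) -> str:
    # Concatenate the row's attributes with a delimiter, then MD5 the result.
    # The delimiter stops ("ab", "c") and ("a", "bc") hashing identically;
    # None is mapped to "" so NULLs hash consistently.
    payload = "|".join("" if a is None else str(a) for a in attributes)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

# Compare the staged row's checksum with the stored one to detect change.
current = row_md5("Acme Ltd", "London", "Retail")
staged = row_md5("Acme Ltd", "Leeds", "Retail")
changed = staged != current  # True: the second attribute differs
```

Storing the checksum alongside each dimension row means change detection becomes a single column comparison instead of comparing every attribute.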
Posted Thursday, February 25, 2010 2:24 AM
Forum Newbie


Group: General Forum Members
Last Login: Monday, September 10, 2012 5:38 AM
Points: 8, Visits: 116
Hi,

Thanks for the article and example.

What I don't understand (and this may be because you don't have complete control of the ETL, or are restricted in the design) is why you truncate the dimensions. Could you not just relate this to a business key and do the appropriate SCD 1 / SCD 2 transformations?

Even with the fact table, you could just insert new records and update existing ones.

Regards,
Amrit
Post #872540
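The SCD 1 / SCD 2 alternative Amrit describes can be sketched as follows. This is an illustrative Python model of the two patterns keyed on a business key, not the article's ETL code; the dimension layout and attribute names are assumptions.

```python
from datetime import date

# In-memory stand-in for a dimension table: business_key -> row versions,
# newest last. A real implementation would be UPDATE/INSERT statements.
dimension = {}

def apply_scd(business_key, attrs, scd_type=2, load_date=None):
    load_date = load_date or date.today()
    versions = dimension.setdefault(business_key, [])
    if versions and versions[-1]["attrs"] == attrs:
        return  # attributes unchanged: nothing to do
    if scd_type == 1 and versions:
        versions[-1]["attrs"] = dict(attrs)       # Type 1: overwrite in place
    else:
        if versions:
            versions[-1]["valid_to"] = load_date  # Type 2: close the old version
        versions.append({"attrs": dict(attrs),
                         "valid_from": load_date,
                         "valid_to": None})
```

With this pattern the dimension never needs truncating: unchanged rows are skipped, Type 1 changes overwrite, and Type 2 changes close the old version and open a new one.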
Posted Thursday, February 25, 2010 2:57 AM
SSC Journeyman


Group: General Forum Members
Last Login: Friday, October 10, 2014 12:58 AM
Points: 81, Visits: 904
Hi Amrit,

This is explained in my reply to Magnus, above - the design is in a considerable state of flux, and we wanted a complete teardown on every process run, to guarantee coherence and to avoid having "old" data cluttering up the DW.

Sorry that this wasn't clear in the article.

Otherwise yes - standard techniques could (and probably will) be used.

Regards,

Adam
Post #872555
Posted Thursday, February 25, 2010 2:58 AM
Forum Newbie


Group: General Forum Members
Last Login: Monday, September 10, 2012 5:38 AM
Points: 8, Visits: 116
Ok, great!

Sorry if I misunderstood.
Post #872556