• The data warehouse is like any other traditional warehouse. But, the data marts have to be rebuilt nightly due to the business needs for reporting. Unfortunately, not every company has the same reporting needs. The company I work for does intense attribution based on time. That means, as new data comes into the warehouse, then existing data has to be rebuilt around the new data that comes into the data mart.

    The easiest way to explain this is that our data is split between sales and non-sales. Reporting is done on either one or the other. Because of the intense computing need to transform, conform and load the data, data marts are used to look at chunks of the data by the end user. And for that, those chunks are rebuilt every night.

    Why they are rebuilt is based on the fact that non-sales could eventually turn to sales. Every record in our environment relates to the other eventually. Therefore, we group and reclassify these records when this transformation happens. Due to that, intense backlog of older records are needed to help with the transformation of new records.

    This can be done without a rebuild, but the entire data mart is reorganized when each set of new data. There are also retention rules in place where data has to be cycled out. As these are chunks and not the full data set, it's just easier to rebuild those chunks along with the indexes and everything else in a complete batch process over night. That way the cycle is fully controlled, fresh and ready to go every morning.

    Then the team can aggregate the final results.