In this article I’ll discuss making your data warehouse better. I’ll admit that this is a pretty obvious aspiration, but data warehouses evolve. If they are going to be a success and have some sort of longevity they have to. The focus of this article is to ensure that when you make changes that they enhance your data warehouse and it’s reputation, not diminish them.
Change is good
Change is good. A request to add functionality to your warehouse is great. However, don’t be tempted to shoehorn a change in where it doesn’t fit. You may have new requirements that mean you are considering adding measures or attributes, or adding new objects to the model. Be careful when doing this, and ask some questions:
- Will the changes enhance the user experience or degrade it? Does it make querying the model easier or harder? A simple test here can be just how many lines of SQL you need to write to get a useful query back (one based on actual user requirements), and how complex that query is. Are you having to use sub-queries, temp tables, etc. just to get a useful data set back? If you are you need to think long and hard as to whether the change is really helping the end user.
- How easy will it be to explain the new model? This is a great one to think on. You want your model to be intuitive, and ideally require no explanation to someone that understands the basic concepts of dimensional modelling. If you have to explain how and why the tables link together as they do or how to report accurate results you may well have a problem.
- Are the changes consistent with the existing content in the data warehouse or mart? Is the look and feel of the model affected by this change. If it makes it better then that’s fantastic. If not, how is this change good?
- Will the changes affect existing reports? Will they break existing reports, or even worse changing the numbers that they are reporting. Are you changing the reports but making the underlying SQL simpler to implement and understand?
- Do the changes take you further away from or closer to what your data warehouse should be (see tip #2)?
Where should I put it?
You might be tempted to add a column to a fact table. This might be valid and a quick win, but if it changes the granularity of the data or confuses the meaning of the fact table then don’t do it. You might be tempted to add another dimension table in between the existing fact/dimension relationship. You might not know where something should go, or whether it fits in anywhere at all.
You might end up with the snowball effect where one simple and seemingly innocuous change causes another change to be needed. Which in turn…
Don’t trade off the right solution for the quick solution
Everyone loves a quick fix. Life is full of them. The problem is that a lot of them don’t consider the root cause of the problem and only fix the visible symptoms. Quick fixes are synonymous with software and a lot of software projects that I’ve worked on or witnessed are one quick fix built on top of another. Each one only focused on fixing the immediate problem at hand and not considering the wider implications. This can be the beginning of the end of your data warehouse if you aren’t careful. Your data warehouse will live or die by the end user relationship with it, and by implementing short-sighted “quick wins” you may be delivering new functionality, but are you delivering what is best in the long run?
There are legitimate quick wins out there. To be able to turn around a new measure or a new dimension that gives new insights quickly and efficiently can be a big win for your data warehouse solution. However make sure you aren’t painting yourself into a corner.
(Don’t forget the ETL…)
Also be careful about the effect these intended changes have on your ETL. I’ve said before that given a choice the complexity should always live in the ETL, but if you are shoehorning something in where it doesn’t quite fit, be careful. You could turn a nice straightforward data flow into a spiders web of confusion that very few people can understand, let alone maintain.
It’s easy for me to sit here and say don’t shoehorn something in where it doesn’t seem to fit, but I know there are external pressures. If there weren’t external pressures the majority of the software solutions out there wouldn’t have seen the light of day. Without those external pressures to get something delivered developers will tinker with them until they think that they are 100% ready to deliver. Nothing is ever 100% ready to deliver. Deadlines come and you have to ship what you have, whether it’s 85% ready or 65% ready. The point I’m trying to make is that you need to make sure the changes you are making are right for your data warehouse. Sometimes it’s about saying no, but more often it’s about taking a step back and finding the best place for something to go rather than the most straightforward.
If it still doesn’t feel right, discuss it with the person that raised the issue/requirement. What exactly is it that they need? Maybe a quick conversion can change both of your perspectives on it. Maybe it’s just a case of taking a step back from it and taking some time to consider the best solution. Some of the suggestions in tip #4 – trust your instinct may help here.
Remember that your data warehouse lives or dies by the end user relationship with it. So, yes, do something that wows them or gives them some time back during the day. But don’t compromise what you already have just to save a bit of time in the development process. Don’t take what you have and make it slower and clunkier. Build that new fact table, build that new data mart. Make your data warehouse bigger, make it better, make it faster and more intuitive. And if what they want doesn’t fit in with what you already have, well make them a shiny new data warehouse. As always with these things it may take longer in the short term, but it’ll be so much better in the long run.