Learn about data in a datawarehouse

  • i have heard people say that there are no silly question so I have decided to ask a question. How does anyone learn about the data in a datawarehouse if there are no relationships (PK,FK) between tables. I just started a new position and I am trying to understand what sort of data is coming in from multiple sources and what the relationship is. There is no proper documentation and no data modeling tool.

  • There is no easy way to help you understand the ER. In a DW setting, chances are that you will see a typical set of queries that run frequently. You may ask the person who own those queries, or deduce the relationship among the tables within those queries.

    However, if you are the DBA new-born, don't worry about the relationships. Keep the data safe and index healthy - I know, this is a silly answer.

  • Hopefully the developers have used a standard naming convention to allow you to see keys easily.   SQL Prompt is a helpful tool too as it help you write the ON clause.

    Schema docs can help but only if there is a good naming convention or if foreign keys have been created.

  • You can start by picking up some books on the subject. Many suggest Kimball's Data Warehouse Toolkit series to start. I've learned a lot from them and highly recommend them even if you don't approach all of his methods.

    In your example, like any new data, you have to explore it and find the relationships if they are not defined. For example, I work in advertising and marketing. This is the domain. Most all the data feeds have some type of relationship to that domain in the sense that all data feeds often may also have a similar relationship. It's not exactly rocket science to take two common dimensions and say, "Well, these both say Campaign in their name."

    The issue that most fall into is taking that for face value versus knowing there is difference between Campaign Name and Paid Search Campaign Name. One may represent multiple channels such as Paid Search on Google, Display banners on websites, or Social Media paid advertising. The other only represents one of those channels. Thus treating them as the same field/dimension of data would make it entirely wrong.

    That's when understand the best approaches of identifying, linking, and learning about the data. A lot of legwork that is even covered in those books as well things not covered that are just common sense. Like in my example, simply asking the domain experts for help in understanding the data so you can create your own documentation and planning to build a proper data warehouse that hopefully adds some business value.

  • This was removed by the editor as SPAM

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply