Potential presentation idea: Intro to data warehousing

  • Ray K

    SSC-Dedicated

    Points: 31016

    Hi folks...

    I'm contemplating what to do for new presentation topics, and am thinking about doing an intro to data warehousing.  It would be based on a 'blog article that I wrote a little while back.

    I will confess that I am thinking about this presentation for my own learning benefit, as well as passing knowledge along to others.

    So, I could use some feedback from the peanut gallery.  First could someone (preferably who knows more about BI/DW than I do) take a look at my article (use the above link) and let me know how far off-base I am?

    Also, any suggestions as to what I should/shouldn't include in the presentation?

    Thanks!

    +--------------------------------------------------------------------------------------+
    ‌Check out my blog at https://pianorayk.wordpress.com/[/url]

  • Eric M Russell

    SSC Guru

    Points: 124938

    I'd say the blog post is a good five minute introduction to data warehousing for an audience who has never heard of the topic, but it may be a tad bit too introductory for a one hour talk at a SQL Server user group meetup. For example, most DBAs, if they have never been involved in the design or implementation, at least know what a data warehouse is and have at least heard that the table model can be denormalized and the rows summarized. What's worth getting across to an audience of DBAs is the Why and How of denormalization and summarization. Without covering fact table granularity and slowly changing dimensions, many in the audience would be left with impression that a data warehouse is simply a copy of the transactional database modeled into a star schema pattern, which is actually not accurate or useful.

    Ralph Kimball describes 3 basic grains for a fact table: transactional, periodic snapshot, accumulating snapshot. Illustrate graphically how the columns for each type of granularity would differ and why one type would be chosen over another depending on the end user requirements and volume of data.
    https://www.kimballgroup.com/2008/11/fact-tables/

    I also wouldn't approach the subject of ETL (much less SSIS) other than to illustrate how dimensional tables are typically versioned and referred to as slowly changing dimensions (SCD) and categorized into various types (basically 1 - 3) which dictate how that versioning is implemented. 
    https://www.kimballgroup.com/2005/03/slowly-changing-dimensions-are-not-always-as-easy-as-1-2-3/

    The first three chapters of Kimball's book "The Data Warehouse Toolkit" are golden.
    https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Lynn Pettis

    SSC Guru

    Points: 442094

    Eric M Russell - Monday, September 10, 2018 12:58 PM

    I'd say the blog post is a good five minute introduction to data warehousing for an audience who has never heard of the topic, but it may be a tad bit too introductory for a one hour talk at a SQL Server user group meetup. For example, most DBAs, if they have never been involved in the design or implementation, at least know what a data warehouse is and have at least heard that the table model can be denormalized and the rows summarized. What's worth getting across to an audience of DBAs is the Why and How of denormalization and summarization. Without covering fact table granularity and slowly changing dimensions, many in the audience would be left with impression that a data warehouse is simply a copy of the transactional database modeled into a star schema pattern, which is actually not accurate or useful.

    Ralph Kimball describes 3 basic grains for a fact table: transactional, periodic snapshot, accumulating snapshot. Illustrate graphically how the columns for each type of granularity would differ and why one type would be chosen over another depending on the end user requirements and volume of data.
    https://www.kimballgroup.com/2008/11/fact-tables/

    I also wouldn't approach the subject of ETL (much less SSIS) other than to illustrate how dimensional tables are typically versioned and referred to as slowly changing dimensions (SCD) and categorized into various types (basically 1 - 3) which dictate how that versioning is implemented. 
    https://www.kimballgroup.com/2005/03/slowly-changing-dimensions-are-not-always-as-easy-as-1-2-3/

    The first three chapters of Kimball's book "The Data Warehouse Toolkit" is golden.
    https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802

    And realize that there are several different models when it comes to data warehousing: the dimensional data warehouse (Kimball), enterprise data warehouse (Inmon), and the data vault (Lindquest).  There are probably others as well, but these are the three I am most aware of at this time.

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 714105

    Eric has good points. This can be tough in an hour. I think even a star schema is tough in an hour, but perhaps you can just cover higher level concepts  about what you include and some ideas of how without details?

  • Eric M Russell

    SSC Guru

    Points: 124938

    Lynn Pettis - Monday, September 10, 2018 1:41 PM

    ...
    And realize that there are several different models when it comes to data warehousing: the dimensional data warehouse (Kimball), enterprise data warehouse (Inmon), and the data vault (Lindquest).  There are probably others as well, but these are the three I am most aware of at this time.

    From what I see in the blog post, he's presenting fact and dimension tables, a basic star schema DW model, so Kimball the authoritative source of reference.

    For this specific presentation, an introduction to data warehousing, I wouldn't touch SSIS, just stick to the concepts and PowerPoint slides. There are probably a lot of DBAs who manage a data warehouse (in addition to 100 other servers) but don't understand why the BI team designed the DW that way. For example, they may notice that the tables appear denormalized and have rows duplicated on key columns that are unique in the OLTP database.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Chris Harshman

    SSC-Forever

    Points: 41616

    getting into Kimball vs Inmon vs etc. is probably too much for something that's going to be an hour presentation.  I'd imagine you'd pick the one you're most familiar or comfortable with, such as Kimball dimensional models as you seem to be leaning already, and maybe just mention a couple points of the advantage of this method over the relational Inmon method.

    The key points I think are really in that Lifewire link in the blog, talking about modeling a business process in your fact table at either a grain of one record per business transaction or one record per summarized set of transactions.  The dimensions are descriptive information about the facts.  If you need to track how that descriptive information changes over time, then that's when a slowly changing dimension comes into play.  (e.g. a store location may move to a different address, so you would want to be able to differentiate which sales transactions were for the original location vs the new location)

  • xsevensinzx

    One Orange Chip

    Points: 25529

    I would also refine you definition of a data warehouse. Saying it's just a repository for data can mean anything. You need to really emphasize how the data warehouse is pooling more than one disparate data sources and cleaning as well conforming them into one. Then drive it home with common use cases to further refine the definition such as being used for analyzing and reporting for one common example. I have data stores that are just repositories for data and contain more than one data source. It does not make it a data warehouse nor does it replace the need for a data warehouse.

    For something a bit more fresh to the presentations over others I've seen would touch more on the differences between the data warehouse, the data mart, and the data lake without extreme bias. In meaning, you kind of get two flavors when people talk about it. Those who are for data warehousing and against data lakes or those for data lakes and against data warehousing. We need more people talking about either or without playing favorites. Like, why can't you just use a data lake over a data warehouse or why you may still need data warehousing with data lakes, pro's and con's on top of again, common use cases.

  • Ray K

    SSC-Dedicated

    Points: 31016

    Great feedback, everyone!  Keep 'em coming.

    The Ralph Kimball links look great.  For the moment, lack of time has kept me from doing much more than give them a quick skim (if that).  They do look like great material, and I'll definitely set some time aside to absorb them some more.

    I confess that I'm not much more than a novice myself, so I'm trying to learn what I can about DWs and BI.  My line of thinking about doing a presentation -- "DW for Dummies" is my working title -- is not just to educate other people, but also to educate myself as well.  If there's one thing I've learned professionally, it's that one of the best ways to learn about a subject is to write about it -- or in this case, present about it.

    I think Steve pretty much hit on it when he suggested keeping it high-level; that was pretty much the approach I was thinking, anyway.  I haven't yet reached the level of expertise where I feel like I can get granular with DW/BI topics, but I do feel like I'd be able to at least provide an overview from 30,000 feet.

    +--------------------------------------------------------------------------------------+
    ‌Check out my blog at https://pianorayk.wordpress.com/[/url]

  • Eric M Russell

    SSC Guru

    Points: 124938

    Here is an example of a good introduction to data warehousing by, Jack Zheng, a Kennesaw State University professor. Within only a few slides, he not only provides an overview of the fundamentals but also places it into context with the OLAP presentation side of things.
    http://jackzheng.net/teaching/it4153/files/12.dw-olap.pdf

    If it helps, there is a Data Warehouse version of the AdventureWorks sample database. You can find a download for all versions and varients of AdventureWorks here:
    https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks
    Also, here is a data model diagram:
    http://www.merc.tv/img/fig/AdventureWorksDW2008.pdf

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply