Data Warehouse - Design Questions

  • I'm designing a data warehouse system, and have a question over the structure. First, here is some background into it (financial services):

    There are a number of fee definitions defined (transaction fee, setup fee, SMS reminder fee and so on). These fees can be updated and added (Dimension table).

    There are payments defined that use a fee definition. A payment can contain 1 or more fee's, and the fee amount (and who pays) is unique to that payment (a transaction fee will be vastly different on a $2 transaction to that of a $2,000 transaction).

    If I was designing an OLTP database, I would have the following:

    Payments (payment details)

    PaymentFees (FK to fees, FK to payment, fee amount, fee payee)

    Fees (fee name etc)

    My question is, how should I design this in a DW? I'm going to use SSAS for cube browsing, and I've heard that it doesn't handle many to many particularly well.

  • Paul

    If I am not mistaken, it sounds like you are attempting to put transactional event data (DW Fact Table data) into a dimension table.

    The first question you must ask yourself when creating a data warehouse is, "What event am I capturing?" The second question is, "What is the granularity of the event?" From what I can tell, your OLTP transaction consists of fees. If so, then it is part of the "counts, amounts, and timestamps" that go into the fact table.

    Your dimension tables will hold fairly stable supporting data: date/time breakdowns, fee descriptions, payee lookup data, and other payment details. These can be Type 1, 2, or 3, depending on the business requirements. Sometimes they can be "monster dimensions" that can rival the fact table in size. Nevertheless, dimension table data will be supporting data, not the actual event data. Avoid creating dimensions off of dimensions. This is "snowflaking". The best practice in the industry is to design for the star schema approach to dimensional modeling. With this approach, you can have a chance for half-way decent performance. With a snowflake model, your users will probably chase after you with a lawn mower. The performance will suck.

    For what it is worth.

    Jim

  • When I first began reading up on data warehouses and dimensional modeling, one thing that stuck out was that facts are numbers (not always, but nearly always). The other stuff is usually a dimension - something that describes or gives meaning to the fact. Fact tables wind up effectively being many-to-many joins between the various dimensions.

    That might be a bit simplified, but it serves as a good thumb rule (for me, anyway) for figuring out what should probably go where.

    ____________
    Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.

  • Fact tables wind up effectively being many-to-many joins between the various dimensions.

    I don't think it's oversimplified at all. I think you got it.

  • The only thing I would want to clarify is the statement: "Fact tables wind up effectively being many-to-many joins between the various dimensions." In a sense, yes. But it would probably be better to say, "Fact tables have a multitude of one-to-many relationships to dimension tables." The power in most of the DBMSes to process the vast amount of data in a star schema design is predicated on one fact table and it having many one-to-one relationships with its dimensions. Sometimes the DBMS optimizer will choose a cartesian join because it ends up being more efficient in a star schema. Where this breaks down is when you have monster dimensions that appear to the optimizer to be other fact tables. Also, a cardinal rule in data warehouse efficiency is to never join one fact table to another fact table. If you do, all bets are off for getting a response back before retirement.

  • Also, a cardinal rule in data warehouse efficiency is to never join one fact table to another fact table. If you do, all bets are off for getting a response back before retirement.

    What do you mean by this? Behind the scenes, I will set the FK constraints between a natural parent children (ie orders and order details) because I want to eliminate the chance details will appear that don't have a parent. It shouldn't happen, but I want to guarantee it.

    If you mean that you shouldn't have multiple fact tables in a cube, then you may be giving up some analysis possibilities. Kimball even suggests that a five or six fact table cube is likely very possible.

    Or do you mean something else?

    Thanks,

  • I wasn't really addressing cube performance, per se. I was addressing generic physical DW design across most DBMS platforms. I am making an assumption based on the original question that this financial services company's fact table could easily be multi-terabyte in size. I have previously worked for a financial services company and designed data warehouses of this size (even bigger if you have BLOB data). Whether the front end to the DW is Micro-Strategy, Cognos, or native in-house designed SQL, you really don't want to join 2+ of these types of fact tables together. It will be a performance killer on the biggest machines. Beyond this, I don't have experience with cubes other than what I have seen in demos. But I would imagine to populate the cube from the data warehouse, you would probably not want to join two large fact tables. If they are smaller, then yes, I think the performance could be acceptable.

  • Just another clarification. When working in financial services, we had a security/audit policy that cube processing had to be done on the server, not on the client. Auditors were concerned about downloading that much sensitive data to a workstation. I imagine that this is true across all financial services companies.

  • In general.........

    Facts are the events, or VERBS of your business.

    Dimensions are the entities, or NOUNS in your business.

    Many to many relationships can occur in two flavors:

    1. Relating a dimension to a dimension (i.e. customers in an account)

    2. Relating a fact to a multivalued dimension (i.e. more than one sales reason for a transaction)

  • Although written for SSAS 2005, this white paper remains very relevant and should provide some nice guidance on how to handle M2M relationships in SSAS......

    http://www.microsoft.com/en-us/download/confirmation.aspx?id=137

    Best of luck 🙂

    Steven Neumersky

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply