SQL/SSIS/DTS

  • I have some questions and hope you can help to answer...trying to get into DBA world as a developer. Thanks in advance for all your help!

    1. When you have several external databases with the same layout, how would you move them into a datawarehouse?

    2. What method would you use to load data into datawareshouse with existing table that has index/constrains?

    3. what benefit and why would you denormalize a table?

  • These are all rather complex questions with no simple answers. I suggest you start looking at SQL Server books to build up your knowledge. Once you do that, the answers to these questions will most likely become apparent.

    Josef Richberg
    2009 Exceptional DBA
    http://www.josef-richberg.squarespace.com
    http://twitter.com/sqlrunner

  • I would have to agree with SQL Runner. Those are some broad questions. I would recommend picking up The Data Warehouse Lifecycle Toolkit by Ralph Kimball et al. That will be a great start into the world of DW/BI.

    From an incredibly high level I would answer your questions with:

  • Define what business process you want to report on. It is critical you look at this from a business process perspective and NOT from a db or table perspective.
  • Then you will need to define your Fact and Dimension tables.
  • Then you will map out an ETL process that will execute via SSIS.
  • Once you have all of that done you can decide if you want to proceed further with building cubes in SSAS or maybe you just want to use SSRS to build some reports.
  • This is an incredibly oversimplified answer, but the journey will be fun.

  • 1. Why would you try moving (or) migrating external databases into a warehouse? If it is to extract and load data from individual databases into a Warehouse, then you should develop ETLs using DTS or SSIS for that.

    2. It depends upon the layer. If it is a staging layer, there would be no index / constraints on the objects. Incase of a warehouse layer, this could be done using normal ETLs.

    3. Denormalisation is done usually in a fact table to reduce the number of joins and try having consolidated data within a single table.

  • trishdiep (7/14/2010)


    I have some questions and hope you can help to answer...trying to get into DBA world as a developer. Thanks in advance for all your help!

    1. When you have several external databases with the same layout, how would you move them into a datawarehouse?

    2. What method would you use to load data into datawareshouse with existing table that has index/constrains?

    3. what benefit and why would you denormalize a table?

    Welcome to the rabbit hole.

    1.You will need to become familiar with Staging Databases and Operational DataStores (ODS). These both serve a role when 'combining' several (or 256) external database prior to loading/updating the Warehouse and refreshing your Cube(s).

    2.Indexes and Constraints serve a multitude of rows, especially when you need to sort and/or lookup data as you denormalize. To what extent will be determined by the specific need(s) of what you are looking to accomplish.

    3.This can be a fun debate to get into. There are some who believe that all of your data needs to be denormalized in order to load the Warehouse and effectively maintain your Cube(s). There are some who like to be more creative in how their data gets processes or are forced to when time/space restrictions are tight. As you are new to this type of development, I suggest denormalizing all of your data into a Staging database, then loading what you need into your Warehouse. This may seem like redundant work, but it should make it easier for you to understand the data flow as you create your Warehouse.

    Hope this helps.

  • Thanks everyone for your responses! Much appreciated!

    I went for an interview and it seems like they were looking for a different answers...so I just want to see your responses on these topics...

    To dhudrik:

    Thanks for your detailed explanation and definitely learn from it. Some folks actually say it's not good to practice the normalization...so I'm trying to get good practice and I am confused...Normalization in terms of merging some small tables that have redundant data into big tables, so doing that we could be deleting one of the table but then remember there are other applications/store procedures, components, external tools point to that table (that we just delete it), then an error will occur. We then have to worry about the clean up, so it's more risk than the benefit to do the normalization...Well I'm sure other folks have their reasons to do normalization in this aspect...

    Thanks for all your responses! more thoughts?

  • In most cases normalization is the proper way to design your system. It used to be that normalization was too resource intensive on database systems, but that is no longer the case. Most people go to third normal form and I suggest you follow that. There are situations where denormalizing will improve performance, but those are few and far between.

    Josef Richberg
    2009 Exceptional DBA
    http://www.josef-richberg.squarespace.com
    http://twitter.com/sqlrunner

  • I don't know if the reasons are few and far between, but I think the more important issue is understanding WHY you would chose to go either more or less normalized. The 3NF (third normal form) is VERY typical and works for a large cross-section of needs. There are very few if any hard and fast rules in databases, I can think of reasons to break nearly all of them, but the issue is understanding what the implications will be. Given that there is always at least 2 ways to do just about anything I don't want to break one of the rules just because I can't think of anything else.

    CEWII

  • There are very few if any hard and fast rules in databases, I can think of reasons to break nearly all of them,

    These kinds of statements make me nervous. I have seen too many unnormalized databases that were caused by lack of knowledge or even a desire to reduce the joins that would be required in queries, only to have the designs cause data integrity problems later. I have denormalized desigsn a handful of times, but it was only to address severe performance problems that could not be resolved another way.

  • RonKyle (8/7/2010)


    There are very few if any hard and fast rules in databases, I can think of reasons to break nearly all of them,

    These kinds of statements make me nervous. I have seen too many unnormalized databases that were caused by lack of knowledge or even a desire to reduce the joins that would be required in queries, only to have the designs cause data integrity problems later. I have denormalized desigsn a handful of times, but it was only to address severe performance problems that could not be resolved another way.

    I agree, in general it should make you nervous.

    Design should generally go to the 3NF. If you depart from that, you should think long and hard about it. There are reasons, but full understanding of the ramifications comes with experience. In many cases it makes sense to design for 3NF and build a reporting database that is more denormalized. The 3NF is usually a pretty good balance between OLTP and reporting. There are tradeoffs.

    CEWII

  • Viewing 10 posts - 1 through 9 (of 9 total)

    You must be logged in to reply to this topic. Login to reply