Could use some advice

  • OOPS: I didn't realize I was in the SQL Server 2005 section of the forums, will be looking for a better place to cross post this.

    Hey everyone,

    I am a one-man BI shop for an organization with 600 employees. The organization is a tribal government, so there is a wide diversity of data sources, reports, and dashboards that need to be done. We are using the Microsoft BI stack (SQL Server, SSRS, SSAS, Datazen, PerformancePoint) to deliver the BI to the end users.

    The end users here don't have a whole lot of exposure to BI so I get the "luxury" of essentially building the BI department from the ground up. So I built a couple of small data warehouses, and now the demand is growing quicker than I can service it by building small data marts. My next step is to build a centralized, tribal-wide data warehouse, but I also want to implement a standard error logging and data quality mechanism that supports custom business logic validation.

    I figured that it would help me out quite a bit to build a template, and one of the vendors we are working with on a project suggested a meta-data driven template based system. I prefer to keep my ETL in T-SQL as opposed to SSIS (with the exception of pre-staging data from data sources such as SharePoint, flat files, and non Microsoft databases).

    The big problem I am facing with all that is that I am having a hard time keeping my thoughts organized, and am constantly second guessing myself. In my head I envision something like this:

    For each dim, having a stored proc to extract the data from the staging environment, process quality checks, write the failed rows to a data quality table, write the good rows to the dim table, and log any errors in an error table. Seems simple enough right? Yet everytime I open SSMS to start writing the procs I go full on stupid and completely freeze up because my mind races so far ahead of where I am at in the project.

    How do you guys plan out a project like this and document the design so that you can keep your head on straight while coding?

    Thanks in advance!

  • So, the question is how to keep the overall design straight in your head while you're developing?

    I'd suggest reading Ralph Kimball's book "The Data Warehouse Toolkit". If you haven't already, then stop whatever you're doing and read this book.

    http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/

    In it he describes the process of building a warehouse, starting with the basic concepts and then going from requirements gathering to implementation. He advocates creating what he calls a "Bus Matrix", which maps business processes to dimensions. That's probably the first design document you want to work on.

    Data Model diagrams and a Data Flow diagram for each ETL process are helpful of course. Whatever I'm currently working on, I'll have the relevant model and flow diagram pinned up on the wall nearby or displayed on a 2nd monitor. Sometimes I'll even have the diagrams embedded with file links that open the related documentation or source files.

    Also, even if you prefer to implement most of your ETL process in T-SQL, it helps to call these tasks from SSIS. Even if your package consists of nothing but T-SQL tasks glued together with connectors and precedence constraints, SSIS helps to visualize the flow of your data, how it all ties together, and it has some useful auditing features.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I agree with Erik. It's impossible to keep large or complex plans in your head. You have to write them down and modify the plans when changes occur or you'll drive yourself nuts. I tend to use flowcharts and process charts made in Visio but just about anything (cocktail napkins, included) written in a basic pictorial format is worth it's weight in gold. I also keep track of my progress in the flow/process charts. Items that need work are, of course, in White. Things that I'm currently working on (usually just one or two, can't peel the whole bag of potatoes at once) are in Blue. Items that I've completed coding for and in the process of unit testing are in Purple. Items that I've completed coding and unit testing are in Green.

    One of the keys to success with such charts is to realize that you don't necessarily have to have everything planned out perfectly before you start. Use placeholders with high level descriptions and expand on those as necessary when you get to them.

    I know a whole lot of people are going to think that flow/process charts are a bit old fashioned but it's an incredible bit of visual help not to mention keeping track of what's been done so you no longer have to worry about them. When you're done, don't throw the charts away because they make incredible documentation that will make your life super easy when it comes to major modifications because you'll no longer need to search for the modules you need to make changes in. Just look at the charts and you can quickly plan out what needs to be done. And, since you already have the chart open, it takes just a couple of minutes to update it so that it doesn't go out of date and remains useful.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the tips. My end goal is to automate the creation of as much as possible to ensure that everything remains standard and to free up as much of my time as possible for working on the end user facing pieces. The first step then is to get all my thoughts on paper, and since there is nothing in place currently, I have to come up with everything.

    I am reading through The Data Warehouse Toolkit, 3rd edition right now, and on deck are The Data Warehouse ETL Toolkit and The Data Warehouse Lifecycle Toolkit. Hopefully these books will give me a solid foundation to build this new BI department on.

  • Apart from what has been suggested, I try to solve things like this for one area. Write down the plan for one dimension or entity and solve each one. Then look to see if you can automate the generation of that one plan for other entities.

    I know you want to be in T-SQL, but this is what Biml is geared towards. Might be worth spending a day looking at things and see if it works for you.

    http://www.sqlservercentral.com/stairway/100550/

  • The Kimball methodology advocates first interviewing all essential stake holders, and then identifying (as much as practical) all business processes and common dimensions. He emphasizes modeling business processes, not departments. Read the book to see why that's important. Requirements are gathered into the Bus Matrix, which is typically something like an Excel sheet that maps process to dimension. You then model the first iteration of each business process one at a time in a SCRUM like fashion.

    That's what's ideal, when you're part of a team and assuming the functional role of either business analyst, architect, or developer. However, like yourself, I've been in a one-man-show position myself. The concepts of requirements gathering, iterative design, and taking a methodical approach to dimensional modeling in general all still apply; maybe even more so, because it's either documented or just in your lonesome head, and there is a lot of stuff for one guy to keep track of.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • This may help because I run into similar time-to-time.

    Right now, I'm rebuilding my data warehouse from the ground up. I've identified a number of weak areas and I'm going to improve them with some new designs and new theory that I've been wanting to implement for some time.

    Some things that have helped me is plenty of whiteboard space. I normally go to a quiet room and just brainstorm ideas on whiteboards such as data models, data flows, doodles of my co-workers, you name it. I just write ideas to axe the ones I hate and keep the ones I want.

    From there, I used to document and plan a bit. I would create diagrams, visio work flows and other things even suggested in the Data Warehousing Toolkit by Ralph Kimball.

    But unfortunately, all the strategic planning, documenting and so forth drown me out a lot. I found myself doing more planning than actually working.

    So, what has helped me the most is just winning small battles. I start working on the fragments of the bigger picture. I get something at least completed, and it motivates and provides clarity for the next fragment. Eventually, I end up with a big piece of the pie completed and I feel a lot better about the progress.

    This isn't for everyone, but I found that just doing something, even if you scrap it helps a lot. This is similar to writers who may not know where to begin. Just start writing, anything, everything and maybe something will catch fire.

  • xsevensinzx (10/8/2015)


    But unfortunately, all the strategic planning, documenting and so forth drown me out a lot. I found myself doing more planning than actually working.

    As someone who created embedded software solutions for many years and now is creating data solutions for business, I firmly believe that your planning time should be more than implementing. The notion that design is not 'working' is a false. Design is harder work than implementation often, because it takes deep thought and a willingness to explore many ideas. While Agile ideas get things out quickly (good for the business maybe) my experience has shown that the overall design suffers usually and re-design iterations ensue. This might work well for websites but I don't believe it's the way to go with data modeling and data warehouse implementation. The OP is experiencing confustication because he's trying to eat the proverbial elephant all at once, when he can't even see that it's not just one elephant that needs to be eaten; it's a herd! Step back, outline your goals, then start whiteboarding your solutions one piece at a time. Only when you're satisfied with the overall design and ensured it's future-proofness (as extensible as possible) then begin implementation, one subject area at a time.

  • Thanks everyone for the all the advice!

    I just finished the paperwork for my boss, and she agreed to a process by process implementation across the organization. She will work with the division directors to prioritize the process, and has assigned a business analyst to serve as my liaison to the end users, so I can focus my time on implementation.

    I plan to start slowly, and build a single unified data warehouse to house all the process, then use secondary SQL databases filled with views to expose certain data sets to specific end users, for example, accounting users versus social services users. This should make securing the data cleaner and more straightforward.

    Thanks again everyone 🙂

  • jesse.seymour (9/30/2015)


    The big problem I am facing with all that is that I am having a hard time keeping my thoughts organized, and am constantly second guessing myself. In my head I envision something like this:

    For each dim, having a stored proc to extract the data from the staging environment, process quality checks, write the failed rows to a data quality table, write the good rows to the dim table, and log any errors in an error table. Seems simple enough right? Yet everytime I open SSMS to start writing the procs I go full on stupid and completely freeze up because my mind races so far ahead of where I am at in the project.

    How do you guys plan out a project like this and document the design so that you can keep your head on straight while coding?

    Thanks in advance!

    I am afraid you are dealing with the political/organisational issue more than the technical or even project management issue.

    The reason why I would be confused this way could be that:

    - BI team is understaffed.

    - DWH/BI work is poorly understood by the business (two months for one report? are you joking?)

    - I do not understand the business. I think they expect me to deliver super-fast

    - I over-promised to the business and then realised that have no time.

    - I am a bit of a perfectionist (struggle finding a balance between the speed of delivery and the quality of code)

    - I am not aware of multiple ways of delivering the BI. Need to consider more options (in-memory, sandbox)

    Damn it, I cannot keep my head straight trying to answer your question! :w00t:

    M.Sc.IT, M.B.A, MCTS BI 2008, MCITP BI 2008, MCTS SQL Dev, CSM, CDVDM

  • I agree with the prior posts that the Kimball books outline well the approach to creating a data warehouse from the ground up. Be careful with defining the "grain." Some department may only say that care about daily totals and not what hour in the day the action occurred. That can change later or that dimension can be shared with another process with a more granular requirement. The book covers this and other aspects such as bridging tables, Definitely worth the read.

    ----------------------------------------------------

  • Just start writing, anything, everything and maybe something will catch fire

    The problem is once you begin something there is this inertia that keeps you from wanting to go back and get things right. It is important to create those data flow charts and even pseudo-code. It makes the actual process of code writing faster and error free and more likely to be error free in the end.

    ----------------------------------------------------

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply