ETL Standards

  • Hi Guys,

    I've been tasked with creating an ETL framework document that contains a set of standards that our company will have to follow. My experience in SSIS isn't huge so I wanted to get an idea of what people thought should be in that document.

    Things like naming conventions are an obvious one but any other suggestions would be greatly recieved.

    Thanks

  • Just a couple of ideas:

    For naming conventions I go with the suggestions in Jamie Thomson's blog: http://consultingblogs.emc.com/jamiethomson/archive/2006/01/05/SSIS_3A00_-Suggested-Best-Practices-and-naming-conventions.aspx

    You might also want to have some type of standards for auditing/logging fields on your DW tables (e.g. LastUpdatedDate, CreateDate, ETLUpdateProcess, ETLCreateProcess, ...)

    Take a look at some of these SSIS best practices blogs: http://bi-polar23.blogspot.com/2007/11/ssis-best-practices-part-1.html

    HTH,

    Rob

    Edit: Sorry, I misread and thought you were trying to define a standards document rather than create a Framework. I'd recommend taking a look at Andy Leonard's blog: http://sqlblog.com/blogs/andy_leonard/default.aspx for his ETL framework example.

  • Thanks for your sugestions Rob.

    It's a mix between a framework and standards document but I'll take a look at the links you suggested and see how I go.

  • For SSIS, there's a few things I recommend. Most of them revolve around Configurations.

    I know why standards exist and mostly that's to keep some nut with his own idea of what's best because he hates the letters x, q, and B start doing random stuff. In general I find them annoying when you start getting to that level. However, there are technique standards I personally like to see utilized.

    First is configurations. Everyone needs to use the same method. If your house decides on XML docs, then everything uses XML Docs. If your house wants a SQL table, then everyone uses SQL Tables.

    Next, do you force each package to have a unique configuration or can they share configurations? There's pros and cons to both.

    Error controls: When used, are there particular expectations for them? Do they still need to email a team member when an error pathing is hit even though it allows a process to complete? etc.

    Script work always leads to an interesting discussion, as it starts with 'do we use VB.NET or C#'? I personally can't code a lick in C# right now, I use pure VB.NET. I can read it, sure, but I'd have to get up to speed on it. From there it comes down to things like do you always use the script object for file manipulation (my preference, it's less fussy) or do you use the file system control built into SSIS?

    Finally you need deployment standards. These usually include environmental system variables to switch which configuration location that a particular environment looks at. Dev points to different places than QA, for example, but you need a local static variable to inform the package where to get its configs so you can do just that. Do you create job scripts for the DBAs or do they do them for you? What are the security expectations on packages?

    You can go pretty deep if you want to get into the nitty gritty, but I'd start with the high level process kind of standards first, then extrapolate more specific standards if you find that Bob over on Team Bubbles REALLY hates Bs....


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks Craig that's really helpful.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply