Separate ETL server

  • I have recently inherited responsibility for managing a small data warehousing team, using SQL Server 2008 R2. My predecessor, on the advice of our enterprise architecture team, has set up the environment to have separate servers for ETL and the warehouse DBMS. This was on the grounds that it is considered "best practice" to separate the application layer from the database layer, and also that it would result in "improved performance".

    Performance improvements are unlikely. Our ETL jobs all run overnight, when no other work is going on, and rarely take more than 1-2 hours. The Production database is actually more powerful than the ETL server, so if they were in a combined environment would probably run faster.

    In terms of separating the application layer... I completely understand this is considered good practice. But our warehouse is a very small and generally not considered business critical. Important, yes. But not business critical. DBA / system admin resources in the organisation are limited, so it seems like overkill to maintain a separate server just for ETL.

    To put things in perspective...

    We have no Dev/Test environment, just Prod. Development and testing work gets done directly in the production environment, and changes are frequently made on the fly. This is on the grounds that maintaining an additional environment would be too resource intensive. Understandably, our Production environment is a massive mess and it has become extremely difficult to distinguish work-in-progress from completed tasks.

    Personally I think I would prefer a separate Development and Production environment (both with ETL and database) rather than a single Production environment.

  • Is this just a rant (it's a good one!), or are you hoping for comments?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Sorry if it came off sounding a bit "ranty". That wasn't my intention... I was wanting some opinions on an optimal architecture in a constrained environment.

    My personal experience has tended to vary from two extremes -- either very small, single server environments (mostly doing pure development work) or very large enterprise environments running business critical reports. I'm not used to the "in between" situation. Personally I would have thought in terms of making compromises, separating application and database layers would have been less important than having separate development and production environments.

    I guess I was hoping for some practical experiences... what are some of the reasons for having a separate ETL server (beyond performance considerations)?

  • Tyberious Funk (2/10/2013)


    <snipped>

    We have no Dev/Test environment, just Prod. Development and testing work gets done directly in the production environment, and changes are frequently made on the fly. This is on the grounds that maintaining an additional environment would be too resource intensive. Understandably, our Production environment is a massive mess and it has become extremely difficult to distinguish work-in-progress from completed tasks.

    Personally I think I would prefer a separate Development and Production environment (both with ETL and database) rather than a single Production environment.

    OK, as I read your predecessor's opinion, having the separate ETL server is important to take load off a production server that is already all mucked up with development and testing work? Yikes!

    If the production server can handle the overnight ETL processes without hindering production processes and you can only afford two servers, I would absolutely push for separate development and production environments. The additional effort to manage a development environment (that in almost all respects will be configured exactly the same as the production environment, permissions being the most obvious exception I can think of) should be manageable, even with a small team. You say that the data warehouse is important but not business-critical, but I expect that the users of that data warehouse would be calling for your head if it is unavailable for two or three days while your team rebuilds it (you mentioned that your DBA resources are limited) because someone ran some ill-conceived, un-reviewed, and untested code that brought it crashing down in flames.

    Jason Wolfkill

  • I agree with the previous comment. Even if performance is degraded slightly (and it's not clear that it will be), the fact that you can maintain a separate dev environment is worth it on its own.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I prefer to have the ETL (SSIS) installed on a different server because it is quite common to need to install or update additional software, controls, DLLS, etc. that might need a reboot, or that might experience a problem that will require a reboot.

    Also, you might need to have the ETL (SSIS) on a different version of SQL Server than the main database server.

  • Michael Valentine Jones (2/11/2013)


    I prefer to have the ETL (SSIS) installed on a different server because it is quite common to need to install or update additional software, controls, DLLS, etc. that might need a reboot, or that might experience a problem that will require a reboot.

    Also, you might need to have the ETL (SSIS) on a different version of SQL Server than the main database server.

    This is an interesting point. At the moment we are running 2008 R2 across all the environments, but I'm wanting to investigate upgrading to 2012. Is it possible to run your ETL on 2008 R2 and your DBMS and Reporting Services environment on 2012? Well... I'm sure it is possible. But is it practical?

    Being able to update systems progressively would certainly make the process less painful (if we decide to go ahead).

  • Tyberious Funk (2/11/2013)


    Michael Valentine Jones (2/11/2013)


    I prefer to have the ETL (SSIS) installed on a different server because it is quite common to need to install or update additional software, controls, DLLS, etc. that might need a reboot, or that might experience a problem that will require a reboot.

    Also, you might need to have the ETL (SSIS) on a different version of SQL Server than the main database server.

    This is an interesting point. At the moment we are running 2008 R2 across all the environments, but I'm wanting to investigate upgrading to 2012. Is it possible to run your ETL on 2008 R2 and your DBMS and Reporting Services environment on 2012? Well... I'm sure it is possible. But is it practical?

    Being able to update systems progressively would certainly make the process less painful (if we decide to go ahead).

    I have setup a number of systems with ETL on dedicated servers, and they run fine with different versions of SQL Server on the ETL, source, and target servers.

  • I've seen it where many developers create ETLs and thus a dedicated server makes sure they are all stored in production in one location, regardless of the servers that are the actual subjects of the dataflow.. Of course you are putting all your eggs in one basket should something happen to the server and there is no failover.

    ----------------------------------------------------

  • Let's not forget that installing SSIS on its own server requires purchasing separate licenses for SQL Server on that server. If you are running SSIS on an ETL server and do not have the licenses required - it could end up costing you a lot more than you realize.

    For that reason alone, I normally setup a reporting system/data warehouse and include the ETL processes (SSIS) on that server. This leaves my production system without the requirement for additional components. Since the reporting/data warehouse server is just that - it is much easier to schedule a restart as needed.

    For a constrained environment like this - I would definitely recommend the above scenario. However, keep in mind that wherever you install SSIS you need to account for the additional memory it requires. How much you leave for SSIS will be determined by your ETL processes and how they are written and configured.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams 3188 (4/25/2013)


    Let's not forget that installing SSIS on its own server requires purchasing separate licenses for SQL Server on that server. If you are running SSIS on an ETL server and do not have the licenses required - it could end up costing you a lot more than you realize.

    We already have a license for our ETL server... which is what bemused me. The organisation I'm working for is relatively small and budget-conscious. It seemed like a luxury to have a dedicated ETL machine that pretty much sits idle for around 20-22 hours each day. Both in terms of under-utilised hardware and in terms of licencing costs.

  • Heya Ty,

    First, let me add another voice to lament the situation you find yourself in. I'm also a major proponent of a separate ETL server, although the size of your organization really doesn't warrant it... because the reasons it's important aren't important to you... yet.

    First and foremost, a dedicated ETL server allows for more effecient security and organization. It means that individual servers have their SQL agent/schedules/etc dedicated to simple maintenance to their hosted items. Index rebuilds, backups, etc. All ETL work and the like are also in a single place, so there's no hide and seek for packages that pull instead of push when you do ETL changes, etc.

    Next, you don't end up in memory wars with your existing applications when you're running large processed in mid-day against your critical systems.

    Thirdly, as mentioned, reboots and installs and the like on a separate ETL server allow for rebooting the SSIS components without requiring a critical system to be taken down because someone wants a new DDL library for some random off the wall process they're building.

    None of these apply to your environment. You're wasting a perfectly good and purchased license to... do practically nothing with it. You could have a dev environment on that weaker box! You could... well, a lot of things.

    I would seriously consider moving the SSIS/ETL components over to the warehouse server for now, until your organization actually grows large enough to be important. Use the old ETL server as a dev box, and get your developers used to playing in the sandbox without breaking production.

    That's... a mess. You need to decide if the politics are ripe to start fixing things, or if you want to duck and just collect your paycheck. Either way, I'd write up the email explaining the logic behind the changes and pass it along to those who matter, so at least you have said paperwork to cover your arse when someone tries to bring the hammer down on your head.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • We have a separate ETL server in our environment. It gives us the separation of layers as was stated earlier, but more importantly for us, it gives us one central server to manage all of the interfaces we having running in our environment rather than running them on either the source servers or the target servers. We have many nightly ETL jobs running on this server where only one of them is for data loads to our data warehouse. The remainder are for data integration/synchronization purposes.

  • I think the priority needs to be on separating testing and development from production, and you can worry about mxiing ETL into production at a later date. Just getting production cleaned up will be a large effort, no matter how small the organization or enterprise. It's amazing what the passage of time can accumulate. Just documenting what's already there is task one, and that will quite likely take far more time than you might be willing to believe. You may even find yourself with code for which there's no source in your .NET side of the house, or, make the revolting discovery that the deployed code is actually significantly different from what you thought was the documented source, and thus that said allgedly correct source, isn't correct; just after deploying an update that used the allegedly correct source, only to discover that you have no good backup of the previously deployed, properly working, code. That leads to task two: Don't change ANYTHING until EVERYTHING is documented, and most importantly, VERIFIED. It's always amazing just how many dependencies you discover AFTER something changes. At the very least, be 100% sure you can go back to the previous state of affairs before making changes.

    Hope you find it all.... and good luck with your efforts!

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Craig makes a good point above. In a small organisation you don't have as many developers and you are likely to be familiar with all ETL processing. There is little to no chance a developer (like in another department if a big company) can deploy his ETL somewhere and others not know about it, as can be the case in a larger company. You likely don't have that many servers to hide things in. Thus those larger business, that can also afford extra licenses, will want all developers' ETL's centrally located for easier administering such quality checks.

    ----------------------------------------------------

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply