SSIS Capacity Planning: Questions to ask

  • Hello,

    Newbie here. I am tasked with the hardware specifications for a SSIS server cluster. I'm not sure what questions I need to ask my client to determine the server capacity needed. Here is what I do know:

    Concurrent users (estimated): 5650(yr1) - 22,600(yr3)

    Total packages (estimated): 1000(yr5)

    By the way, this is for a SharePoint deployment. What other questions do I need to ask? Needless to say I know nothing of SQL SSIS so any capacity planning information/recommendations for a deployment of this size is helpful.

    Thanks in advance! 🙂

    Troy

  • Are you in the right forum? Are you implementing a new cluster on SQL 2005? Or 2008 R2? Or 2012?

    Also, "SSIS Cluster" is kind of a misplaced term as SSIS is not a cluster-aware service, and does not support failover from one cluster node to another.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Indeed, I have post this to the wrong forum. My sincere apologies! This is what happens when you're doing too many things at once.

    Troy

  • It's fine, as long as we know what it is you're trying to do...so what is it that you're trying to do? What version of SQL Server? And is SSIS running in a cluster a real need?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (3/26/2012)


    It's fine, as long as we know what it is you're trying to do...so what is it that you're trying to do? What version of SQL Server? And is SSIS running in a cluster a real need?

    This will be a part of a SharePoint environment with web proxies, web front ends, SQL Servers, dedicated SSRS servers, and today I was told of a need for a dedicated SSIS server. I'm not certain that its needed as in the previous design SSIS was incorporated in the primary SQL Server (2008 btw). The clustering is because the client was high availability and redundancy. This SharePoint system will be customer facing as well as internally used so up-time is a big issue.

    For the SSIS server I'm told the following:

    Total Concurrent Users (estimated): 5450

    Total packages for SSIS(estimated): 1000

    Approximate size of data file to be processed: 18MB each

    Approximate max amount of data to be processed: 900MB every 2 weeks

    Planned process to take place after hours (6pm-6am)

    Our standard server build is a dual processor(6 core each) w/ 16GB RAM and 146GB internal storage, though there is a 1.5TB SAN being used for data storage. Is this overkill for this process? what is the "rule of thumb" on this? Any advise is greatly appreciated. by the way, the server will be clustered for redundancy.

    Troy

  • As far as scoping your hardware, those user counts are a little out of my league. Will that be the users of SharePoint, or the number sof users potentially directly kicking off SSIS packages? I would look for a White Paper to help you estimate based on all factors, including data volumes and workload-type. At the end of the day it may largely depend on the workload-type, i.e. will you be leveraging a lot of what SSIS can do in terms of doing data transformations in memory? or will you only be using SSIS to bulk-load files into a database and then call a stored procedure to do "the real data processing work"? or will you be using it as a passive database-to-database replication mechanism? or...?

    Re: the high availability requirement, I would not choose to use a Cluster to make SSIS redundant. It may actually make things more complicated and more expensive.

    Edit: As I am writing this I think this may be too much for a simple forum thread. You may want to look at paying an expert to work with you and guide you through this process.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks for the reply.

    These numbers I admit are vague, but this is what I have to work with. My understanding is that SSIS will be used for both data transformations in memory and bulk data loads and call stored procedures. I believe the concurrent users number is more for SharePoint than for SSIS, however the data sizes pertain to SSIS in hte amount of data to be processed. Again, these processes will run after hours on a schedule.

    If you do not recommend clustering, then generally speaking what would you do? I"ll search for a white paper to provide more info. I do have the option for aVM solution for this. Perhaps its better to do an "active/passive" server solution?

    Thanks again for your input!

    Troy

  • troy_hilton (3/27/2012)


    Thanks for the reply.

    These numbers I admit are vague, but this is what I have to work with. My understanding is that SSIS will be used for both data transformations in memory and bulk data loads and call stored procedures. I believe the concurrent users number is more for SharePoint than for SSIS, however the data sizes pertain to SSIS in hte amount of data to be processed. Again, these processes will run after hours on a schedule.

    If you do not recommend clustering, then generally speaking what would you do? I"ll search for a white paper to provide more info. I do have the option for aVM solution for this. Perhaps its better to do an "active/passive" server solution?

    Thanks again for your input!

    Troy

    A poor-mans version of an "active/passive" server solution as you put it was what I was thinking initially, two independent servers, one active and one warm standby or two actives with a split workload, with an external NAS or SAN location to park the SSIS packages themselves, Package Configuration files and the drop point for any incoming and outgoing data files. I would try to set things up so that the SSIS app servers would not be referenced by any other applications, allowing you to add more app servers later to run problematic or resource-intensive SSIS packages in isolation as needed, also allowing you to scale out with no worries of dependencies. Avoiding dependencies also allows you to bring any passive servers online in case of the primary failing with no external changes. You would just need to keep configurations at the ready in terms of SQL Server patch levels, security, jobs, schedules, etc. so when you're ready to bring the passive online you can run your scripts and be online again in a hurry. Some of those problems you would have with a cluster as well (patches and lack of SSIS cluster-awareness), some you would not including security and jobs (SQL Agent is cluster-aware). A cluster has other baggage though and you would not be able to get to a place where you had automatic cluster failover for SSIS without some custom work, or falling back to a manual failover plan of course, so I guess you need to weigh each and pick.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Something occurred to me while coooking breakfast...do you need to have the SQL Server Integration Services service running? It is not technically required to run SSIS packages, and in fact, I never have it running in my environments. I have to install SSIS during SQL Server setup to get the binaries necessary to run file-based packages from the command line using DTExec.exe, but since I do not use any of the management features I can have the service disabled and save the memory it would normally consume. If you find out you will not require the service to be running, then you have a more clear path to achieve automatic failover with clustering.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Interesting! I didn't know that was possible. I don't know but I'll find out.

  • OK let's back up here. Typically there are not concurrent users on an SSIS package as packages are importing/exporting/data mining and other wokflow tasks at a non-async time in relation to any transactions. Also, I have seen large numbers of packages but 1000 seems like they went nuts embbeding application logic into packages versus building a proper application.

    There is no clustering of the SSIS packages at all that is purely the SQL Server itself. You will execute the packages from one server or another. And yes the SSIS service is for SSIS storage and to a small extent execution, it does not have to be running for DTEXEC to work, you still have to install though to get the components as I recall.

    When it comes to sizing needs for SSIS consider the data volume being moved/altered, and the average time it takes to run the package(s). Really should not be something to outclass your SQL Servers specs alone. But I need to understand the workflows to know if there is anything more you might want to consider. 1 thing for sure is where the package is executed from. If you can you want local to any files being processed for performance.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply