Talend vs. SSIS: A Simple Performance Comparison

  • Comments posted to this topic are about the item Talend vs. SSIS: A Simple Performance Comparison

  • > "I would be especially interested to know how Microsoft shops are using the product and how it helps their respective organizations."

    We use SSIS as the main data and document interchange platform for our retail software solution. SSIS is a solid, simple-to-use, fast, and reliable building-block tool. When integrating, we concentrate on substance, not form (the form is already there, ready to use), and this gives us a competitive edge in development speed.

  • I have worked with both SSIS and Talend Open Studio. One area where I think Talend speeds up development is building fact table packages.

    In SSIS one would normally use the Lookup transformation, which splits the data flow into rows that match and rows that fail. For the rows that fail you might want to provide some default values (unknown members, -1, etc.) using a Derived Column transformation and then join both streams back together using a Union All transformation. Now, in a fact package that looks up more than 15 or 20 dimension tables and has grown beyond 5 MB, the entire package becomes unwieldy and a bit clunky, i.e. it takes a long time to open, close, change, and save. I think this is because after every union the incoming columns are referenced as columns of the Union All transformation, so every union adds to the package size. Also, from a source, one can join to only one lookup at a time.

    In Talend Open Studio, there is the versatile tMap component. How I wish SSIS had something like this! In a single tMap you can pull in all the tables, the starting tables as well as the ones you want to look up, then specify the joins and also provide default values. This makes development easier, and the job remains easy to handle.
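    A minimal sketch of the single-step lookup described above, written in plain Python rather than in either tool. It resolves several dimension keys for a fact row in one pass and falls back to an unknown-member key when a lookup misses. The column names, the sample dimension data, and the `resolve_keys` helper are all hypothetical; only the -1 default comes from the post.

```python
# Hypothetical unknown-member surrogate key, as in the "-1" default above.
UNKNOWN_KEY = -1


def resolve_keys(fact_row, lookups, unknown=UNKNOWN_KEY):
    """Return a copy of fact_row with one surrogate-key column per lookup.

    lookups maps a source column name to (target column name, dimension),
    where dimension is a dict of business key -> surrogate key.
    """
    resolved = dict(fact_row)
    for source_col, (target_col, dimension) in lookups.items():
        business_key = fact_row.get(source_col)
        # A missed lookup gets the default instead of a separate error branch.
        resolved[target_col] = dimension.get(business_key, unknown)
    return resolved


# Made-up dimension data, for illustration only.
customer_dim = {"C001": 10, "C002": 11}
product_dim = {"P-7": 501}

lookups = {
    "customer_code": ("customer_key", customer_dim),
    "product_code": ("product_key", product_dim),
}

row = {"customer_code": "C002", "product_code": "P-9", "amount": 25.0}
print(resolve_keys(row, lookups))
```

    Here the customer lookup succeeds and the product lookup misses, so `product_key` receives the default. The point is that matched and unmatched rows never split into separate streams that need to be joined back together.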

  • I think it's worth pointing out that there's a bit more to it all than this, and any ETL/BI developer would be wise to have Talend's portfolio of products (open source or commercial open source) on their radar. In particular there are a couple of interesting distinctions. First, Talend, like others in similar spaces (check out Bonitasoft if you are interested in BPM), uses the Eclipse IDE as the basis for the interface, so the learning curve is gentle when moving between systems and extensibility comes as a given. Second, and this is a love-it-or-hate-it point touched on in the article: Talend OS is a Java code generator (they do also have a Perl engine, but it lacks a few of the transformations as standard), so any workflow you construct actually generates Java code. You can (unless you loathe Java) inspect, tweak, enhance, and debug that code, and you can of course compile it into a standalone executable that can be deployed as a simple jar file to any system with a JRE. No other ETL elements are required, nor need there be any footprint at all of the Talend product on the runtime system. Peer support is fairly good, and the product has a neatness and transparency about it which makes it very accessible (as well as being free for the Open Studio variety, natch).

    Mark

  • Many kudos to the author for:

    a) posting the row size

    b) posting the environment used

    c) running equivalent tests

    d) running the tests several times in a row

    e) choosing a large enough data set to override any potential caching that may have happened.

    Very thorough test. Interesting results, and I can see how these would vary. What I would be interested in next is to see these tests run on 64-bit Windows with 64-bit SQL Server, where 64-bit SSIS would be tested against a 64-bit JVM. I'm assuming these tests were run in 32 bit. Java seems to react better in a 64-bit environment.

    On another note: while SSIS is interesting, it does have its pains (as the author pointed out) with metadata and changes. I would recommend possibly checking out a tool called Mapping Manager from http://AnalytixDS.com, which generates SSIS code (they are working on generating Talend code) from a single set of metadata (cross-references/cross-walks). Anyhow, just a thought.

    Thanks for the insightful read.

    Hope this helps,

    Dan Linstedt

    http://LearnDataVault.com

  • Thanks Jeff for the Article,

    I found it very interesting and also had some fun reading it.

    I think it all depends on the scope of the project and on the business strategy and philosophy of the company. There are a lot of good ETL tools on the market; some of them are open source and, like most open source products, they have a steep learning curve.

    I have had a good experience with Pentaho Data Integration (aka Kettle), and we were able to deliver good solutions to small companies that wanted to make use of their data on a small budget. The disadvantage was that we bore the cost of learning the tools (time and money).

    Right now I am involved in several projects, all of them with the complete Microsoft BI suite, and I can say that these tools make my life easier. The aspect that I like the most is the integration between the tools. For instance, it is easy to schedule SSIS packages and build execution logic for them using SQL Server Agent jobs.

    But: "All magic comes with a price" (Rumplestiltskin)

    Paul Hernández

  • We ran similar performance tests, in this case between Pentaho (PDI, also Java-based) and SSIS. Against SQL Server, SSIS is faster (by about 10%), perhaps due to native OLE DB connectivity. However, I very gladly give up some speed when developing ETL data flows and transformations is so much quicker with PDI. Pentaho offers more functionality, especially for non-relational sources. It is very stable, without the SSIS glitches; it has a limited set of internal datatypes (instead of the annoying number of them in SSIS); and its parsing is immensely superior.

  • Another vote for Pentaho... Whilst it isn't as polished as SSIS and lacks the nice SQL Server integration features (i.e. versioning, environment variables, reporting), it shines for non-SQL backends (DB2, Progress) where SSIS struggles with native datatypes.

  • vidalst (10/22/2013)

    > Another vote for Pentaho... Whilst it isn't as polished as SSIS and lacks the nice SQL Server integration features (i.e. versioning, environment variables, reporting), it shines for non-SQL backends (DB2, Progress) where SSIS struggles with native datatypes.

    Sorry, but Pentaho is a MUCH more polished tool. It supports both variables and parameters, and both reports and dashboards can be defined with the BI server. Versioning is a matter of using the right open source version control tool (like Git). Reworking a pilot built against SQL Server into a workable version against Oracle took a few minutes. With SSIS this was a really painful exercise. You need the Enterprise edition to get ANY performance against any DB other than SQL Server.

  • Hi Jeff,

    Really a good article with detailed explanation and analysis.

    Nitin

  • We use both SSIS and Talend. Typically we use SSIS when writing to SQL Server and Talend for everything else. On your performance measurement, you might want to create stand-alone jobs (run from command line) and run them for comparison. Talend jobs run significantly faster on our server after exporting and scheduling to be run in batch instead of running within the interactive development/debugging environment.

    Thanks for the article.
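    A minimal sketch of how such a command-line comparison could be timed, assuming each job can be started as an external command. The `time_command` helper is hypothetical, and the command shown is only a placeholder that starts the Python interpreter; in practice it would be replaced by whatever starts the job outside the designer, for example the launcher script from a Talend job export or a `dtexec` call for an SSIS package.

```python
import statistics
import subprocess
import sys
import time


def time_command(command, runs=5):
    """Run command `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the job exits with a non-zero status,
        # so a failed run is never counted as a fast run.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command (an assumption): substitute the real job launcher.
    command = [sys.executable, "-c", "pass"]
    timings = time_command(command, runs=3)
    print(f"median {statistics.median(timings):.3f}s over {len(timings)} runs")
```

    Reporting the median over several runs keeps one slow start from skewing the comparison, in the same spirit as running the tests several times in a row.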

  • We have just started using Talend for some Oracle data quality projects and I can't even get it to run. It definitely requires some training.

  • If you mean the DQ offering itself, then that is a different beast altogether (and, when I last looked at it, not overly impressive) - anybody with basic ETL skills will pick up Open Studio more or less immediately ...

  • Just wondering why the Talend job uses a tFileOutputMSDelimited (multi-schema) component instead of a simple tFileOutputDelimited (single-schema) one. It most probably wouldn't change much, but it would be more transparent.

  • My primary ETL background is SSIS and DTS before that. So I'm a big fan of using it to shunt complex data sets around, including dreaded VB.NET transformations.

    I've had cases when trying to get data from DB2 and other non-MS sources in which I just couldn't get SSIS to work, so, having played with Talend, I decided to try it in anger.

    It opened the DB2 system straight up, was able to reference tables and data and when told to fire them into a SQL system (to make analysis easier & faster for me), offered to create SQL compliant tables on the fly!

    It's a clunky interface, but once adjusted, it's very powerful. I've recommended it for other non-MS scenarios as a powerful tool. The range of native connectors really does it for me, something that for once SSIS needs to catch up with.

    SSIS really needs native DB2 and SAP application connectors.
