Talend vs. SSIS: A Simple Performance Comparison
Posted Monday, October 21, 2013 9:34 PM
SSC Rookie

Comments posted to this topic are about the item Talend vs. SSIS: A Simple Performance Comparison
Post #1506955
Posted Tuesday, October 22, 2013 12:42 AM
Forum Newbie

> "I would be especially interested to know how Microsoft shops are using the product and how it helps their respective organizations."

We use SSIS as the main data-document interchange platform for our retail software solution. SSIS is a solid building block: simple to use, fast, and reliable. When integrating, we concentrate on substance rather than form (the plumbing is there, ready to use), and this gives us a competitive edge in development speed.
Post #1506974
Posted Tuesday, October 22, 2013 1:23 AM
Forum Newbie

I have worked with both SSIS and Talend Open Studio. One area where I would think Talend speeds up development is building fact table packages.

In SSIS one would normally use the Lookup transformation, which splits your data flow into rows that match and rows that do not. For the rows that fail the lookup you might want to supply default values (unknown members, -1, etc.) using an expression (Derived Column) transformation and then join both streams back together with a Union All transformation. In a fact package where you are looking up anything more than 15 or 20 dimension tables and the package size has exceeded 5 MB, the whole package becomes unwieldy and a bit clunky, i.e. it takes a long time to open, close, change and save. I think this is because after every union the incoming columns are referenced again as columns of the Union All transformation, so each union keeps adding to the size. Also, from a source transformation one can join to only one lookup at a time.
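For readers less familiar with that pattern, here is a rough set-based sketch of what those Lookup / default / Union All chains accomplish; the table and column names (stg.Sales, dbo.DimCustomer, dbo.DimProduct) are made up purely for illustration:

-- Resolve surrogate keys for a fact load. Unmatched business keys fall back
-- to the "unknown member" key (-1), the same outcome as the SSIS
-- Lookup -> default -> Union All pattern described above.
SELECT
    COALESCE(dc.CustomerKey, -1) AS CustomerKey,
    COALESCE(dp.ProductKey,  -1) AS ProductKey,
    s.SaleDate,
    s.SaleAmount
FROM stg.Sales AS s
LEFT JOIN dbo.DimCustomer AS dc
       ON dc.CustomerBusinessKey = s.CustomerId
LEFT JOIN dbo.DimProduct  AS dp
       ON dp.ProductBusinessKey = s.ProductId;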

In Talend Open Studio there is the versatile tMap transformation. How I wish SSIS had something like this! In a single tMap you can pull in all the tables, the starting table as well as the ones you want to look up against, then specify the joins and provide the default values. This makes development easier, and the job stays easy to handle.
Post #1506990
Posted Tuesday, October 22, 2013 1:24 AM
Forum Newbie

I think it's worth pointing out that there's a bit more to it all than this, and any ETL/BI developer would be wise to have Talend's portfolio of products (open source or commercial open source) on their radar. There are a couple of interesting distinctions. First, Talend, like others in similar spaces (check out Bonitasoft if you are interested in BPM), uses the Eclipse IDE as the basis for the interface, so there is a very easy learning curve when moving between systems, and extensibility comes as a given.

Second, and this is a love-it-or-hate-it point touched on in the article: Talend OS is a Java code generator (there is also a Perl engine, but it lacks a few of the standard transformations), so any workflow you construct actually generates Java code, which you can inspect, tweak, enhance and debug (unless you loathe Java), and which you can of course compile into a standalone executable deployed as a simple jar file to any system with a JRE. No other ETL elements are required, nor indeed need there be any footprint of the Talend product at all on the runtime system. Peer support is fairly good, and the product has a neatness and transparency about it which makes it very accessible (as well as being free for the Open Studio variety, natch).

Mark



Post #1506991
Posted Tuesday, October 22, 2013 2:59 AM
Forum Newbie

Many kudos to the author for:
a) posting the row size
b) posting the environment used
c) running equivalent tests
d) running the tests several times in a row
e) choosing a data set large enough to defeat any caching that may have happened.

Very thorough test. Interesting results, and I can see how these would vary. What I would be interested in next is seeing these tests run on 64-bit Windows with 64-bit SQL Server, where 64-bit SSIS would be tested against a 64-bit JVM. I'm assuming these tests were run in 32-bit. Java seems to behave better in a 64-bit environment.

On another note: while SSIS is interesting, it does have its pains (as the author pointed out) with metadata and changes. I would recommend possibly checking out a tool called Mapping Manager from http://AnalytixDS.com, which generates SSIS code (they are working on generating Talend code) from a single set of metadata (cross-references/cross-walks). Anyhow, just a thought.

Thanks for the insightful read.

Hope this helps,
Dan Linstedt
http://LearnDataVault.com
Post #1507018
Posted Tuesday, October 22, 2013 3:25 AM
SSC-Enthusiastic

Thanks, Jeff, for the article.

I found it very interesting and also had some fun reading it.

I think it all depends on the scope of the project and on the business strategy and philosophy of the company. There are a lot of good ETL tools on the market, some of them open source, and like most open source products they have a steep learning curve.

I have had a good experience with Pentaho Data Integration (aka Kettle), and we were able to deliver good solutions to small companies that wanted to make use of their data on a small budget. The disadvantage was that we paid the cost of learning the tools (time and money).
Right now I am involved in several projects, all of them with the complete Microsoft BI suite, and I can say that these tools make my life easier. The aspect I like most is the integration between the tools, for instance how easy it is to schedule SSIS packages and build execution logic around them using SQL Server Agent jobs.
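To give a feel for how little is involved, here is a minimal T-SQL sketch of creating such an Agent job; the job name, package path and schedule are hypothetical:

USE msdb;
GO

-- Create a job with a single SSIS step (names and path are illustrative only).
EXEC dbo.sp_add_job       @job_name  = N'Nightly ETL Load';

EXEC dbo.sp_add_jobstep   @job_name  = N'Nightly ETL Load',
                          @step_name = N'Run LoadFactSales.dtsx',
                          @subsystem = N'SSIS',
                          @command   = N'/FILE "C:\ETL\LoadFactSales.dtsx"';

-- Schedule it daily at 02:00 and attach the schedule to the job.
EXEC dbo.sp_add_schedule  @schedule_name = N'Daily 2 AM',
                          @freq_type = 4, @freq_interval = 1,
                          @active_start_time = 020000;

EXEC dbo.sp_attach_schedule @job_name = N'Nightly ETL Load',
                            @schedule_name = N'Daily 2 AM';

-- Target the local server so Agent actually runs it.
EXEC dbo.sp_add_jobserver   @job_name = N'Nightly ETL Load';
GO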


But: "All magic comes with a price" (Rumplestiltskin)


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1507028
Posted Tuesday, October 22, 2013 4:01 AM


Valued Member

We ran similar performance tests, in this case between Pentaho Data Integration (PDI, also Java based) and SSIS. Against SQL Server, SSIS is faster (by about 10%), perhaps due to native OLE DB connectivity. However, I very gladly lose some speed when developing ETL data flows/transformations is so much quicker with PDI. Pentaho offers more functionality, especially for non-relational sources; it is very stable, without the SSIS glitches; it has a limited set of internal data types (instead of the annoyingly large number in SSIS); and its parsing is immensely superior.
Post #1507039
Posted Tuesday, October 22, 2013 5:10 AM


Forum Newbie

Another vote for Pentaho... Whilst it isn't as polished as SSIS and lacks the nice SQL Server integration features (e.g. versioning, environment variables, reporting), it shines for non-SQL Server backends (DB2, Progress), where SSIS struggles with native data types.
Post #1507058
Posted Tuesday, October 22, 2013 5:27 AM


Valued Member

vidalst (10/22/2013)
Another vote for Pentaho... Whilst it isn't as polished as SSIS and lacks the nice SQL Server integration features (e.g. versioning, environment variables, reporting), it shines for non-SQL Server backends (DB2, Progress), where SSIS struggles with native data types.


Sorry, but Pentaho is a MUCH more polished tool. It supports both variables and parameters, and both reports and dashboards can be defined with the BI server. Versioning is a matter of using the right open source version control tool (like Git). Reusing a pilot built against SQL Server, it took only a few minutes to get a workable version against Oracle; with SSIS this was a really painful exercise. You need the Enterprise edition to get ANY performance against any database other than SQL Server.
Post #1507067
Posted Tuesday, October 22, 2013 6:16 AM
Forum Newbie

Hi Jeff,

Really a good article, with a detailed explanation and analysis.


Nitin
Post #1507087