Technical Article

Data Mining Part 32: The Microsoft Data Mining Enemies


Introduction

A saying often attributed to the Chinese general and strategist Sun Tzu goes:

"Keep your friends close, but your enemies closer."

It is good practice to look at the competition, both to verify that we are using the best tool and to see what else the market offers. In this article, we will take a closer look at the "enemies" of Microsoft Data Mining and at what they are doing right now.

Requirements

This time, it is OK if you do not have anything installed. This chapter is not technical and can be read by non-technical readers (leave a comment if you disagree).

Getting Started

There are more than 3000 data mining tools on the market, so we will only talk about some of the most popular ones. The rankings vary, and each magazine or survey publishes a different order. If a very popular data mining tool is missing from this top ten list, it is because the rankings change constantly given the impressive number of tools.

1. RapidMiner

According to several articles, this is the most popular tool in the world right now. It used to be open source and free, but it is now commercial software. The starter edition is free, but very limited. It is a workflow-based tool that lets you build prototypes quickly.

RapidMiner is a German tool that started at the Technical University of Dortmund. It is multiplatform and can be integrated with WEKA (see tool number 5 for more information).

2. R

R is a very popular, fast-growing programming language used for statistics and data mining. You can call R from other languages such as C#, Java, Python, C++ and others. R is not user friendly at all, but it is very popular. It is aimed at developers and people who love math and statistics.

R is multiplatform and was created at the University of Auckland, New Zealand.

3. SAS (Statistical Analysis System)

This is one of the oldest software packages for data mining and statistics, and it has remained popular over time. It is easy to learn, user friendly, and can be integrated with R. It is also one of the most complete tools for data mining.

The main disadvantage is the price. It may cost several thousand dollars, depending on the number of modules purchased (and there are a lot of them).

4. Python

Python is a programming language with several data mining modules available. It is easy to read, but not so easy to learn, at least for people without previous programming knowledge. You can also create very sophisticated graphs with Python.

5. WEKA (Waikato Environment for Knowledge Analysis)

This software was created at the University of Waikato (New Zealand). It is written in Java, and it is user friendly.

In 2006, Pentaho adopted it as its data mining solution. It can be integrated with a SQL Server database. It is free and multiplatform.

6. Orange

This is both a module for Python and a standalone program (written in Python). It is written and maintained by the University of Ljubljana in Slovenia. Orange is a visual, intuitive, multiplatform tool based on workflows.

It specializes in bioinformatics because the university's bioinformatics department maintains part of the modules. It also includes text mining and data fusion.

7. Oracle

Oracle includes a feature named Oracle Data Mining (ODM). To be honest, Oracle is not very popular in data mining, but it may gain customers in the future. Oracle, as always, is like Lex Luthor: it is the perennial rival of Superman (SQL Server) in different areas, including data mining.

Oracle differs from other solutions because you can apply data mining to the relational database directly, while other solutions require a special structure. With Oracle, you can get data mining results from tables, views or unstructured data.

ODM is part of Oracle Advanced Analytics, which allows easy integration with R, one of the most popular data mining tools right now.

8. Angoss

This is Canadian software from Toronto. The tools can be integrated with SQL, SAP and R. It is very flexible and focuses on reducing the time needed to prepare the data. KnowledgeSTUDIO is the main tool for business analysis, but the product line is extensive: there are at least 8 software programs related to data mining. The charts are very advanced, including 3D charts, and you can export them to Microsoft Office.

9. IBM SPSS

This is a very popular and very complete software package. It dates back to the late 1960s, so it is very old software, but it has an easy interface. It is one of the most popular statistical applications, and it has a section for data mining, which is also popular.

IBM SPSS Modeler is used for data mining and text analysis. It can be installed on Windows, Linux and UNIX.

10. KNIME

This is German software built on Eclipse and written in Java, so it can be installed on any machine that supports Java. It is visual, intuitive and easy to use, and it can be extended with Python, WEKA and Perl.

Conclusions

As you can see, the competition is wide. Most of the tools are workflow-based and can produce very sophisticated charts as output. Most of the competitors also offer integration with third-party tools.

Microsoft has just one module of SSAS dedicated to data mining, while other vendors have several applications dedicated exclusively to data mining.

There are a lot of enemies, and it is hard to compete when most of them are multiplatform, which is an advantage for many companies. However, even though Microsoft Data Mining has many enemies, it is still in the competition. Most surveys place Microsoft Data Mining in the top ten.
