Data Lake On-Premises and Distributed Computing

  • Brahmanand Shukla

    Right there with Babe

    Points: 714

    Recently, our senior management attended a seminar on Data Lakes at an event. When they came back, they were very excited to implement one for the following reasons:

    1. To explore open-source technologies and parallel computing. We are planning to explore Apache Hadoop and related tools such as Hive, Sqoop, Ambari and ZooKeeper for this. We will also explore Apache Spark.
    2. To reduce the time taken by our batch activities as much as possible. Currently we use SQL Server 2016 with SSIS, and some processes run for up to 5 hours, which we want to bring down to 10 minutes (a reduction of about 97%). I know that's a very aggressive expectation, but it seems possible. We want to achieve it with the help of the parallel computing offered by a Hadoop cluster.
    3. We have too many SQL Server production instances, and the number keeps growing. We wish to save SQL Server licensing costs by replacing the instances that are not referenced by any application and are used only for batch processing.
    4. We want to consolidate the batch processes that run across multiple servers and SQL Server instances into a data lake, which can then be used for reporting and data analytics.
    5. We want common storage for all raw and processed data so that we have better control over it.

    A few important facts worth sharing:

    1. We do not want to use the cloud due to regulatory as well as data security constraints.
    2. None of the team members assigned to the POC is experienced in Hadoop. All of us are SQL Server people with programming experience in other technologies such as .NET, C# and VB. However, the leader who initiated the project is experienced in Hadoop.

    This is a very interesting topic, but I'm not experienced with either data lakes or Hadoop, so my fingers are crossed. Any feedback on whether we are heading in the right direction, and if so, what the correct approach would be, would be really appreciated.

  • Brahmanand Shukla

    Any suggestions would be much appreciated!

  • Brahmanand Shukla

    I'm hoping for some useful suggestions on this topic.