Hadoop for SQL Developers

  • jayarora

    SSC Enthusiast

    Points: 112

    Hello People

    I am a newbie in the field of SQL development and have a basic question. I'm unsure whether Hadoop is easy to learn for SQL developers. Is it essential to know Hadoop if I'm focused entirely on SQL development? How much SQL do I need to know before learning Hadoop? Any useful suggestions would be appreciated.

  • Andrew P

    SSCarpal Tunnel

    Points: 4645

    Hi jayarora,

    No, I wouldn't expect a junior SQL Developer to know Hadoop.

    You mention you're a newbie in the field of SQL development. If you're expected to write T-SQL in your job, or desired job, I recommend solidifying your T-SQL skills before you go and learn Hadoop or any of the other big data analysis solutions.

    If your SQL queries are taking too long to run, index tuning and query writing techniques would be a good solution, and those take time to learn. If you come across situations where your queries are still too slow, even with efficient T-SQL queries and indexes, and the server has sufficient resources, it would be wise to look at Hadoop then.

    For reference, I've been a SQL Developer for several years and haven't yet learned Hadoop, as I haven't found it to be a necessary solution to any of the challenges I've faced. However, because I am no longer a junior SQL Developer, I should probably have an awareness of where Hadoop would fit best, so I know to go learn it if a specific challenge requires it.

    Andrew

  • Jeff Moden

    SSC Guru

    Points: 997205

    I'll second Andrew's take on this. Focusing on T-SQL until you've really learned the licks is the right thing to do and it will help you quite a bit when you get to Hadoop. IIRC, Hive (the SQL-like query layer that runs on top of Hadoop) uses "HQL" (HiveQL), which has a lot of similarities to SQL in general.

    Just to give you some confidence, I've been doing pretty good with T-SQL and (knocks on wood) haven't yet needed to get into learning Hadoop.

    Shifting gears a bit, one of the best things you can do to learn T-SQL is to learn as many of the intrinsic functions as you can, especially what you can actually do with date and time functions (the examples in most straightforward books only show what the function does and not some of the magic you can actually do with them). Another very important set of functions is the "Windowing" functions, like ROW_NUMBER(), along with how to use OVER with many of the aggregate functions.
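
    As a rough illustration of the windowing functions mentioned above, here is a minimal sketch. It runs through Python's sqlite3 module rather than SQL Server (it assumes a SQLite build of 3.25 or later, which added window functions), and the Sales table and its columns are made up for the example. The ROW_NUMBER() and SUM() ... OVER syntax shown is written the same way in T-SQL.

```python
import sqlite3

# SQLite stands in for SQL Server here. The Sales table, its columns,
# and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Customer TEXT, SaleDate TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        ("A", "2020-01-01", 10),
        ("A", "2020-01-05", 20),
        ("B", "2020-01-02", 5),
        ("A", "2020-01-09", 30),
        ("B", "2020-01-07", 15),
    ],
)


def sales_with_running_totals(connection):
    """Number each customer's sales by date and keep a per-customer running total."""
    query = """
        SELECT Customer,
               SaleDate,
               Amount,
               ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY SaleDate) AS SaleNum,
               SUM(Amount)  OVER (PARTITION BY Customer ORDER BY SaleDate
                                  ROWS UNBOUNDED PRECEDING) AS RunningTotal
        FROM Sales
        ORDER BY Customer, SaleDate
    """
    return connection.execute(query).fetchall()


rows = sales_with_running_totals(conn)
for row in rows:
    print(row)
```

    The point of the sketch is that the numbering and the running total both come from a single SELECT, with no loop and no self-join.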

    Of course, you should also learn how to use JOINs and how to use the two forms of APPLY.  You should also learn how to create Inline Table Valued Functions (nasty fast compared to Scalar functions and a lot of us refer to them as "iTVFs").

    You should also learn how to count from 0 (zero) to some number as rapidly as possible because it's used to solve some of the hairiest problems there are. A lot of us refer to such a thing as a "Tally Table" ("Tally" means "to count") or an equivalent function. Programming classes for just about every language in the world teach how to count (in a loop) as one of the first and most important things there is, but most SQL classes never get around to teaching how to count until they get to WHILE loops and Recursive CTEs, which are two of the absolute worst ways to count in SQL.
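
    To make the counting idea concrete, here is a minimal sketch of a small Tally table and one use of it. It is not the code from the article or from fnTally. It again uses Python's sqlite3 module in place of SQL Server, and the Digits and Tally table names and the dates_in_range helper are made up for the example. The cross-join trick carries over to T-SQL, while SQLite's date() call would be DATEADD there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ten digits cross joined with themselves three times give 0..999
# in one set-based statement, with no WHILE loop or recursive CTE.
conn.execute("CREATE TABLE Digits (d INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO Digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)")
conn.execute("""
    CREATE TABLE Tally AS
    SELECT ones.d + 10 * tens.d + 100 * hundreds.d AS N
    FROM Digits AS ones
    CROSS JOIN Digits AS tens
    CROSS JOIN Digits AS hundreds
""")


def dates_in_range(connection, start_date, day_count):
    """Return day_count consecutive dates starting at start_date.

    Each row of Tally plays the role of one loop iteration. date() is
    SQLite-specific; T-SQL would use DATEADD(day, N, @start) instead.
    """
    query = """
        SELECT date(?, '+' || N || ' days')
        FROM Tally
        WHERE N < ?
        ORDER BY N
    """
    return [r[0] for r in connection.execute(query, (start_date, day_count))]


print(conn.execute("SELECT COUNT(*), MIN(N), MAX(N) FROM Tally").fetchone())
print(dates_in_range(conn, "2020-02-27", 4))
```

    The SELECT against Tally does the job a counting loop would do in other languages: one row per "iteration", produced by the engine instead of by hand-written loop code.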

    See the following article for an introduction to what you can do with a "Tally Table" and then see the article and function at the last link in my signature line below for a replacement for the Tally table. Understand that, behind the scenes, every SELECT is actually a near machine language speed loop. You don't usually need to write any loops because SELECTs already loop. I call them "Pseudo Cursors" (thanks to R. Barry Young for originating the term).

    https://www.sqlservercentral.com/articles/the-numbers-or-tally-table-what-it-is-and-how-it-replaces-a-loop-1

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • khatrivijay1975

    Newbie

    Points: 3

    Learning Hadoop is generally not difficult. It is not strictly necessary to learn it, whether you are a SQL developer, a BI or data warehousing professional, or a DB professional.

    Anyone can learn Hadoop. The main prerequisite is some programming knowledge.

    For people with SQL skills, Hadoop programming is also easier, thanks to Pig and Hive. SQL developers are taking an interest in learning Hadoop because it is an open-source, fault-tolerant, and scalable framework that stores huge amounts of data and processes it efficiently. A traditional single-server RDBMS is not designed to scale out in this way, so Hadoop can be seen as a complement to the RDBMS for dealing with very large datasets.

    I am also sharing some resources that will help you figure out which skills you should have in the field you are planning to make your career in:

    https://www.quora.com/How-about-the-career-as-a-SQL-developer

    https://squareboat.com/blog/a-career-as-a-sql-developer

    https://www.oracle.com/database/technologies/appdev/sql-developer.html

    https://www.sqlservercentral.com/

    Hadoop offers flexibility, scalability, and a fault-tolerance mechanism, and people who already have basic knowledge of SQL can start working on the framework through the Hive ecosystem project, because Hive's syntax and commands are very similar to SQL queries (though not identical). Hive lets SQL experts query data using a SQL-like syntax, making it a good Big Data tool for connecting Hadoop with other BI tools. Apache Pig helps SQL professionals create parallel data workflows and simplifies data manipulation across multiple data sources.

    It would not make sense for professionals with SQL skills to write, compile, debug, and execute a long Java MapReduce program when they only want to retrieve a few rows from a file stored in Hadoop.
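
    The contrast can be sketched roughly as follows. This is an illustration only, not Hadoop or Hive code: plain Python functions stand in for a hand-written MapReduce job, SQLite (through Python's sqlite3 module) stands in for Hive, and the events data and all names are made up. The GROUP BY query is the kind of statement a SQL developer would write in HiveQL.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (user_name, event) pairs.
records = [
    ("alice", "login"),
    ("bob", "login"),
    ("alice", "purchase"),
    ("alice", "login"),
]


# Hand-written approach: the developer spells out the map and reduce steps.
def map_phase(rows):
    """Emit a (key, 1) pair for every input row."""
    for user_name, _event in rows:
        yield user_name, 1


def reduce_phase(pairs):
    """Sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


mapreduce_counts = reduce_phase(map_phase(records))

# Declarative approach: one SQL statement expresses the same result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_name TEXT, event TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", records)
sql_counts = dict(
    conn.execute("SELECT user_name, COUNT(*) FROM events GROUP BY user_name")
)

print(mapreduce_counts)
print(sql_counts)
```

    Both approaches produce the same counts. With a SQL-like layer, the developer states the result wanted and leaves the map and reduce plumbing to the engine.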
