The Toolchain for the Job?

  • Comments posted to this topic are about the item The Toolchain for the Job?

    Best wishes,
    Phil Factor

  • It's funny that you brought this up in an article just now, especially the part about ETL.

    Even as early as SQL Server 6.5, I built things in SQL Server that would handle everything about the ETL I was doing: downloading data from multiple different sources (FTP, "PoleCat" telephone buffers, telephone switches, modem logins, etc., etc.), importing data from multiple different media (reel-to-reel tapes [ATT] read on a desktop reader, CSV/TSV/fixed-field-format files, CDs, and even floppies), running it all through a "normalizing interface" I built, whacking, stacking, and processing the data to a billable end result, and then transmitting the files for billing, taxes, etc. to the various agents using the same techniques I used for downloading but, of course, in an upload fashion.

    It was ALL controlled by flow control I built in T-SQL, and it was object driven rather than serially driven. Of course, my old friend XP_CmdShell came into play when I needed to do something "outside the box", but it was all rather simple and straightforward. The processes all ran independently of each other, and each process would simply make its data available (by file or table) to the next process (a rough sketch of that hand-off pattern appears after this post).

    Now, at work, we have this $350K (not to mention a huge monthly license and support fee) monster piece of garbage that only two people know how to use, and it still relies on things like BULK INSERT and special FTP servers and managed code (freakin' EDI requirements) that we've spent hundreds of thousands of dollars on. The bad part is that I've demonstrated how simply some of this can be done using SQL Server as the process controller for all of it, as well as simplifying the imports/exports and the file management itself, not to mention the huge performance increases I get over all that stuff. And what do I get as an argument against it all from the head honcho who could approve it at a great cost savings?

    "SQL Server is not the center of the universe. Just because you can do something in SQL Server doesn't mean you should and I don't want to use it as an ETL platform".

    Can't we just keep things simple? Some of the old technology that you spoke of is still the best technology for doing all of this. It just doesn't need to be as complicated as people make it, and it doesn't need expensive new baubles to do any of it.

    Yes, I'm familiar with the tired old, passive-aggressive, ad hominem saying of "To a hammer, everything is a nail", but what the hell's wrong with using a hammer (or the built-in nail guns) when all you're trying to do is... drive nails? 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
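
For illustration only, here is a minimal Python sketch of the hand-off pattern described in the post above, where each independent step consumes the artifact a previous step published and publishes its own. The step names, file paths, and work functions are invented, and the original was driven from T-SQL rather than Python; this is just the shape of the idea.

```python
# Hypothetical sketch of the "hand-off" pattern: each step is an independent unit
# of work that reads the artifact a previous step published and publishes its own,
# and a small controller runs whatever is currently runnable. All names are made up.
from pathlib import Path

STEPS = {
    # step name: (artifact it needs, artifact it produces, work function)
    "download":  (None,             "raw.csv",        lambda: Path("raw.csv").write_text("raw")),
    "normalize": ("raw.csv",        "normalized.csv", lambda: Path("normalized.csv").write_text("norm")),
    "bill":      ("normalized.csv", "billing.csv",    lambda: Path("billing.csv").write_text("bill")),
}

def run_ready_steps():
    """Run every step whose input exists and whose output doesn't, until nothing is left."""
    progressed = True
    while progressed:
        progressed = False
        for name, (needs, produces, work) in STEPS.items():
            ready = needs is None or Path(needs).exists()
            done = Path(produces).exists()
            if ready and not done:
                print(f"running {name}")
                work()
                progressed = True

if __name__ == "__main__":
    run_ready_steps()
```

The point of the pattern is that the controller doesn't care about order: any step whose input exists and whose output doesn't is fair game, which is what lets the processes run independently of one another.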

    Somewhere, there is a large corporation that curses and swears at a system I put in place twenty years ago to do an ETL job. It is still in use because they can't better it. They've had two expensive projects to do so, but both have failed. I wrote it all in batch files, with a few bits of C++ where I couldn't get batch files to do what I wanted. I could, and should, have written the little book of how to write batch files. The trouble is that the current IT department can't do batch files, and they struggle with C++, so they try to do it all in C#. With batch files, you can do stuff in parallel easily, so the whole thing just tracks all the dependencies and then does topological sorts to work out all the jobs that can be done in parallel. Then it does them, six at once, longest first, until they're finished, and then on to the next layer (a rough sketch of the idea follows after this post). So simple you could cry.

    Best wishes,
    Phil Factor
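
For illustration only, here is a minimal Python sketch of the scheduling idea described above: layer the jobs with a topological sort of their dependencies, then run each layer in parallel, up to six at a time, longest first. The job names, durations, and dependency graph are invented, and the original was implemented in batch files and C++, not Python.

```python
# Hypothetical sketch: layered topological sort, then parallel execution per layer,
# at most six jobs at once, longest-estimated job first.
from concurrent.futures import ThreadPoolExecutor
import time

# job name -> (set of jobs it depends on, estimated duration in seconds)
JOBS = {
    "extract_a": (set(),                      3),
    "extract_b": (set(),                      5),
    "transform": ({"extract_a", "extract_b"}, 4),
    "load":      ({"transform"},              2),
}

def layers(jobs):
    """Yield successive lists of jobs whose dependencies have all already run."""
    done = set()
    remaining = dict(jobs)
    while remaining:
        layer = [name for name, (deps, _) in remaining.items() if deps <= done]
        if not layer:
            raise ValueError("circular dependency")
        yield layer
        done.update(layer)
        for name in layer:
            del remaining[name]

def run_job(name):
    duration = JOBS[name][1]
    print(f"starting {name} (~{duration}s)")
    time.sleep(duration)          # stand-in for the real work
    print(f"finished {name}")

if __name__ == "__main__":
    for layer in layers(JOBS):
        # longest first, at most six in flight at once
        layer.sort(key=lambda name: JOBS[name][1], reverse=True)
        with ThreadPoolExecutor(max_workers=6) as pool:
            list(pool.map(run_job, layer))
```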

  • I currently have some batch jobs being called by Task Scheduler doing some simple ETL processes.

    Why?

    Because it's the easiest to deploy and maintain for the non-programmers in my department.

    Will have to convert them to PowerShell, which is going to be an educational nightmare.

  • Phil Factor (12/19/2016)


    Somewhere, there is a large corporation that curses and swears at a system I put in place twenty years ago to do an ETL job. It is still in use because they can't better it. They've had two expensive projects to do so, but both have failed. I wrote it all in batch files, with a few bits of C++ where I couldn't get batch files to do what I wanted. I could, and should, have written the little book of how to write batch files. The trouble is that the current IT department can't do batch files, and they struggle with C++, so they try to do it all in C#. With batch files, you can do stuff in parallel easily, so the whole thing just tracks all the dependencies and then does topological sorts to work out all the jobs that can be done in parallel. Then it does them, six at once, longest first, until they're finished, and then on to the next layer. So simple you could cry.

    Ah... you do have a way of poking the sweet spots for me. That last line above, "So simple you could cry," is exactly what I'm talking about.

    One of my other pet peeves is that many people think that if you don't change something, then you're failing to perform. Your story above is a perfect example of what I'm talking about. Why would they try to change a rock-solid, simple process at all? I can understand it if there were some shortcoming, or if they were doing simple proof-of-concept experiments to see whether different areas could be improved before sinking a shedload of money, talent, and time into a wholesale replacement, but I can't see leaning full-on towards replacing something without prior knowledge that it can actually be improved. There's enough stuff to work on that actually is broken, performance-challenged, or difficult to maintain. Why do people insist that old is bad and needs to be replaced?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • chrisn-585491 (12/19/2016)


    I currently have some batch jobs being called by Task Scheduler doing some simple ETL processes.

    Why?

    Because it's the easiest to deploy and maintain for the non-programmers in my department.

    Will have to convert them to PowerShell, which is going to be an educational nightmare.

    Why do they need to be converted to PowerShell?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Phil Factor (12/19/2016)


    ...The trouble is that the current IT department can't do batch files, and they struggle with C++, so they try to do it all in C#. With batch files...

    Then they are not proficient enough in C#.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

    I'm a bit of a handyman, both around the house and around the database server at work. I consider PowerShell, much like duct tape, to be an essential tool for a wide variety of one-off tasks and component integration.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

    Been having a number of those debates lately: whether to keep scripts elsewhere and independent, to create a centralized process that controls those scripts, or to use SQL Server to control them, etc.

    All have their pros and cons. I personally don't like using SQL Server to control everything. It has enough on its plate, and it's expensive enough that it doesn't scale easily if the food starts to come off that plate.

    These days you also have Python, which is object driven and lets you do all three. I particularly like the idea of Python with its queuing systems, which let you create distributed processing with worker nodes across two or more machines (a minimal sketch of that worker-queue pattern follows after this post). That's a huge win in the toolchain over things like PowerShell, because all of the scripts work together rather than being independently controlled by a centralized brain like SQL Server.
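
For illustration only, here is a minimal Python sketch of the worker-queue pattern mentioned above: independent workers pull tasks from a shared queue, so adding workers scales the processing out. The task names and worker count are invented, and a real multi-machine setup would use a networked queue or message broker rather than multiprocessing on a single box.

```python
# Hypothetical sketch of the worker-queue pattern: workers pull tasks from a shared
# queue until a stop sentinel arrives. All task names are made up for illustration.
from multiprocessing import Process, Queue, current_process

def worker(tasks: Queue):
    """Pull file names off the queue and 'process' them until a stop signal arrives."""
    while True:
        task = tasks.get()
        if task is None:          # sentinel: no more work
            break
        print(f"{current_process().name} processing {task}")

if __name__ == "__main__":
    tasks = Queue()
    for name in ["orders_1.csv", "orders_2.csv", "orders_3.csv", "orders_4.csv"]:
        tasks.put(name)

    workers = [Process(target=worker, args=(tasks,)) for _ in range(2)]
    for w in workers:
        w.start()
    for _ in workers:
        tasks.put(None)           # one stop signal per worker
    for w in workers:
        w.join()
```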

  • Jeff Moden (12/18/2016)


    ...Now, at work, we have this $350K (not to mention a huge monthly license and support fee) monster piece of garbage that only two people know how to use, and it still relies on things like BULK INSERT and special FTP servers and managed code (freakin' EDI requirements) that we've spent hundreds of thousands of dollars on...

    Ouch, this paragraph brings back way too many nightmares for me: expensive middleware, EDI, specialty systems only a select few know how to use. I worked at a company, a computer systems distributor, that used EDI for orders from a couple of their hardware vendors. The only thing more broken than the EDI process was when another vendor wanted to send us a daily multi-gigabyte XML file of their product catalog via HTTP POST, which, of course, would fail more often than not.

  • Jeff Moden (12/18/2016)


    It's funny that you brought this up in an article just now, especially the part about ETL.

    Even as early as SQL Server 6.5, I built things in SQL Server that would handle everything about the ETL I was doing: downloading data from multiple different sources (FTP, "PoleCat" telephone buffers, telephone switches, modem logins, etc., etc.), importing data from multiple different media (reel-to-reel tapes [ATT] read on a desktop reader, CSV/TSV/fixed-field-format files, CDs, and even floppies), running it all through a "normalizing interface" I built, whacking, stacking, and processing the data to a billable end result, and then transmitting the files for billing, taxes, etc. to the various agents using the same techniques I used for downloading but, of course, in an upload fashion.

    It was ALL controlled by flow control I built in T-SQL, and it was object driven rather than serially driven. Of course, my old friend XP_CmdShell came into play when I needed to do something "outside the box", but it was all rather simple and straightforward. The processes all ran independently of each other, and each process would simply make its data available (by file or table) to the next process.

    Now, at work, we have this $350K (not to mention a huge monthly license and support fee) monster piece of garbage that only two people know how to use, and it still relies on things like BULK INSERT and special FTP servers and managed code (freakin' EDI requirements) that we've spent hundreds of thousands of dollars on. The bad part is that I've demonstrated how simply some of this can be done using SQL Server as the process controller for all of it, as well as simplifying the imports/exports and the file management itself, not to mention the huge performance increases I get over all that stuff. And what do I get as an argument against it all from the head honcho who could approve it at a great cost savings?

    "SQL Server is not the center of the universe. Just because you can do something in SQL Server doesn't mean you should and I don't want to use it as an ETL platform".

    Can't we just keep things simple? Some of the old technology that you spoke of is still the best technology for doing all of this. It just doesn't need to be as complicated as people make it, and it doesn't need expensive new baubles to do any of it.

    Yes, I'm familiar with the tired old, passive-aggressive, ad hominem saying of "To a hammer, everything is a nail", but what the hell's wrong with using a hammer (or the built-in nail guns) when all you're trying to do is... drive nails? 😉

    I had a friend and colleague who was accused of being a one-trick pony.

    His response was, "Yes, but it's a bloody good trick."

    Regarding PoSh... I've come to have a healthy respect for this devil magic. It really has extended my solution options. I have also come up against what you described as cultural resistance. I'm the only one on my DBA team who takes it seriously, but I'm patient; it took me a long time to come around to it. Not sure I would want to build a DevOps solution around it, but I wouldn't be surprised if someone tries.
