Extreme SQL

  • Comments posted to this topic are about the item Extreme SQL

  • These extreme systems are so extremly fascinating to me.

    The company im working at is still working with Cubes. We have Dimensions with sometimes like around a million entries and alot MDX. As a result some of the reports take some time to calculate. But these numbers are kinda rookie numbers compared to these extreme systems with billions of rows. Even though they use (i assume?) in memory, i am still impressed and curious at how instantaneously you get results in something like google, youtube, facebook etc.


  • They just rely on the power of more and more nodes to distribute the workload of course. Taking that larger problem and breaking it up into a bunch of smaller problems. At least that's my assumption and how I technically handle the billions and trillions of records I have to query with my datasets.

    But there is the time element too. If you're using things like PolyBase, it's extremely slow compared to just taking a little bit of time and querying data from PolyBase into a table that you can hash/optimize. For me, I have a little bit of time to spare to ensure analytical queries can be queried from an internal table versus having no time and trying to query directly from an external table (or HIVE-like table) sitting on a data store.

    Even at billions of rows to query, which is lower than Facebook etc, Azure, Google, and AWS have great options that are surely within 10second result times for as long as you know how to query data right. I think the only time I've ever ran into issues doing ad-hoc queries with any of these 3 companies is when I broke Google BigQuery doing a many-to-many relationship query upon a billion records. But the results at some point was in the trillions before it aggregates down to a few dozen. So, it's understandable how I crashed their engine as it's not a usual use case to run into.

  • It's what xsevensinzx said, but also there's likely some cheating. Pre-fetch, caching, likely some pre-calcs or other tricks that simulate certain things being really fast to the user. Trading space for time, is something we do often.

  • What's essential is knowing what data access patterns are being used by each application and then partitioning, structuring, and indexing the data optimally. For ad-hoc aggregate reporting on millions or billions of records, a ColumnStore table like SQL Server DW, Hbase, or Amazon Redshift. For something like an online banking app, each user accesses a wide range of data elements but only for their personal slice of the pie. What you need is a near realtime datamart, maybe something like Cosmos DB or MongoDB, where all the recent transactional and profile data for a customer is contained in something like a JSON document which can be fetched within 10 ms.

    RDMS engines like SQL Server are good for transactional read/write applications and for a highly constrained single version of truth, but even in the hands of a expert level SQL coder, it's only so-so when it comes to TB scale aggregate reporting and high volume queries. If you're willing to work outside the box, there are better options for the use cases described above.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I've worked in extreme environments before, but not on the scale that Steve is talking about.

    1. I worked for a medical private practice that had only one doctor; that person has an extreme temper. I think everyone that worked in the office was on Prozac. I missed the chair throwing incident where a nurse hurled a chair at the doctor. The average tenure for developers was about six months. How I made it through two years, I don't know. Sometimes, I would throw up as soon as I got out of the car at home.
    2. I left that job to work for a dot-bomb. The developers were great. We would head out to lunch on a daily basis. We would have occasional stress relievers through rubber band fights. But the client kept changing the requirements for the software making the final product a moving target with no scheduled end of completion. A new project was deemed urgent, which meant longer hours and weekend work. The developers were great; management just sucked at client management and project management.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply