Analyzing Data in Real Time

  • Comments posted to this topic are about the item Analyzing Data in Real Time

  • When I hear real time, I think in seconds or less. I don't think in terms of minutes and hours. That's why I too would not think of OLAP cubes as real time, because they still have to process on a single machine and in batch.

    There are a lot of technologies out there that are trying to do real time outside of the traditional RDBMS. I've used one, Apache Storm with Hadoop, creating topologies (think SSIS package) that contained two base objects: a spout (think data connector to an API) and bolts (think stored procedures).

    Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

    These objects are written in Java, but I used a Python framework that allowed me to create a spout that streamed Twitter feeds to bolts that were the core of my ETL that transformed the data. This is all done in realtime where the output from the bolts is dumped into HBase.

    The cool thing is these spouts and bolts can be distributed across multiple nodes. This allows for distributed processing of real-time data. This is arguably the major advantage over something like SQL Server with SSAS.
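
    To make the spout/bolt idea concrete, here is a minimal sketch in plain Python. It does not use the Storm API or any Storm client library; the names (tweet_spout, clean_bolt, hashtag_bolt, run_topology) and the sample feed are made up for illustration, and a list stands in for HBase as the sink. In real Storm, each stage would run as separate tasks spread across nodes.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Bolt = Callable[[Iterator[Record]], Iterator[Record]]


def tweet_spout(raw_feed: Iterable[str]) -> Iterator[Record]:
    """Spout: wraps a data source and emits one record per incoming item."""
    for text in raw_feed:
        yield {"text": text}


def clean_bolt(stream: Iterator[Record]) -> Iterator[Record]:
    """Bolt: collapses repeated whitespace and lowercases the text."""
    for rec in stream:
        yield {"text": " ".join(str(rec["text"]).split()).lower()}


def hashtag_bolt(stream: Iterator[Record]) -> Iterator[Record]:
    """Bolt: adds the list of hashtags found in each record."""
    for rec in stream:
        tags = [w for w in str(rec["text"]).split() if w.startswith("#")]
        yield {**rec, "hashtags": tags}


def run_topology(spout: Iterator[Record], bolts: List[Bolt], sink: List[Record]) -> None:
    """Chains the spout through each bolt in order and appends results to the sink."""
    stream = spout
    for bolt in bolts:
        stream = bolt(stream)
    for rec in stream:
        sink.append(rec)  # stand-in for a write to a store such as HBase


# Hypothetical sample data standing in for a live Twitter feed.
sample_feed = ["Storm is  FAST #storm #realtime", "Nightly batch is not real time"]
store: List[Record] = []
run_topology(tweet_spout(sample_feed), [clean_bolt, hashtag_bolt], store)
```

    Each record flows through the bolts as soon as the spout emits it, rather than waiting for a whole batch to be collected first.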

    Here is some further reading on using Storm with Azure to analyse sensor data.

    Analyze sensor data with Apache Storm, Event Hub, and HBase in HDInsight (Hadoop)

  • I didn't know 24 hour old data in SSAS was real time analytics. That's going on the ol' resume!

  • From the Article:


    Of course, the definition of real time isn't really well known.

    The definition is, of course, very well defined...

    https://www.google.com/?gws_rd=ssl#q=definition+of+real+time

    ... and anyone that thinks or says otherwise is why real Engineers laugh at us especially when they see definitions as to how the human perceives what "real time" is. How anyone can think that something like a 5 second delay, never mind a totally insane inclusion of 24 hours, is even close to being "Real Time"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My definition of "real time" is somewhat different and can vary greatly from place to place.

    It's the time I have before the business needs the information.

    If they (business) get their info when they need it, then it's real time for them.

    A lot of background processes do not need up-to-the-second data.

    If the business still insists on a faster response time (for the fun of it, which happens a lot), the exponential scale of cost versus "readiness" is a great help to me in reaching an acceptable compromise.

    It would be more accurate to say "ready time" instead of "real time".

  • Jeff Moden (7/24/2016)


    From the Article:


    Of course, the definition of real time isn't really well known.

    The definition is, of course, very well defined...

    https://www.google.com/?gws_rd=ssl#q=definition+of+real+time

    ... and anyone that thinks or says otherwise is why real Engineers laugh at us especially when they see definitions as to how the human perceives what "real time" is. How anyone can think that something like a 5 second delay, never mind a totally insane inclusion of 24 hours, is even close to being "Real Time"?

    I like how there is an exception for computing because it's likely going to be pretty impossible to get the actual time at scale. But yeah, that's what I view as real-time as well. As soon as it's generated, you start snagging it. Hours behind is not real-time.

  • Jeff Moden (7/24/2016)


    From the Article:


    Of course, the definition of real time isn't really well known.

    The definition is, of course, very well defined...

    https://www.google.com/?gws_rd=ssl#q=definition+of+real+time

    ... and anyone that thinks or says otherwise is why real Engineers laugh at us especially when they see definitions as to how the human perceives what "real time" is. How anyone can think that something like a 5 second delay, never mind a totally insane inclusion of 24 hours, is even close to being "Real Time"?

    That's a definition, and for some systems. For business systems, I think real-time can vary, and certainly will with BI type systems.

  • xsevensinzx (7/25/2016)


    Jeff Moden (7/24/2016)


    From the Article:


    Of course, the definition of real time isn't really well known.

    The definition is, of course, very well defined...

    https://www.google.com/?gws_rd=ssl#q=definition+of+real+time

    ... and anyone that thinks or says otherwise is why real Engineers laugh at us especially when they see definitions as to how the human perceives what "real time" is. How anyone can think that something like a 5 second delay, never mind a totally insane inclusion of 24 hours, is even close to being "Real Time"?

    I like how there is an exception for computing because it's likely going to be pretty impossible to get the actual time at scale. But yeah, that's what I view as real-time as well. As soon as it's generated, you start snagging it. Hours behind is not real-time.

    I tend to agree with this. I think real-time needs to be seconds, perhaps low minutes, but otherwise it's not really something that you can respond to as it's occurring.

    I'll point out that there might be times, maybe most times, that we don't want to respond this quickly.

  • Jeff Moden (7/24/2016)


    From the Article:


    Of course, the definition of real time isn't really well known.

    The definition is, of course, very well defined...

    https://www.google.com/?gws_rd=ssl#q=definition+of+real+time

    ... and anyone that thinks or says otherwise is why real Engineers laugh at us especially when they see definitions as to how the human perceives what "real time" is. How anyone can think that something like a 5 second delay, never mind a totally insane inclusion of 24 hours, is even close to being "Real Time"?

    Well, I'm a real Engineer and I use two definitions of "real time". One is about simulations: a real time simulation is one in which the speed of the simulation is the same as the speed of the thing being simulated (so that if the simulation shows event x as being 10 secs behind event y, that's the time between the corresponding real time events). The other is about meeting committed response, whether interactive or not, and it simply says that the system is real time if it changes its state as a result of stimuli within a defined required time, which may depend on the stimulus type. So something which takes note of transactions at most one year after the transactions take place is real time, provided the defined time required for transaction stimuli is one year. I'm more used to seeing it in user interfaces (where 10 secs might be the required time, or maybe 30) or in process/machine control systems where the required time might be a few milliseconds (or even less), but it's absolute nonsense to claim that a requirement of 24 hours is not a "real time" requirement if that's the requirement the system actually has.

    Tom
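
    The second definition above (respond to each stimulus within the time required for its type) can be sketched as a simple check. This is only an illustration: the stimulus types, the deadlines in REQUIRED_SECONDS, and the observed timings are all hypothetical values chosen to mirror the examples in the post.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical required response times, in seconds, per stimulus type.
REQUIRED_SECONDS = {
    "machine_control": 0.005,        # a few milliseconds
    "user_interface": 10.0,          # 10 secs
    "daily_report": 24 * 60 * 60.0,  # 24 hours
}


@dataclass
class Response:
    stimulus_type: str
    stimulus_at: float   # seconds on some common clock
    responded_at: float  # seconds on the same clock


def meets_deadline(r: Response) -> bool:
    """True if the response arrived within the time required for its stimulus type."""
    return (r.responded_at - r.stimulus_at) <= REQUIRED_SECONDS[r.stimulus_type]


def is_real_time(responses: Iterable[Response]) -> bool:
    """Under this definition, a system is real time if every response met its deadline."""
    return all(meets_deadline(r) for r in responses)


# Made-up observations: each one is inside its own deadline.
observed = [
    Response("machine_control", 0.0, 0.003),
    Response("user_interface", 0.0, 4.0),
    Response("daily_report", 0.0, 20 * 3600.0),
]
on_time = is_real_time(observed)

# A 12 second UI response misses the assumed 10 second requirement.
late = is_real_time(observed + [Response("user_interface", 0.0, 12.0)])
```

    Under these assumed deadlines, a report delivered after 20 hours passes while a UI response after 12 seconds fails, which is the point of the definition: the deadline comes from the requirement, not from an absolute notion of "fast".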

  • TomThomson (7/25/2016)


    Jeff Moden (7/24/2016)


    From the Article:


    Of course, the definition of real time isn't really well known.

    The definition is, of course, very well defined...

    https://www.google.com/?gws_rd=ssl#q=definition+of+real+time

    ... and anyone that thinks or says otherwise is why real Engineers laugh at us especially when they see definitions as to how the human perceives what "real time" is. How anyone can think that something like a 5 second delay, never mind a totally insane inclusion of 24 hours, is even close to being "Real Time"?

    Well, I'm a real Engineer and I use two definitions of "real time". One is about simulations: a real time simulation is one in which the speed of the simulation is the same as the speed of the thing being simulated (so that if the simulation shows event x as being 10 secs behind event y, that's the time between the corresponding real time events). The other is about meeting committed response, whether interactive or not, and it simply says that the system is real time if it changes its state as a result of stimuli within a defined required time, which may depend on the stimulus type. So something which takes note of transactions at most one year after the transactions take place is real time, provided the defined time required for transaction stimuli is one year. I'm more used to seeing it in user interfaces (where 10 secs might be the required time, or maybe 30) or in process/machine control systems where the required time might be a few milliseconds (or even less), but it's absolute nonsense to claim that a requirement of 24 hours is not a "real time" requirement if that's the requirement the system actually has.

    Gosh... requirement or not, I wouldn't call a 24 hour delay "real time" even in a non-simulation scenario. I might call that "meets specified requirements", "good enough", "fast enough", "within specified parameters", or "meets the SLA", but I wouldn't call it "real time" by any stretch of the imagination even if I were an Engineer with understanding users that would accept such a thing. 😉

    --Jeff Moden



  • I am with Jeff on this one. Even if you do not stick to the strict definition, real time refers to an immediate response to stimuli, where immediate implies that there is no mechanism, beyond buffering in volatile memory, delaying processing. A long time ago I came across the term near time processing, which is probably derived from near real time (see https://en.wikipedia.org/wiki/Real-time_computing#Near_real-time) and adds the use of buffering in non-volatile memory.

    For me the use in real time analytics is a hijack of terms for marketing purposes. Quelle surprise!!!

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • I primarily work on production systems and databases that can be considered as 'real-time'.

    Sub-second data logging of critical production values like temperatures and pressures is very common, and regulations might require this data to be stored for 5 years or more.

    However, real-time processing is not always required, and scenarios like that sometimes need 'out-of-the-box' thinking!

    The closer the logging of data can be to the source, the better.

    Data can always be inserted and aggregated into other databases/data warehouses by means of ETL techniques at a later stage.

    Do not underestimate the power of CSV files and BULK INSERT!
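
    As a rough sketch of that approach, the Python below writes logged readings to a CSV file and builds a matching T-SQL BULK INSERT statement for a later load. The table name dbo.SensorReadings, the column layout, the file location, and the sample readings are assumptions for illustration only; the file path would need to be readable by the SQL Server instance, and the statement is only built here, not executed.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Tuple

Reading = Tuple[str, str, float]  # (logged_at, sensor, value), an assumed layout


def write_readings_csv(path: Path, readings: List[Reading]) -> int:
    """Writes a header row plus one line per reading; returns the number of readings."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["logged_at", "sensor", "value"])
        writer.writerows(readings)
    return len(readings)


def bulk_insert_statement(table: str, csv_path: Path) -> str:
    """Builds a T-SQL BULK INSERT for the file; FIRSTROW = 2 skips the header row."""
    return (
        f"BULK INSERT {table}\n"
        f"FROM '{csv_path}'\n"
        "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', FIRSTROW = 2);"
    )


# Made-up sub-second readings, logged close to the source.
rows: List[Reading] = [
    ("2016-07-28T10:00:00.250", "temp_c", 71.4),
    ("2016-07-28T10:00:00.500", "pressure_kpa", 101.2),
]
out = Path(tempfile.gettempdir()) / "readings.csv"
count = write_readings_csv(out, rows)
sql = bulk_insert_statement("dbo.SensorReadings", out)
```

    The logger only has to append lines to a file, so it stays fast and simple, and the load into the warehouse can run on whatever schedule the business actually needs.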

  • Stefan LG (7/28/2016)


    Do not underestimate the power of CSV files and BULK INSERT!

    BOY HOWDY!!! +1000

    --Jeff Moden



  • Jeff Moden (7/28/2016)


    Stefan LG (7/28/2016)


    Do not underestimate the power of CSV files and BULK INSERT!

    BOY HOWDY!!! +1000

    Indeed. Simple is often the best way to transform things. I like this over SSIS in many cases. Just easier to run, debug, understand. Plus I find that it's easier to do lots of bulk cleans in T-SQL.

  • I remember when ATMs first came out they were downloaded nightly. One user just went to all the ATMs and overdrew his account.

    It does sound like a marketing term.

