SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 

Tuning Throughput from Azure Event Hub to Azure Stream Analytics

Recently I had a requirement to load streaming JSON data to provide a data feed for near real-time reporting.  The solution streamed data into an “Ingress” Azure Event Hub, shred the JSON via Azure Stream Analytics and then push subsections of data as micro-batches (1 sec) into a “Egress” Azure Event Hub (for loading into a stage table in Azure SQL DW).

In Event Hubs and Stream Analytics there are only a few performance levers to help tune a solution like this, or said another way, doing nothing with these levers can affect your ongoing performance!

So this blog is to show the performance differences when using different Azure Event Hub partition configurations and the Azure Stream Analytics PARTITION BY clause.

Understanding the Environment Levers

When you configure Azure Event Hub you only have 2 levers;

  • Throughput Units (TU) – TU’s apply to all event hubs in a namespace and are pre-purchased units of capacity between 1 and 20 (or more by contacting support). Each partition has a maximum scale of 1 TU.  Each TU is 1MB/sec ingress + 2MB/sec egress + 84GB event storage.  For our test we wont change TU
  • Partitions – The number of partitions is specified at creation and must be between 2 and 32. The partition count is not changeable. Partitions are a data organization mechanism that relates to the downstream parallelism required when consuming data.
  • You can learn more about Event Hubs herehttps://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-what-is-event-hubs

When you configure Azure Stream Analytics you only have 2 levers;

  • Streaming Units (SU) – Each SU is a blend of compute, memory and throughput between 1 and 48 (or more by contacting support).  The factors that impact SU are query complexity, latency, and volume of data. SU can be used to scale out a job to achieve higher throughput. Depending on query complexity and throughput required, more SU units may be necessary to achieve your performance requirements.  A level of SU6 assigns an entire Stream Analytics node.   For our test we wont change SU
  • SQL Query Design – Queries are expressed in a SQL-like query language. These queries are documented in the query language reference guide and includes several common query patterns.  The design of the query can greatly affect the job throughput, in particular if and/or how the PARTITION BY clause is used.
  • You can learn more about Stream Analytics herehttps://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction

 

Test Patterns and Scenarios

When running streaming solutions its important to aim to get maximum parallelism whch allows the greatest throughput.

I’m going to keep it simple and verify the difference between running ingress and egress Event Hub with different partition settings, and combine a Stream Analytics SQL query design using the PARTITION BY clause.

 

TEST INPUT

[Ingress AEH]

QUERY

[ASA]

OUTPUT

[Egress AEH]

TU Partitions SU SQL Query TU Partitions
1 5 2 1 NONE 5 16
2 5 2 1 PARTITION BY 5 16
3 5 16 1 NONE 5 2
4 5 16 1 PARTITION BY 5 2
5 5 2 1 NONE 5 2
6 5 2 1 PARTITION BY 5 2
7 5 16 1 NONE 5 16
8 5 16 1 PARTITION BY 5 16

 

Test Workload Configuration

The particulars of the test workload are as follows;

  • Each test will send 16000 events using 16 clients/threads with no wait time between each send event.
  • The JSON event is 88 bytes, and is shown below.  Each event will have an incremental event id, a random deviceId (1-10) and random value (1-1000000).
  • The events are not compressed
  • Each client will send data without using a partition key (ie round robin will allocate the incoming events into the Ingress Event Hub partitions).
  • The send clients will run from a DS13-4_v2 (4 v-cpus, 56 GB memory) Azure VM to minimise latency.
  • The VM is running in Australia South East.
  • The Event Hubs and Stream Analytics services are running in Australia South East.
  • Stream Analytics will run at SU1 (just so its easy to observe SU% differences)
  • Event Hub will run at TU5 (Igress Hub & Egress hub share the same namespace)
  • The ASA SQL query will perform a relatively complex aggregate tumbling window
  • I will use the awesomely cool Service Bus Explorer to send the events.  In case you haven’t seen it before, then strongly recommend to check it out – https://code.msdn.microsoft.com/windowsapps/Service-Bus-Explorer-f2abca5a
  • The tests will measure the SU% and the elapsed sec (first event sent [from the client] to last event received [in the event hub]).

 

Test JSON Event Message

{
    "eventId":#####,
    "deviceId":##,
    "value":#######,
    "timestamp":"####-##-##T##:##:##.#######Z"
}

 

Test Stream Analytics Query

My ASA SQL query below can use a “parallel” input step from the ingress event hub, but I needed an aggregated output step to load into the egress event hub.  I know I cannot achieve “maximum” parallelism for the query, however I wanted to identify if there are still observable performance benefits in this configuration.

For each test I will change the ASA SQL Query as follows…

  • Add / remove the “PARTITION BY” clause
  • Swap the pointers for “aAEHIngress##” and “aAEHEgress##” Event Hubs.  I have pre-configured several event hubs with either 2 or 16 partitions, so I can just swap the pointers around and rerun the query again!
WITH Step1 AS
(
   SELECT
     PartitionId,
     deviceId,
     COUNT(deviceId) as TOTRows,
     SUM(value) as SUMValue,
     AVG(value) as AVGValue,
     MIN(value) as MINValue,
     MAX(value) as MAXValue
     FROM
       [aAEHIngress##] --PARTITION BY PartitionId
     GROUP BY
       PartitionId,
       deviceId,
       TumblingWindow(second, 1)
)
SELECT deviceId,
   SUM(TOTRows) as TOTRows,
   SUM(SUMValue) as SUMValue,
   AVG(AVGValue) as AVGValue,
   MIN(MINValue) as MINValue,
   MAX(MAXValue) as MAXValue
INTO
   [aAEHEgress##]
FROM
   Step1
GROUP BY
   deviceId,
   TumblingWindow(second, 1)

If you are interested in other common ASA queries then check out the patterns here – https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-stream-analytics-query-patterns

 

Test Results/Outcomes

And so the results are in!

# Test Type
Total Events
Sent
Avg Event Send
Rate / Sec
Elapsed
(sec)
ASA
SU %
1 2:16:N 16000 201 84 9
2 2:16:P 16000 202 83 9
3 16:2:N 16000 174 102 11
4 16:2:P 16000 167 108 14
5 2:2:N 16000 198 85 10
6 2:2:P 16000 200 83 9
7 16:16:N 16000 183 90 11
8 16:16:P 16000 183 98 14

 

Key Observations of this Test Workload

Given the above tests and observed results, you could conclude that…

  1. Mismatching ingress and egress partitions can negatively affect performance specifically when the ingress partitions is higher than the egress partitions (ie the IN pipe is bigger than the OUT pipe).  My test showed as much as a 20% difference in elapsed time.  This also results in a 30% higher SU% utilisation.
  2. Mismatching egress and ingress partitions does not seem to negatively affect performance when the egress partitions are higher than the ingress partitions (ie the IN pipe is smaller than the OUT pipe).  My test shows the elapsed times and SU% are about the same, which kind of makes sense.
  3. Matching ingress and egress partitions can positively affect performance, however when higher partitions are used (16 partitions) there seems to be a degree of overhead to process data through the solution, as is shown by lower send rate, and higher SU%.  This could be related to send overhead at the client – I note that MS DOCS states 4 partitions is “generally enough” for most solutions.  Probably needs further investigation and retesting at 4 or 8 partitions…
  4. There was only a very small benefit from using the PARTITION BY in my specific ASA Query.  Some performed better by a few seconds, and others worse.  Its probably not enough to call it as beneficial or not.  As above I suspected this may occur as my ASA SQL query needed 2 steps; the 1st could be partitioned, however the 2nd couldn’t be partitioned (GROUP BY without partition id).

 

Summary

So there you have it, a pretty simple breakdown of how the different partition sizes in Azure Event Hubs and partition by clasue in Azure Stream Anaytics can change the performance!

My test was focused on my specific needs in my specific scenario; the outcomes will differ depending on the (a) SQL query, (b) the type/size of the event payload, (c) the number of clients sending data into the ingress event hub, and (d) if those clients send to a specific partition or use round robin.  Ideally if I had time it would be great to rerun this using a workload of a few million events over a few hundred clients!

So as usual, and as I always say, please test this out yourself with your own solution, data and event stream as your milage may vary!

 

References

There’s some exceptional MS DOCS articles I recommend that you review and validate the fit within your own streaming data solutions.


Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here


Filed under: Azure, Azure Event Hub, Azure Stream Analytics, IoT Tagged: SQL Server

Mr. Fox SQL

Rolf Tesmer works as an Azure Data Solution Architect (DSA) in Australia for Microsoft. Rolf has an MCSE in Data Management & Analytics, an MCSE in Data Platform and an MCSE in Business Intelligence (BI). Rolf has been working with the SQL data platform since v6.0 (that’s 1994!) and has done just about everything you can around data related platforms, solutions and architectures ever since then and has scoped, designed and delivered 100’s of data solutions in that time. Rolf has had the opportunity to present extensively at Ignite, PASS, TechEd, SQL Saturday, SQL User Groups, MeetUp’s and Seminars, Roadshows, etc and really enjoys sharing and learning new ideas.

Comments

Leave a comment on the original post [mrfoxsql.wordpress.com, opens in a new window]

Loading comments...