I've heard of Kafka before. I know it's an Apache project, and you can download it or read more at https://kafka.apache.org/. I knew it was a way of moving data around, something like an ETL tool, though really more of a messaging and queueing system. That kind of tool seems like a great idea, but it's one that everyone seems to struggle to work with.
And one that seemed complex. The overview is that Kafka is "a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. It can be deployed on bare-metal hardware, virtual machines, and containers in on-premise as well as cloud environments."
Would I need that or use it? In a lot of my database work, I'm not sure that it would easily fit into most of the OLTP applications or data warehouse systems. Maybe. Hard to tell. Their description of event streaming and the definition of an event make it seem like this is a catch-all system for moving log data around. One that might be so open-ended that it ends up requiring a lot of configuration for "my" system.
Here's their definition of an event: An event records the fact that "something happened" in the world or in your business. It is also called record or message in the documentation. When you read or write data to Kafka, you do this in the form of events. Conceptually, an event has a key, value, timestamp, and optional metadata headers.
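To make that definition a little more concrete, here is a minimal sketch of what such an event might look like as a plain Python data structure. This is not the Kafka client API. The class name, field names, and sample values are all hypothetical and only mirror the four parts named in the definition: key, value, timestamp, and optional headers.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Illustrative shape of an event; names are assumptions, not Kafka's API."""
    key: Optional[str]                      # often identifies the entity the event is about
    value: str                              # the payload, here a JSON string
    timestamp_ms: int = field(              # defaults to "now" in milliseconds
        default_factory=lambda: int(time.time() * 1000)
    )
    headers: Tuple[Tuple[str, str], ...] = ()  # optional metadata as (name, value) pairs


# A made-up "something happened" record: a customer placed an order.
order_event = Event(
    key="customer-42",
    value='{"order_id": 1001, "total": 59.90}',
    timestamp_ms=1700000000000,
    headers=(("source", "web-shop"),),
)
```

Seen this way, an event looks a lot like a row with a key, a payload, and some audit columns, which is familiar territory for anyone who works with databases.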
Recently I watched a Kafka presentation at THAT Conference (which was a fantastic event). In the talk, this sentence caught my eye: "[Kafka is] a pipe to move data from A to B, C, D". I've certainly had that need, and sometimes configuring lots of pipes is a lot of work. If you've ever worked with replication and the publisher/subscriber model, you likely get a twitch in your eye when a ticket is opened to configure a new subscriber. Not because the configuration is hard, but because the ongoing admin can be a pain.
The talk dives into some of the complexity of designing and implementing a Kafka system. For developers who might write to the stream or read from it, things seem simple. For admins and architects, less so, and I can't help wondering what happens when a reader goes down. I have nightmares of replication subscribers being down and transaction logs not being reused.
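One way to reason about the "reader goes down" question is with a toy model. The sketch below is an in-memory stand-in, not Kafka and not its client API. The ToyLog class, its methods, and the reader names are all hypothetical. It only illustrates the general idea of an append-only log where each reader keeps its own position, so a reader that was offline can pick up where it left off without affecting the others.

```python
class ToyLog:
    """In-memory stand-in for one append-only stream (an assumption, not Kafka's API)."""

    def __init__(self):
        self._records = []   # the log itself; records are only ever appended
        self._offsets = {}   # reader name -> index of the next record to read

    def append(self, record):
        """Add a record to the end of the log and return its position."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, reader, max_records=10):
        """Return the next batch for this reader and advance only its own position."""
        start = self._offsets.get(reader, 0)
        batch = self._records[start:start + max_records]
        self._offsets[reader] = start + len(batch)
        return batch


log = ToyLog()
log.append("order-1")
log.append("order-2")

seen_by_b = log.poll("B")    # reader B is up and keeps pace
log.append("order-3")        # written while reader C is still down
seen_by_b += log.poll("B")

seen_by_c = log.poll("C")    # C comes back and catches up from its own position
```

In this toy, nothing is ever deleted. In Kafka itself, how long records are kept is governed by retention settings rather than by whether every reader has consumed them. That would suggest a down reader does not pin the log the way a stalled replication subscriber can, though a reader that stays down longer than the retention window could miss data.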
Kafka doesn't seem as complex as I thought before, but it certainly doesn't seem simple or easy. Kafka is not a panacea for moving data around, but it is a well-understood and widely used technology. Those things mean more to me now that I find myself considering the challenges of maintaining a system over time and hiring staff who understand it. It's something I'd consider using in the future, and maybe something I'd like to experiment with a bit more and learn how it works at a more practical level.
If you use it, or know more, I'd be interested in how well Kafka has worked for you, either as a developer or admin.
