Build Secure and Governed Microservices with Kafka Streams
January 29th, 2019
In this post, we discuss how Kafka Streams can be used to build a microservices application, with the help of a use case.
What is Apache Kafka?
Apache Kafka is a community-developed, distributed streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on the abstraction of a distributed commit log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from a messaging queue into a full-fledged streaming platform.
A streaming platform would not be complete without the ability to manipulate that data as it arrives. The Streams API within Apache Kafka is a powerful, lightweight library that allows for on-the-fly processing, letting you aggregate, create windowing parameters, perform joins of data within a stream, and more. Perhaps best of all, it is built as a Java application on top of Kafka, keeping your workflow intact with no extra clusters to maintain.
Kafka Streams is a component of open source Apache Kafka: a powerful, easy-to-use library for building highly scalable, fault-tolerant, distributed stream processing applications on top of Kafka. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, handling late-arriving data, and efficiently managing application state.
The following list highlights several key capabilities and aspects of Kafka Streams that make it a compelling choice for building use cases such as stream processing applications, continuous queries and transformations, and microservices.
- Highly scalable, elastic, fault-tolerant
- Stateful and stateless processing
- Event-time processing with windowing, joins, aggregations
- No dedicated cluster required
- No external dependencies
- “It’s a library, not a framework.”
- 100% compatible with Kafka 0.10.0.0
- Easy to integrate into existing applications
- No artificial rules for deploying applications
- Millisecond processing latency
- Does not micro-batch messages
- Windowing with out-of-order data
- Allows for arrival of late data
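Because Kafka Streams is a library rather than a framework, a Streams application is just an ordinary Java program. The following is a minimal sketch, assuming the kafka-streams dependency is on the classpath; the topic names and the filter logic are illustrative, not from the post.

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class MinimalStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The application.id also serves as the consumer group id.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "minimal-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("input-topic");
        events.filter((key, value) -> value != null)  // e.g., drop null-valued records
              .to("output-topic");

        // Just a library call in a plain JVM process -- no cluster to submit a job to.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```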
Use Case – A Trucking Fleet Company
A trucking fleet company outfits each truck in its fleet with two sensors. The geo-event sensor captures important events from the truck (e.g., lane changes, braking, starts/stops, and acceleration, along with its geolocation), and the speed sensor captures the truck's speed at regular intervals. These two sensors stream data into their own respective Kafka topics: syndicate-geo-event-avro and syndicate-speed-event-avro.
The requirements for the use case are the following:
- Create streams consuming from the two Kafka topics
- Join the Geo and Speed sensor streams over a time-based aggregate window.
- Apply rules on the stream to filter on events of interest.
- Calculate the average speed of each driver over a 3-minute window and create an alert for speeding drivers.
- Find all drivers who have been speeding (average speed > 80) over that 3-minute window.
- Send alerts for speeding drivers to a downstream alert topic.
- Apply access control lists (ACLs) to the source Kafka topics, the alert topic, and the intermediate topics created by the Kafka Streams apps.
- Monitor each microservice, providing a view into producers, consumers, brokers, and key metrics such as consumer group lag.
Kafka Streams Microservices Architecture
Using Kafka Streams, we can implement these requirements with a set of lightweight microservices that are highly decoupled and independently scalable.
Join Support in Kafka Streams & Integration with Schema Registry
Kafka Streams has rich support for joins, providing simple, composable APIs for stream-to-stream and stream-to-table joins using the KStream and KTable abstractions.
The diagram below illustrates how the JoinFilterGeoSpeed microservice was implemented using the join support in Kafka Streams, as well as integration with Schema Registry to deserialize the events from the source Kafka topics.
The code for this microservice can be found here: JoinFilterMicroService.
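In outline, the join looks like the following sketch. The topic names come from the use case; GeoEvent, SpeedEvent, GeoSpeedJoined, and isEventOfInterest are illustrative names rather than the post's actual source, and the Schema Registry serde wiring is reduced to the essential configuration.

```java
import java.time.Duration;
import java.util.Properties;
import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-filter-geo-speed");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Schema Registry supplies the Avro schemas used to deserialize both source topics.
props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);

StreamsBuilder builder = new StreamsBuilder();
KStream<String, GeoEvent> geo = builder.stream("syndicate-geo-event-avro");
KStream<String, SpeedEvent> speed = builder.stream("syndicate-speed-event-avro");

KStream<String, GeoSpeedJoined> joined = geo.join(
        speed,
        GeoSpeedJoined::new,                     // ValueJoiner combining the two events
        JoinWindows.of(Duration.ofMinutes(3)));  // pair events arriving within 3 minutes

joined.filter((driverId, event) -> isEventOfInterest(event))  // rules on events of interest
      .to("joined-filtered-geo-speed");
```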
Aggregation over Windows and Filtering Streams
Windowing lets you control how records with the same key are grouped for stateful operations such as joins and aggregations, by dividing them into so-called windows. Kafka Streams supports four types of windows: tumbling, hopping, sliding, and session windows.
The diagram below illustrates how tumbling windows were used in the CalculateDriverAvgSpeed microservice to calculate the average speed of each driver over a 3-minute window.
The code for this microservice can be found here: CalculateDriverAvgSpeedMicroService.
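The aggregation itself is simple enough to show in plain Java. Below is an illustrative accumulator of the kind a windowed aggregate would maintain (the class and method names are ours, not the post's); in the topology, this logic would live inside the aggregator passed to a windowed aggregate() call on a grouped stream.

```java
public class SpeedAggregator {
    private long count;
    private double sum;

    // Fold one speed reading into the running aggregate.
    public SpeedAggregator add(double speed) {
        count++;
        sum += speed;
        return this;
    }

    // Average speed over the window so far.
    public double average() {
        return count == 0 ? 0.0 : sum / count;
    }

    // Tumbling windows in Kafka Streams are aligned to the epoch: an event's
    // window starts at its timestamp rounded down to a multiple of the window size.
    public static long windowStartMs(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    public static void main(String[] args) {
        SpeedAggregator agg = new SpeedAggregator().add(70).add(90);
        System.out.println(agg.average());                   // 80.0
        System.out.println(windowStartMs(200_000, 180_000)); // 180000
    }
}
```

With a 3-minute (180,000 ms) tumbling window, every event is assigned to exactly one window, so each driver gets one average per window.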
The third microservice, AlertSpeedingDrivers, filters the stream for drivers who are speeding over that 3-minute window.
The code for this microservice can be found here: AlertSpeedingDriversMicroService.java
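The filtering rule is essentially a one-line predicate applied via KStream#filter. Sketched here as a plain function over per-driver windowed averages (the class name, method name, and map shape are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SpeedingFilter {
    // Speed threshold from the use case requirements (> 80).
    static final double SPEED_LIMIT = 80.0;

    // Returns drivers whose windowed average speed exceeds the limit --
    // the same predicate a KStream#filter would apply per record.
    public static List<String> speedingDrivers(Map<String, Double> avgSpeedByDriver) {
        return avgSpeedByDriver.entrySet().stream()
                .filter(e -> e.getValue() > SPEED_LIMIT)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(speedingDrivers(Map.of("driver-1", 85.5, "driver-2", 62.0)));
        // [driver-1]
    }
}
```

The matching records are then written to the downstream alert topic.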
Running & Scaling the Microservices without a Cluster
One of the key benefits of Kafka Streams over other streaming engines is that stream processing apps/microservices don't need a cluster. Rather, each microservice runs as a standalone app (e.g., a JVM process). You can then spin up multiple instances of each to scale a microservice out. Kafka treats these instances as a single consumer group, and Kafka Streams takes care of reassigning consumer partitions among them.
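Concretely, scaling out is just starting more copies of the same process. A sketch, assuming the microservice is packaged as a runnable jar (the jar name is illustrative):

```shell
# Each instance is a plain JVM process. Because all instances share the same
# application.id, Kafka treats them as one consumer group and rebalances
# topic partitions across them automatically.
java -jar join-filter-microservice.jar &
java -jar join-filter-microservice.jar &   # second instance: partitions are reassigned

# Stopping an instance triggers another rebalance; survivors pick up its partitions.
```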
You can see how to start these three microservices here.
Secure & Auditable Microservices with Ranger, Ambari & Kerberos
The last two requirements for the trucking fleet application have to do with security and auditing. For the JoinFilter microservice, let's distill these into the following more granular authentication/authorization requirements:
For Req #1, when we start up the microservice, we configure the JVM parameter java.security.auth.login.config to point to the following JAAS file. This JAAS file contains the principal named truck_join_filter_microservice, with its associated keytab, which the microservice uses when connecting to Kafka resources.
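A representative JAAS file for this setup would look like the following; only the principal name comes from the text, while the keytab path and Kerberos realm are illustrative:

```
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/truck_join_filter_microservice.keytab"
    principal="truck_join_filter_microservice@EXAMPLE.COM";
};
```

It is then passed at startup, e.g. via -Djava.security.auth.login.config pointing at this file.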
For Reqs #2, #3, and #4, we use Ranger to configure the ACL policies.
For Req #5, Ranger provides audit services by indexing all access logs to Kafka resources via Solr.