Stream Data Processing for Fun and Profit - David Ostrovsky
Software systems today must handle increasingly large streams of incoming data, whether it's user interactions with a web page, events generated by sensors, or messages sent between system components. This data often loses value over time, becoming stale or irrelevant, which makes handling events quickly and reliably not just desirable but critical. However, building a system that can process hundreds of thousands, or even millions, of events per second without compromising either speed or reliability is a major engineering challenge. This is where distributed stream processing frameworks come in. In this session, we'll talk about stream data processing: where it originated, how it works, when to use it, how to build robust stream processing applications, and the tools available to us. We'll examine the two most popular platforms used in the industry today, Apache Storm and Apache Spark, as well as some interesting up-and-coming frameworks: Flink, Kafka Streams, Apex, and Microsoft Orleans Streams. Each takes a conceptually different approach, offers a plethora of features, and fits some use cases better than others. Understanding how and when to use which streaming framework is key to building a reliable, scalable, robust system and avoiding painful, costly redesigns down the road.