ScalaIO - Yann Schwartz & Sofian Djamaa - Stream all the things! - life at the end of the firehose

Even with one of the largest Hadoop clusters in Europe, there comes a time when your data cries out to be processed online too, either because you want low latency or because there is simply too much data for your storage or network capacity. So at Criteo we have moved to a hybrid batch/streaming architecture, where Hadoop plays nicely with Kafka, Storm, and, more recently, Summingbird, a bridge between the two approaches. In this session we'll cover our architecture, the tools it's made of, the interesting gotchas we've run into, and how everything, from business events to application logs and system metrics, becomes precious data in a streaming world.

October 23, 2014