Cassandra Exports, a Trivially Parallelizable Problem (Emilio Del Tessandoro, Spotify) C*Summit 2016
Slides: http://www.slideshare.net/DataStax/cassandra-exports-as-a-trivially-parallelizable-problem

Cassandra databases at Spotify hold all sorts of interesting data sets. Naturally, we would like to let our data scientists tap into these data sets. Recent developments in the offerings of cloud vendors have allowed us to engineer systems that answer this use case in an unprecedented way. In this talk we'll present how we turned the process of exporting data from Cassandra clusters into a trivially parallelizable problem. Using just a few basic cloud products, we've managed to dump our largest clusters, containing terabytes of data, in a matter of minutes.

About the Speaker

Emilio Del Tessandoro, Software Engineer, Spotify

Emilio Del Tessandoro is a software engineer working on tooling and automation for the Spotify storage infrastructure. He is interested in theoretical computer science with a focus on algorithms and scalable systems.