Spark 2.0 - by Matei Zaharia
This talk was recorded at Scala Days New York, 2016. Follow along on Twitter @scaladays, and see the website at http://scaladays.org/ for more information.

Abstract: Apache Spark is one of the largest Scala projects, as well as the most active open source project in big data processing. It was one of the first systems to provide a functional API in Scala and automatically distribute the work over clusters, simplifying distributed programming. In this keynote, I'll talk about how Spark's API has evolved since the first version, and in particular about the new APIs in the soon-to-be-released Spark 2.0. The largest additions make the API more declarative, allowing richer automatic optimizations, and provide a stronger link between Scala data types and a binary data format, enabling memory- and CPU-efficient processing. These ideas may be relevant to other high-performance libraries in Scala.