Mathieu Bastian - 365 days of Spark!
Further information: https://berlinbuzzwords.de/17/session/365-days-spark

The first 365 days of a relationship are full of discoveries and lessons, and the same is true of significant technological shifts. Apache Spark has rapidly transformed the data platform landscape and has recently reached version 2.0, completing an important development cycle. But how do you really succeed at rolling Spark out in an organization? How do you support varied use cases, from machine learning and data products all the way to analytics and reporting? And how do you do that step by step, gracefully and efficiently?

I would like to share my experience and lessons from a first, intense year of moving Spark into production at GetYourGuide, a Berlin startup of 250+ employees in the travel space, with lots of data to analyse and crunch. Whether the starting point is a legacy data warehouse, a Hadoop infrastructure or anything in between, the "playbook" I would like to present aims to touch on the following important questions and topics:

How to support SQL-based use cases as well as complex machine learning on one platform?
How to integrate an existing data warehouse into the rollout strategy?
Which platform to choose, and what resources to expect to need?
Can everything be done with the Dataset API, or are RDDs still relevant?
How to get started with streaming?
How to best organize your data, moving from relational tables to a data lake on S3?
How to integrate with existing BI tools (e.g. Looker)?

This presentation will be informative for anyone currently planning or executing a migration to Spark. It will highlight intermediate technical topics for newcomers, as well as tips and advice for a successful first-year relationship with Spark!

Speaker: Mathieu Bastian
https://twitter.com/mathieubastian