
Andreas Dewes - Analyzing Data with Python & Docker

[EuroPython 2016] [21 July 2016] [Bilbao, Euskadi, Spain]
(https://ep2016.europython.eu//conference/talks/analyzing-data-with-python-docker)

Docker is a powerful tool for packaging software and services in containers and running them on a virtual infrastructure. Python is a very powerful language for data analysis. What happens if we combine the two? We get a very versatile and robust system for analyzing data at small and large scale!

I will show how we can make use of Python and Docker to build repeatable, robust data analysis workflows that can be used in many different contexts. I will explain the core ideas behind Docker and show how they can be useful in data analysis. I will then discuss an open-source Python library (Rouster) that uses the Python Docker API to analyze data in containers, and show several interesting use cases (possibly even a live demo).

Outline:

1. Why data analysis can be frustrating: Managing software, dependencies, data versions, workflows
2. How Docker can help us to make data analysis easier & more reproducible
3. Introducing Rouster: Building data analysis workflows with Python and Docker
4. Examples of data analysis workflows: Business Intelligence, Scientific Data Analysis, Interactive Exploration of Data
5. Future Directions & Outlook
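To make the idea of driving containers from Python concrete, here is a minimal sketch of running a single analysis script inside a container. It does not show Rouster's API, which the abstract does not describe. It assumes the current Docker SDK for Python (the `docker` package) and a running Docker daemon. The helper names `container_run_kwargs` and `run_analysis`, the `/data` mount point, and the read-only, network-disabled settings are illustrative choices, not taken from the talk.

```python
from pathlib import Path


def container_run_kwargs(image, script, data_dir):
    """Build keyword arguments for containers.run() (hypothetical helper).

    The host directory is mounted read-only at /data, so the script sees
    the data but cannot modify it; the network is disabled so the step
    depends only on the image and the mounted files.
    """
    host_dir = Path(data_dir).resolve()
    return {
        "image": image,
        "command": ["python", f"/data/{script}"],
        "volumes": {str(host_dir): {"bind": "/data", "mode": "ro"}},
        "working_dir": "/data",
        "network_disabled": True,
        "remove": True,  # delete the container once the script exits
    }


def run_analysis(image, script, data_dir):
    """Run the script in a fresh container and return its stdout as text."""
    import docker  # assumption: `pip install docker` and a running daemon

    client = docker.from_env()
    output = client.containers.run(**container_run_kwargs(image, script, data_dir))
    return output.decode("utf-8")
```

For example, `run_analysis("python:3.12-slim", "analyze.py", "./project")` would execute a hypothetical `./project/analyze.py` inside that image and return whatever it prints. Pinning the image tag is what makes the step repeatable: the same interpreter and dependencies are used on every run.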

July 17, 2016