Data Exploration Made Easy with Open Source (Oz KATZ)
Voxxed Days Luxembourg 2023
Room: AmigaOS
Type: Tools in Action

Exploring data is HARD! You have probably asked yourself how to run SQL queries on Parquet and other tabular formats inside your own data lake. Running a modern data lake has plenty of upside, but object stores (the bedrock of every cloud data lake) were never designed to be data warehouses. Sometimes all we need is to explore our data: look at its schema, compare versions, and more, ideally without using extra tools or installing additional components. This is where we can leverage the power of open source.

In this talk I’ll share how the open source project lakeFS embedded DuckDB to enable just this kind of experience, natively from within the lakeFS UI. With DuckDB, data engineers get a simple, performant way to explore data without running expensive and complex distributed systems, all within their existing workflow. You’ll learn about the benefits of using DuckDB within lakeFS, what this looks like in practice, and why you should try this at home.