Wes McKinney & Jon Keane - Apache Arrow in an Interconnected World
Visit https://rstats.ai/nyr/ to learn more.

Abstract: Apache Arrow is a multi-language toolbox for accelerated data interchange and processing. Community-driven development in the Arrow project has continued at a fast pace, with numerous new capabilities, features, and refinements added over the past year, all with the goal of making data interchange and processing easier, faster, and more interoperable. The Arrow format has also been adopted as a high-performance (and zero-copy!) method of interchange between toolkits and frameworks, which makes moving from one to another quick and easy. We’ll take a tour of some of the new features in Apache Arrow, as well as examples of using the zero-copy data interface between Arrow R and other toolkits like DuckDB. The Arrow R package brings the Apache Arrow toolkit to anyone using R, providing access to the Arrow C++ library with a familiar dplyr and R interface.

Bio: Wes McKinney is an open source software developer focusing on analytical computing. He created the Python pandas project and is a co-creator of Apache Arrow, his current focus. He authored two editions of the book Python for Data Analysis. Wes is a member of The Apache Software Foundation and a PMC member for Apache Parquet. He is now the CTO of Voltron Data, a new startup working on accelerated computing technologies powered by Apache Arrow.

Twitter (Wes McKinney): https://twitter.com/wesmckinn
Twitter (Jon Keane): https://twitter.com/jonkeane

Presented at the 2022 New York R Conference (June 10, 2022)