David Robinson - The {widyr} Package

The {widyr} Package: Calculate Pairwise Correlations, Clustering, and More within a Tidy Workflow by Dr. David Robinson. Visit https://rstats.ai/nyr/ to learn more.

Abstract: The tidyverse offers powerful and usable tools for transformation, visualization, and modeling, based on the table as a shared data structure. Some data science tasks, however, are conceptually and computationally suited to “wide” matrices rather than “tidy” tables. Such operations include pairwise correlations, clustering, and dimensionality reduction, and generally any operation that compares across groups rather than within groups. In this talk I’ll introduce my widyr package, which fits those operations into the tidyverse by making the matrix operations invisible to the user. The package offers functions such as pairwise_count(), pairwise_cor(), and widely_svd(), each of which takes and returns a table. These functions are efficient and powerful, letting users answer questions like “which groups within this dataset are correlated” or “what are the most important principal components” as part of a tidy workflow. I’ll describe the widyr philosophy, share some examples of using widyr in a tidy analysis, and end with some glimpses of the future of the widyr package.

Bio: David Robinson is the Principal Data Scientist at Heap. David earned a Ph.D. in Quantitative and Computational Biology from Princeton University. He was the Data Insights Engineering Manager at Flatiron Health until early 2020. He was previously the Chief Data Scientist at DataCamp, and, before that, a Data Scientist at Stack Overflow, where he analyzed data on the world's software developers to help them find answers to their programming questions. He is the co-author with Julia Silge of the tidytext package and of the book Text Mining with R. He is also the author of the broom, gganimate, and fuzzyjoin packages and of the DataCamp course "Exploratory Data Analysis in R: Case Study." He writes about R, statistics and education on his blog Variance Explained, as well as on Twitter as @drob.

Twitter: https://twitter.com/drob

Presented at the 2020 R Conference | New York (August 15th, 2020)
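
As a rough illustration of the abstract's point that pairwise_cor() and pairwise_count() each take and return a table, here is a minimal sketch. The `scores` data frame and its column names are hypothetical and made up for this example; they do not come from the talk. The sketch assumes the dplyr and widyr packages are installed.

```r
library(dplyr)
library(widyr)

# Hypothetical toy data: one row per (country, year) observation
scores <- tibble(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(2001:2004, times = 3),
  value   = c(1, 2, 3, 4,   # A rises steadily
              2, 4, 6, 8,   # B rises in step with A
              4, 3, 2, 1)   # C falls steadily
)

# Correlate every pair of countries across years.
# The input is a tidy table and so is the result
# (columns item1, item2, correlation); the wide matrix is never exposed.
cors <- scores %>%
  pairwise_cor(country, year, value, sort = TRUE)

# Count how many years each pair of countries has in common
# (columns item1, item2, n).
counts <- scores %>%
  pairwise_count(country, year)

print(cors)
print(counts)
```

Because both results are ordinary tables, they can be passed straight into filter(), arrange(), or a ggplot2 call like any other tidy data.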

April 12, 2020