
Crawling The Web With Elixir - Adam Mokan

Crawling the web is something that a large number of people do, but few want to talk about. I feel there is not enough knowledge sharing on this topic, and I want to share my experiences from the past decade of crawling at scale. We will look at how I used Elixir to orchestrate a pool of distributed, dynamic headless crawler nodes, and go over the things I got wrong and how I resolved them. Even if you have no interest in crawling the web, I've learned over the years that crawling the web resiliently has a lot in common with large-scale data integration with third-party APIs. General awareness of OTP, GenStage, distributed systems, headless browser APIs, and Amazon Web Services is a plus.

Bio: Adam has been working on the BEAM since 2012 and, over the past few years, has been implementing large-scale crawlers and data processing pipelines with Elixir.

August 27, 2019