Juan Riaza - Dive into Scrapy

Juan Riaza - Dive into Scrapy [EuroPython 2015] [21 July 2015] [Bilbao, Euskadi, Spain]

Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing. In this talk, some advanced techniques will be shown, based on how Scrapy is used at Scrapinghub.

Goals:

- Understand why it's necessary to _Scrapy-ify_ early on.
- Anatomy of a Scrapy Spider.
- Using the interactive shell.
- What items are and how to use item loaders.
- Examples of pipelines and middlewares.
- Techniques to avoid getting banned.
- How to deploy Scrapy projects.
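
As a rough illustration of the pipelines mentioned in the goals, here is a minimal sketch of what an item pipeline can look like. It is not taken from the talk: the class name `PriceToFloatPipeline` and the `price` field are hypothetical, chosen only for the example. The one thing it relies on from Scrapy is the convention that a pipeline is a plain class with a `process_item(self, item, spider)` method that returns the item.

```python
# Hypothetical pipeline: the class name and the "price" field are
# illustrative assumptions, not something shown in the talk.
class PriceToFloatPipeline:
    """Normalise a scraped price string such as '€1,299.00' to a float."""

    def process_item(self, item, spider):
        # Scrapy calls process_item once per scraped item; returning the
        # item hands it on to the next enabled pipeline.
        raw = item.get("price")
        if isinstance(raw, str):
            # Keep only digits and the decimal point, dropping currency
            # symbols and thousands separators.
            digits = "".join(ch for ch in raw if ch.isdigit() or ch == ".")
            item["price"] = float(digits) if digits else None
        return item


# Plain dicts can serve as items, so the pipeline can be exercised
# without a running crawler (the spider argument is unused here).
cleaned = PriceToFloatPipeline().process_item(
    {"name": "Laptop", "price": "€1,299.00"}, spider=None
)
```

In a real project, a pipeline like this would be enabled through the `ITEM_PIPELINES` setting; the sketch assumes prices use `.` as the decimal separator.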

July 20, 2015