Web scraping: Reliably and efficiently pull data from pages that don't expect it

Summary

Exciting information is trapped in web pages and behind HTML forms. In this tutorial, you'll learn how to parse those pages and when to apply advanced techniques that make scraping faster and more stable. We'll cover parallel downloading with Twisted, gevent, and others; analyzing sites behind SSL; driving JavaScript-y sites with Selenium; and evading common anti-scraping techniques.
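As a rough illustration of two of the ideas named above, parsing a page and downloading several pages in parallel, here is a minimal sketch. It is not taken from the tutorial: it uses only the Python 3 standard library (`html.parser` and a thread pool from `concurrent.futures`) as a stand-in for Twisted or gevent, and the names `LinkExtractor`, `extract_links`, `fetch`, and `fetch_all` are hypothetical helpers made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag seen while parsing (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all anchors in an HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download one page and decode it leniently as UTF-8 (an assumption about the site)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Download several pages concurrently with a thread pool; returns {url: body}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its own body.
        return dict(zip(urls, pool.map(fetcher, urls)))
```

Passing the downloader in as `fetcher` keeps the parsing and concurrency logic testable without network access. In real use, something like `fetch_all(extract_links(fetch(start_url)))` would follow the links on one page, although relative links would first need resolving with `urllib.parse.urljoin`.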