Talk Python To Me

#283: Web scraping, the 2020 edition

Michael Kennedy

Technology

4.8 (635 ratings)

🗓️ 23 September 2020

⏱️ 49 minutes

Summary

Web scraping is pulling the HTML of a website down and parsing useful data out of it. The use cases for this type of functionality are endless. Have a bunch of data on governmental sites that is only listed online in HTML without a download? There's an API for that! Do you want to keep abreast of what your competitors are featuring on their site? There's an API for that. Need alerts for changes on a website, for example when enrollment opens at your college and you want to be first to get in and avoid the 8am Monday morning course slot? There's an API for that.
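As a rough illustration of the idea in the summary, here is a minimal sketch of "parse useful data out of HTML" plus a simple change alert, using only Python's standard library. It is not taken from the episode. The sample pages, the `TableRowParser` class, and the `extract_rows` and `new_or_changed` helpers are hypothetical names invented for this example, and the HTML is an inline string rather than a downloaded page. A real scraper would first fetch the page over HTTP and would likely use a dedicated scraping library.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a tuple of cell strings
        self._row = None        # cells of the row currently being parsed
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(tuple(cell.strip() for cell in self._row))
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row:
            self._row[-1] += data


def extract_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def new_or_changed(old_rows, new_rows):
    """Return rows in new_rows that were not present in old_rows."""
    seen = set(old_rows)
    return [row for row in new_rows if row not in seen]


# Hypothetical snapshots of the same page taken at two different times.
YESTERDAY = """
<table>
  <tr><td>Biology 101</td><td>closed</td></tr>
  <tr><td>Chemistry 201</td><td>closed</td></tr>
</table>
"""
TODAY = """
<table>
  <tr><td>Biology 101</td><td>closed</td></tr>
  <tr><td>Chemistry 201</td><td>open</td></tr>
</table>
"""

if __name__ == "__main__":
    for course, status in new_or_changed(extract_rows(YESTERDAY), extract_rows(TODAY)):
        print(f"Alert: {course} is now {status}")
```

Comparing parsed rows rather than raw HTML means cosmetic markup changes on the page do not trigger an alert; only a change in the extracted data does.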

Transcript

0:00.0

Web scraping is pulling the HTML of a website down and parsing useful data out of it.

0:05.1

The use cases for this type of functionality are endless.

0:08.5

Have a bunch of data on governmental sites that are only listed online in HTML without a

0:13.4

download? There's an API for that. Do you want to keep abreast of what your competitors

0:17.6

are featuring on their site? There's an API for that.

0:23.0

Need alerts for changes on a website?

0:25.9

For example, enrollment is now open at your college,

0:29.5

and you want to be first and avoid that 8 a.m. morning slot.

0:31.2

There's an API for that as well.

0:33.1

That API is screen scraping,

0:36.0

and Attila Tóth from Scrapinghub is here to tell us all about it.

0:37.4

This is Talk Python to Me, episode 283, recorded July 22nd, 2020.

1:00.8

Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

1:02.8

This is your host, Michael Kennedy.

1:04.9

Follow me on Twitter, where I'm @mkennedy.

1:08.6

Keep up with the show and listen to past episodes at talkpython.fm.

1:11.1

And follow the show on Twitter via @talkpython.

1:14.9

This episode is brought to you by Linode and us.

1:21.2

Python's async and parallel programming support is highly underrated. Have you shied away from the amazing new async and await keywords because you've heard it's way too

1:24.7

complicated or that it's just not worth the effort?

1:32.9

With the right workloads, a hundred-times speedup is totally possible with minor changes to your code.

1:38.8

But you do need to understand the internals, and that's why our course, Async Techniques and Examples in Python,

...
