Radiolab seems to build its pages with JavaScript. That made it slightly more annoying to find the right download link, but otherwise it's straightforward.
import os.path

import requests
from bs4 import BeautifulSoup


def download(href, title, extension="mp3", dirname='.'):
    """Stream href to dirname/title.extension and return the local path."""
    print(href, title, extension, dirname)
    filename = "%s.%s" % (title, extension)
    # A slash in a title would otherwise be read as a path separator.
    filename = filename.replace("/", "-")
    # todo, path management
    if not os.path.exists(dirname):
        print("making dir %s" % dirname)
        os.makedirs(dirname)
    local_filename = os.path.join(dirname, filename)
    r = requests.get(href, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return local_filename


archive_page = requests.get("http://www.radiolab.org/archive")
a_soup = BeautifulSoup(archive_page.content, 'html.parser')
for ep_row in a_soup.find_all('div', attrs={"class": "info-overlay"}):
    ep_page_link = ep_row.find('a', attrs={"class": "read-more"})
    print(ep_page_link.attrs['href'])
    ep_page = requests.get(ep_page_link.attrs['href'])
    soup = BeautifulSoup(ep_page.content, 'html.parser')
    # The download URL sits in a data attribute on the player element.
    link = soup.find('div', attrs={"class": "player_element"})
    href = link.attrs['data-download']
    # Take the heading's text; formatting the tag itself would put HTML in the filename.
    title = soup.find('h2', attrs={"class": "title"}).text.strip()
    # The metadata text is expected to look like "Season N | Episode M".
    meta = soup.find('div', attrs={"class": "seanum-epnum"})
    season, episode = meta.text.split("|")
    season = season.replace("Season", "").strip()
    episode = episode.replace("Episode", "").strip()
    download(href, "%s. %s" % (episode, title), dirname="Season_%s" % season)
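The script assumes every episode page carries metadata shaped like "Season N | Episode M". Any page that doesn't will stop the loop with an unpacking error. Here is a minimal sketch of pulling that parsing and the filename cleanup into small helpers that can be checked without touching the network. The names parse_season_episode and safe_filename are my own for illustration and not part of the script above, and I haven't verified how many Radiolab pages actually lack the metadata.

```python
def parse_season_episode(text):
    """Split text like 'Season 3 | Episode 7' into ('3', '7').

    Hypothetical helper: returns None when the text does not contain
    exactly one '|' separator, instead of raising on unpacking.
    """
    parts = text.split("|")
    if len(parts) != 2:
        return None
    season = parts[0].replace("Season", "").strip()
    episode = parts[1].replace("Episode", "").strip()
    return season, episode


def safe_filename(title, extension="mp3"):
    """Build 'title.extension' with slashes replaced by dashes (hypothetical helper)."""
    filename = "%s.%s" % (title.strip(), extension)
    return filename.replace("/", "-")
```

In the loop, a None result from parse_season_episode could be used to skip the page (or file it under a catch-all directory) rather than aborting the whole run.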
27th January 2018
I'm Issac. I live in Oakland. I make things for fun and money. I use electronics and computers and software. I manage teams and projects top to bottom. I've worked as a consultant, software engineer, hardware designer, artist, technology director and team lead. I do occasional fabrication in wood and plastic and metal. I run a boutique interactive agency with my brother Kasey and a roving cast of experts at Kelly Creative Tech. I was the Director of Technology for Nonchalance during the Latitude Society project. I was the Lead Web Developer and then Technical Marketing Engineer at Nebula, which made an OpenStack Appliance. I've been building things on the web and in person since leaving Ohio State University's Electrical and Computer Engineering program in 2007. Lots of other really dorky things happened to me before that, like dropping out of high school to go to university, getting an Eagle Scout award, and getting 6th in a state-wide algebra competition. I have an affinity for hopscotch.