urllib keeps freezing while trying to pull HTML data from a website - is my code correct?

Twiggy Garcia

I'm trying to build a simple Python script on Mac OS X that has four parts:

  1. Go to a defined website and grab all the HTML using urllib.
  2. Parse the HTML to find a table of numbers (using BeautifulSoup).
  3. Do a simple calculation with those numbers.
  4. Print the results in a table, in numerical order.

I'm having trouble with step 1. I can grab the data with urllib using this code:

import urllib.request

# Open the URL and read the whole response body (read() returns bytes)
y = urllib.request.urlopen('my target website url')
x = y.read()
print(x)

But once it has returned the HTML, the Python shell freezes and becomes unresponsive.

python macos parsing urllib python-3.4

Answers

answered 3 years ago chishaku #1

Since you mentioned requests, I think it's a great solution.

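import requests
from bs4 import BeautifulSoup

r = requests.get('http://example.com')
html = r.content
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
# Find the table by its id attribute ("targettable" is a placeholder id)
table = soup.find("table", {"id": "targettable"})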
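
Once the cell texts are out of the table, steps 3 and 4 from the question need only the standard library. The following is a minimal sketch, not a definitive implementation: the function names `summarize` and `print_table`, the sample cell values, and the doubling calculation are all placeholders, since the question does not say what the real numbers or calculation are.

```python
def summarize(cells, calculate=lambda n: n * 2):
    """Turn cell strings into numbers, apply a calculation, sort by result.

    `calculate` is a placeholder (doubling); swap in the real calculation.
    """
    # Skip empty cells and drop thousands separators such as "1,000"
    numbers = [float(text.replace(",", "")) for text in cells if text.strip()]
    results = [(n, calculate(n)) for n in numbers]
    return sorted(results, key=lambda pair: pair[1])


def print_table(rows):
    """Print (value, result) pairs as two right-aligned columns."""
    print("{:>12} {:>12}".format("value", "result"))
    for value, result in rows:
        print("{:>12.2f} {:>12.2f}".format(value, result))


# Hypothetical cell texts; with bs4 they could come from something like
# [td.get_text() for td in table.find_all("td")]
cells = ["10", "2.5", "1,000", " 7 "]
rows = summarize(cells)
print_table(rows)
```

Keeping the parsing, the calculation, and the printing in separate functions makes it easier to test each step without touching the network.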

As suggested by jonrsharpe, if you're concerned about the size of the response returned by that URL, you can check its size before printing or parsing it.

With requests:

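r = requests.get('http://example.com')
# Content-Length is optional, so .get() returns None instead of raising KeyError when it is absent
print(r.headers.get('content-length'))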
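
The same size check can be sketched with urllib alone. This assumes, without confirmation from the question, that the shell becomes unresponsive because it is asked to print a very large bytes object in one go; printing a short preview instead avoids that. The helper names `fetch_html` and `preview`, the 10-second timeout, and the 500-character limit are illustrative choices, not part of the original code.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page; return (declared Content-Length or None, decoded text)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        declared = response.headers.get("Content-Length")  # None if the header is absent
        # Fall back to UTF-8 when the server does not declare a charset (an assumption)
        charset = response.headers.get_content_charset() or "utf-8"
        raw = response.read()
    return declared, raw.decode(charset, errors="replace")


def preview(text, limit=500):
    """Return at most `limit` characters so the shell is not flooded with output."""
    if len(text) <= limit:
        return text
    return text[:limit] + "... [{} more characters]".format(len(text) - limit)
```

A possible usage would be `declared, html = fetch_html('http://example.com')` followed by `print(declared, len(html))` and `print(preview(html))`. The full `html` string can still be passed on to the parser.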
