Python HTML Parsing with BS4

ThatsOk Source

I'm trying to parse HTML using Python and Beautiful Soup, and I'm having trouble extracting one very specific piece of data. This is the kind of markup I'm dealing with:

<div class="big_div">
   <div class="smaller div">
      <div class="other div">
         <div class="this">A</div>
         <div class="that">2213</div>
      </div>
      <div class="other div">
         <div class="this">B</div>
         <div class="that">215</div>
      </div>
      <div class="other div">
         <div class="this">C</div>
         <div class="that">253</div>
      </div>
   </div>
</div>

As you can see, the same HTML structure repeats, with only the values differing. My problem is locating a specific value: I want the 253 in the last div. I would appreciate any help, as this is a recurring problem for me when parsing HTML.

Thank you in advance!

So far I've tried to parse for it, but because the class names are the same I have no idea how to navigate to it. I've also tried a for loop, but made little to no progress.



answered 3 months ago Krushi Raj #1

You can pass the string argument to find to match an element by its text. See the Beautiful Soup docs for the string argument.

from bs4 import BeautifulSoup

# html holds the HTML source of the page you want to scrape;
# req_text is the exact text of the element you want to find, e.g. '253'.
soup = BeautifulSoup(html, 'lxml')
req_div = soup.find('div', string=req_text)

req_div will contain the first div whose text is exactly req_text, or None if no such div exists.
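
If the value itself is not known in advance (you only know it sits next to the label C, or in the last repeated block), matching on the value's text will not help. Below is a minimal sketch of two alternatives, assuming the markup looks like the sample in the question with every div properly closed. The sample html string and the variable names are illustrative only, and the built-in 'html.parser' is used here instead of 'lxml' just to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question (assumed structure, closing tags added).
html = """
<div class="big_div">
   <div class="smaller div">
      <div class="other div">
         <div class="this">A</div>
         <div class="that">2213</div>
      </div>
      <div class="other div">
         <div class="this">B</div>
         <div class="that">215</div>
      </div>
      <div class="other div">
         <div class="this">C</div>
         <div class="that">253</div>
      </div>
   </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: find the label div by its text, then step to the sibling holding the value.
label = soup.find("div", class_="this", string="C")
value_by_label = label.find_next_sibling("div", class_="that").get_text(strip=True)

# Option 2: collect every value div and take the last one.
values = soup.find_all("div", class_="that")
value_by_position = values[-1].get_text(strip=True)

print(value_by_label, value_by_position)
```

Option 1 is probably the safer choice if the order of the blocks can change, since it depends on the label rather than on position. Note that find returns None when nothing matches, so real code should check label before calling find_next_sibling.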
