Python BeautifulSoup can't select specific tag

- June 15, 2011

My problem occurs when parsing a website and loading the data into a BeautifulSoup tree. How can I get the content of the <em> tag? I tried:

for first in soup.find_all("li", class_="li-in"):
    print first.select("em.fl.in-date").string
    # or
    print first.select("em.fl.in-date").contents

But it doesn't work. Please help.

I am searching for cars on tutti.ch.

Here is the entire code:

# Crawl tutti.ch
import urllib
thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos"
handle = urllib.urlopen(thisurl)
html_gunk = handle.read()
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_gunk, 'html.parser')
for first in soup.find_all("li", class_="li-in"):
    # Test each brand name against the link text explicitly
    if first.a.string and ("audi" in first.a.string or "bmw" in first.a.string):
        print "geschafft: %s" % first.a.contents
        # This is the line that does not work
        print first.select("em.fl.in-date").string
    else:
        print first.a.contents

When it finds a BMW or an Audi, it should check when the car was listed. The time is located in an em tag like this:

<em class="fl in-date"> heute 13:59 </em>

Try this (select() returns a list of matches, so take the first one before reading its text):

first.select("em.fl.in-date")[0].text

This assumes the selector is correct. You didn't provide the URL you're scraping, so I can't be sure.

>>> url = "http://stackoverflow.com/questions/38187213/python-beautifulsoup"
>>> from bs4 import BeautifulSoup
>>> import urllib2
>>> html = urllib2.urlopen(url).read()
>>> soup = BeautifulSoup(html)
>>> soup.find_all("p")[0].text
u'my problem when parsing website , loading data tree bs. how can content of <em> tag? tried '
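To see why the original attempt fails, here is a minimal sketch in Python 3 syntax. The HTML string is a hypothetical snippet modelled on the selectors in the question, not markup copied from tutti.ch. It shows that select() hands back a list-like result, so attributes such as .string or .text have to be read from an element of that list rather than from the list itself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption), shaped to match the question's selectors.
html = (
    '<li class="li-in"><a href="#">Audi A4</a>'
    '<em class="fl in-date"> heute 13:59 </em></li>'
)
soup = BeautifulSoup(html, "html.parser")
item = soup.find("li", class_="li-in")

# select() returns a list-like collection of matching tags, even for one match.
matches = item.select("em.fl.in-date")
is_list = isinstance(matches, list)

# Reading .text on the collection itself raises AttributeError.
try:
    matches.text
    failed = False
except AttributeError:
    failed = True

# Indexing gives a single Tag, which does have .text.
date_text = matches[0].text.strip()
print(is_list, failed, date_text)
```

Indexing with [0] raises IndexError when nothing matches, so this only works if every list item really contains the em tag.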

After seeing your code, I made the following change. Take a look:

# Crawl tutti.ch
import urllib
thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos"
handle = urllib.urlopen(thisurl)
html_gunk = handle.read()
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_gunk, 'html.parser')
for first in soup.find_all("li", class_="li-in"):
    # Test each brand name against the link text explicitly
    if first.a.string and ("audi" in first.a.string or "bmw" in first.a.string):
        print "geschafft: %s" % first.a.contents
        # select() returns a list, so take the first match before reading .text
        print first.select("em.fl.in-date")[0].text
    else:
        print first.a.contents
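The same loop can also be sketched in Python 3 without indexing, using select_one(), which returns a single tag or None. This is only an illustration under stated assumptions: SAMPLE_HTML, BRANDS and find_listings are hypothetical names, and the markup is invented to match the selectors above rather than fetched from tutti.ch. The brand check is done case-insensitively, which the original code does not do.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption); the third item has no date tag.
SAMPLE_HTML = """
<ul>
  <li class="li-in"><a href="#">Audi A4 Avant</a><em class="fl in-date"> heute 13:59 </em></li>
  <li class="li-in"><a href="#">VW Golf</a><em class="fl in-date"> heute 12:10 </em></li>
  <li class="li-in"><a href="#">BMW 320d</a></li>
</ul>
"""

BRANDS = ("audi", "bmw")  # illustrative brand list, compared in lower case


def find_listings(html, brands=BRANDS):
    """Return (title, date) pairs for listings whose title names a brand."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.find_all("li", class_="li-in"):
        if item.a is None:
            continue  # skip list items without a link
        title = item.a.get_text(strip=True)
        if not any(brand in title.lower() for brand in brands):
            continue
        # select_one() returns the first matching tag, or None if there is none.
        date_tag = item.select_one("em.fl.in-date")
        date = date_tag.get_text(strip=True) if date_tag else None
        results.append((title, date))
    return results


listings = find_listings(SAMPLE_HTML)
for title, date in listings:
    print(title, date)
```

Checking for None instead of indexing with [0] keeps the loop running when a listing has no date element.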
