Python BeautifulSoup can't select specific tag
My problem occurs when parsing a website and loading the data into a tree with BeautifulSoup. How can I get the content of the <em> tag? I tried
    for first in soup.find_all("li", class_="li-in"):
        print first.select("em.fl.in-date").string
        # or
        print first.select("em.fl.in-date").contents
but it doesn't work. Please help.
I am searching for cars on tutti.ch.
Here is the entire code:
    # crawl tutti.ch
    import urllib

    thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos"
    handle = urllib.urlopen(thisurl)
    html_gunk = handle.read()

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html_gunk, 'html.parser')

    for first in soup.find_all("li", class_="li-in"):
        if first.a.string and "audi" or "bmw" in first.a.string:
            print "geschafft: %s" % first.a.contents
            print first.select("em.fl.in-date").string
        else:
            print first.a.contents
When it finds a BMW or an Audi, it should check when the car was inserted. The time is located in an em tag like this:
<em class="fl in-date"> heute <br></br> 13:59 </em>
    first.select("em.fl.in-date")[0].text

This assumes your selector is correct. You didn't provide the URL you're scraping, so I can't be sure.
    >>> url = "http://stackoverflow.com/questions/38187213/python-beautifulsoup"
    >>> from bs4 import BeautifulSoup
    >>> import urllib2
    >>> html = urllib2.urlopen(url).read()
    >>> soup = BeautifulSoup(html)
    >>> soup.find_all("p")[0].text
    u'my problem when parsing website , loading data tree bs. how can content of <em> tag? tried '
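A note on why `.string` fails directly on the result of `select()`: `select()` returns a list of matching tags, even when there is only one match, so an element has to be picked before its text can be read. The following is a minimal sketch in Python 3, not a tested scraper for tutti.ch. Only the `<em>` element is taken from the question; the surrounding `<li>` and `<a>` markup and the variable names are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Assumed markup: only the <em> element comes from the question;
# the <li> wrapper and the link text are made up for illustration.
html = """
<ul>
  <li class="li-in">
    <a href="#">Audi A4</a>
    <em class="fl in-date"> heute <br></br> 13:59 </em>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
item = soup.find("li", class_="li-in")

# select() always returns a list of tags, so it has no .string or .text itself.
matches = item.select("em.fl.in-date")

# select_one() returns the first matching tag, or None if nothing matches.
first_em = item.select_one("em.fl.in-date")

# .string is None here because <em> has several children (text, <br>, text);
# get_text() collects every text piece instead.
when = first_em.get_text(" ", strip=True)
print(when)  # heute 13:59
```

If the selector might not match on some list items, checking `first_em is not None` before calling `get_text()` avoids an `AttributeError`.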
After seeing your code, I made the following change. Take a look:
    # crawl tutti.ch
    import urllib

    thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos"
    handle = urllib.urlopen(thisurl)
    html_gunk = handle.read()

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html_gunk, 'html.parser')

    for first in soup.find_all("li", class_="li-in"):
        # parenthesized so that both brand checks actually test the link text
        if first.a.string and ("audi" in first.a.string or "bmw" in first.a.string):
            print "geschafft: %s" % first.a.contents
            # select() returns a list, so take the first match before reading .text
            print first.select("em.fl.in-date")[0].text
        else:
            print first.a.contents
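On the brand check in the loop: Python groups a condition of the form `s and "audi" or "bmw" in s` as `(s and "audi") or ("bmw" in s)`, which is true for every non-empty `s`, so each brand needs its own `in` test. Below is a small sketch of one way to write that check. The helper name `mentions_brand` is hypothetical, and the case-insensitive matching is an assumption about what is wanted, not something stated in the question.

```python
def mentions_brand(title, brands=("audi", "bmw")):
    """Return True if title is a non-empty string containing any brand.

    Hypothetical helper for illustration; matching is case-insensitive
    by assumption.
    """
    if not title:  # covers None (tag.string can be None) and ""
        return False
    lowered = title.lower()
    return any(brand in lowered for brand in brands)


print(mentions_brand("BMW 320d Touring"))  # True
print(mentions_brand("VW Golf"))           # False
print(mentions_brand(None))                # False
```

In the loop above, this would be called as `mentions_brand(first.a.string)` in place of the inline condition.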