Python BeautifulSoup can't select specific tag


My problem: I am parsing a website and loading the data into a tree with BeautifulSoup. How can I get the content of an <em> tag? I tried:

for first in soup.find_all("li", class_="li-in"):
    print first.select("em.fl.in-date").string
    # or
    print first.select("em.fl.in-date").contents

But that doesn't work. Please help.

I am searching for cars on tutti.ch.

Here is the entire code:

# crawl tutti.ch
import urllib
thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos"
handle = urllib.urlopen(thisurl)
html_gunk = handle.read()

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_gunk, 'html.parser')

for first in soup.find_all("li", class_="li-in"):
    # match listings whose link text mentions either brand, ignoring case
    if first.a.string and ("audi" in first.a.string.lower() or "bmw" in first.a.string.lower()):
        print "geschafft: %s" % first.a.contents
        print first.select("em.fl.in-date").string
    else:
        print first.a.contents

When it finds a BMW or an Audi, it should check when the car was inserted. The time is located in an em tag like this:

<em class="fl in-date"> heute <br></br> 13:59 </em>

select() returns a list of matches, so take the first element before reading its text:

first.select("em.fl.in-date")[0].text

This assumes your selector is correct. You didn't provide the URL you're scraping, so I can't be sure.

>>> url = "http://stackoverflow.com/questions/38187213/python-beautifulsoup"
>>> from bs4 import BeautifulSoup
>>> import urllib2
>>> html = urllib2.urlopen(url).read()
>>> soup = BeautifulSoup(html)
>>> soup.find_all("p")[0].text
u'my problem when parsing website , loading data tree bs. how can content of <em> tag? tried '
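To make the selector and the list behaviour easier to check without hitting the live site, here is a minimal offline sketch. It is written for Python 3, while the code in this post is Python 2. The `SAMPLE_HTML` snippet is a hypothetical list item that I built around the `<em>` tag quoted in the question; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical listing item wrapped around the <em> tag quoted in the question.
SAMPLE_HTML = """
<li class="li-in">
  <a href="#">Audi A4</a>
  <em class="fl in-date"> heute <br></br> 13:59 </em>
</li>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item = soup.find("li", class_="li-in")

# select() always returns a list of matches, even when there is only one hit.
matches = item.select("em.fl.in-date")

# Index into the list to reach the Tag itself.
first_em = matches[0]

# select_one() returns the first matching Tag directly, or None if nothing matches.
same_em = item.select_one("em.fl.in-date")

# .string is None here because the tag has several children (text, <br>, text).
em_string = first_em.string

# get_text() joins all nested text; the separator and strip=True tidy the whitespace.
inserted = first_em.get_text(" ", strip=True)
print(inserted)
```

Using `select_one()` avoids the `[0]` indexing, but you still need to handle a `None` result when a listing has no date tag.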

After seeing your code, I made the following change. Take a look:

# crawl tutti.ch
import urllib
thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos"
handle = urllib.urlopen(thisurl)
html_gunk = handle.read()

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_gunk, 'html.parser')

for first in soup.find_all("li", class_="li-in"):
    # match listings whose link text mentions either brand, ignoring case
    if first.a.string and ("audi" in first.a.string.lower() or "bmw" in first.a.string.lower()):
        print "geschafft: %s" % first.a.contents
        # select() returns a list, so index the first match before reading .text
        print first.select("em.fl.in-date")[0].text
    else:
        print first.a.contents
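The brand check in the loop can also be pulled into a small helper, which keeps the condition readable and guards against a missing link text. This is only a sketch in Python 3: `mentions_brand` is a hypothetical name, and the case-insensitive matching is my assumption about what you want.

```python
def mentions_brand(title, brands=("audi", "bmw")):
    """Return True if any brand name occurs in the title, ignoring case."""
    # first.a.string is None when the <a> tag contains nested tags, so guard for that.
    if not title:
        return False
    lowered = title.lower()
    # Test each brand separately; chaining and/or with "in" only tests the last operand.
    return any(brand in lowered for brand in brands)


# Example titles, made up for illustration.
for sample in ("BMW 320d Touring", "VW Golf", None):
    print(sample, mentions_brand(sample))
```

In the loop this would be called as `mentions_brand(first.a.string)`.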
