web scraping - Django Beautiful Soup Changing User Agent - No Effect


I am trying to run a web scraping application. However, the code is not working for some sites even though I set the user agent (I have tried several different ones). The code does work on my dev site (which is hosted on a subdomain of PythonAnywhere), but not on my production site. It seems as if I am blocked by these sites (even though I have rarely accessed them, if ever). Any ideas? Should I email the websites and see if I can be granted access, since I am not doing anything malicious?

    import requests
    from bs4 import BeautifulSoup

    url = request.GET['url']  # `request` is the Django request object

    # First attempt: plain requests call (no custom User-Agent is sent here)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    # Prefer the Open Graph title, fall back to the <title> tag
    if not soup.find('meta', property="og:title"):
        title = soup.title.string
    else:
        title = soup.find('meta', property="og:title")['content']

    # Retry with a browser-like User-Agent if the page looks blocked.
    # Check for an empty title first so `in` is never applied to None.
    if not title or "403" in title:
        import urllib2  # Python 2 only
        opener = urllib2.build_opener()
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        response = opener.open(url)
        page = response.read()
        soup = BeautifulSoup(page, "html.parser")

        if not soup.find('meta', property="og:title"):
            title = soup.title.string
        else:
            title = soup.find('meta', property="og:title")['content']
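
One thing worth noting about the code above: the first `requests.get(url)` call passes no headers, so it goes out with the library's default `python-requests/...` user agent, and the custom agent is only used in the `urllib2` fallback. Below is a minimal Python 3 sketch of sending the same browser-like headers on every request and reading the title with the same og:title-then-`<title>` preference. It uses only the standard library so it runs without extra packages. The header values and the names `BROWSER_HEADERS`, `TitleParser`, `extract_title`, `build_request` and `fetch_title` are my own assumptions for illustration, not part of the original code. If the production host's IP address is what the sites are rejecting, changing headers will not help.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumed header values; any realistic browser string could be substituted.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


class TitleParser(HTMLParser):
    """Collects the og:title meta content and the <title> text."""

    def __init__(self):
        super().__init__()
        self.og_title = None
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:title":
            self.og_title = attrs.get("content")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html):
    """Return the og:title if present, else the <title> text, else None."""
    parser = TitleParser()
    parser.feed(html)
    page_title = "".join(parser.title_parts).strip()
    return parser.og_title or page_title or None


def build_request(url):
    """Attach the browser-like headers to every outgoing request."""
    return Request(url, headers=BROWSER_HEADERS)


def fetch_title(url, timeout=10):
    """Fetch the page with the custom headers and extract its title."""
    with urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_title(html)
```

With the `requests` library the same dictionary can be passed directly, as in `requests.get(url, headers=BROWSER_HEADERS)`, so the first attempt and the retry identify themselves the same way.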

