web scraping - Django Beautiful Soup Changing User Agent - No Effect
I am trying to run a web scraping application. However, the code is not working for some sites even though I set the user agent (I have tried several different ones). The code does work on my dev site (which is hosted on a subdomain of PythonAnywhere), but not on the production site. It seems as if I am being blocked by these sites (even though I have hardly been accessing them at all, if ever). Any ideas? Should I email the websites and see if I can be granted access, since I am not doing anything malicious?
url = request.GET['url']

import requests
from bs4 import BeautifulSoup

# First attempt: plain requests call (no custom User-Agent is sent here).
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
if not soup.find('meta', property="og:title"):
    title = soup.title.string
else:
    title = soup.find('meta', property="og:title")['content']

# Fallback: retry with a browser-like User-Agent if the page looks blocked.
if "403" in title or not title:
    import urllib2  # Python 2 only
    opener = urllib2.build_opener()
    opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
    response = opener.open(url)
    page = response.read()
    soup = BeautifulSoup(page)
    if not soup.find('meta', property="og:title"):
        title = soup.title.string
    else:
        title = soup.find('meta', property="og:title")['content']
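One thing that may be worth checking: in the code above, the first request goes out with the default User-Agent of requests, and the custom one is only used in the urllib2 fallback, which is triggered by looking for "403" in the page title rather than at the HTTP status. Below is a minimal sketch of an alternative, assuming requests and bs4 are installed, that sends browser-like headers on the first request and checks the status code directly. The names `BROWSER_HEADERS`, `extract_title` and `fetch_title`, and the header values themselves, are made up for illustration and are not from the question.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical header values for illustration; any realistic browser string would do.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def extract_title(html):
    """Return the og:title content if present, else the <title> text, else None."""
    soup = BeautifulSoup(html, "html.parser")
    og = soup.find("meta", property="og:title")
    if og and og.get("content"):
        return og["content"].strip()
    if soup.title and soup.title.string:
        return soup.title.string.strip()
    return None


def fetch_title(url, timeout=10):
    """Fetch url with browser-like headers and return (status_code, title)."""
    r = requests.get(url, headers=BROWSER_HEADERS, timeout=timeout)
    if r.status_code != 200:
        # Blocked or failed: report the status instead of parsing an error page.
        return r.status_code, None
    return r.status_code, extract_title(r.content)
```

Calling `fetch_title(url)` from both the dev and the production host and comparing the returned status codes should show whether the sites really answer differently. If production still gets a 403 with the same headers, the block is probably based on something other than the User-Agent (for example the server's IP address), and changing headers will not help.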