python - Open a webpage and return a dictionary of links on that page
I wanted to write a function that opens a webpage and returns a dictionary of the links and their text on the page. I tried the code below, but it's giving me an error. What can I do?
    def process(url):
        myopener = MyOpener()
        #page = urllib.urlopen(url)
        page = myopener.open(url)
        text = page.read()
        page.close()
Example input:
<a href='http://my.computer.com/some/file.html'>link text</a>
Output:
{"http://my.computer.com/some/file.html": "link text", ...}
Welcome to Stack Overflow.
You haven't shown what MyOpener does, so I have used my own retrieval function. This code uses Python 3 and the Beautiful Soup 4 HTML parser (a personal favorite) on the Python Wikipedia article.
    from bs4 import BeautifulSoup

    root_url = "https://en.wikipedia.org"
    # retrieve_webage is my own helper (not shown here); it is assumed to return the page's HTML as a string
    html_string = retrieve_webage(root_url + "/wiki/python_%28programming_language%29")
    soup = BeautifulSoup(html_string, "html.parser")
    output = {}
    # You can redefine soup here to parse only part of the page
    for link in soup.find_all('a'):
        linkhref = link.get('href')
        if not linkhref:
            # Ignore blank hyperlinks
            continue
        if linkhref[0] == '/':
            # Add the root URL to relative links
            linkhref = root_url + linkhref
        output[linkhref] = link.text
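The snippet calls a retrieval helper that is not shown in this post. As a rough sketch only, here is one way such a helper might look using the standard library. The body, the timeout parameter, and the UTF-8 fallback are all my assumptions, not the original helper; only the name retrieve_webage comes from the snippet above.

```python
from urllib.request import urlopen


def retrieve_webage(url, timeout=10):
    """Hypothetical stand-in for the helper that is called but not shown.

    Fetches the URL and returns the response body decoded to a str.
    """
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header if there is one;
        # falling back to UTF-8 is an assumption, not something the post specifies.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (needs network access, so it is left commented out):
# html_string = retrieve_webage("https://en.wikipedia.org/wiki/python_%28programming_language%29")
```

Some sites reject requests that lack a browser-like User-Agent header, so a real version may need to build a urllib.request.Request with custom headers instead of passing the bare URL.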
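Note that this script will overwrite links with identical href attributes as it reads them down the page, so only the text of the last such link is kept. You can learn more about Beautiful Soup in its documentation.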
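If losing the earlier link text matters, one possible workaround is to keep a list of texts per href instead of a single string. The sketch below is only an illustration: group_link_texts and the sample pairs are made-up names and data, and the function works on plain (href, text) pairs so it does not depend on any particular parser.

```python
from collections import defaultdict


def group_link_texts(pairs):
    """Map each href to a list of every link text seen for it, in page order."""
    grouped = defaultdict(list)
    for href, text in pairs:
        grouped[href].append(text)
    return dict(grouped)


# Made-up (href, text) pairs standing in for the links found on a page
example_pairs = [
    ("/wiki/first", "first link"),
    ("/wiki/second", "second link"),
    ("/wiki/first", "first link again"),
]
print(group_link_texts(example_pairs))
```

With Beautiful Soup, the pairs could be produced with something like ((link.get('href'), link.text) for link in soup.find_all('a')), after skipping links that have no href.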
Feel free to comment below if you have questions.