python - Open a webpage and return a dictionary of links on that page


I wanted to write a function that opens a webpage and returns a dictionary of the links and their text on the page. I tried, but it's giving me an error. What can I do?

    def process(url):
        myopener = MyOpener()
        #page = urllib.urlopen(url)
        page = myopener.open(url)

        text = page.read()
        page.close()

Example input:

    <a href='http://my.computer.com/some/file.html'>link text</a>

Output:

    {"http://my.computer.com/some/file.html": "link text", ...}

Welcome to Stack Overflow,

You haven't shown what MyOpener does, so I have used my own. This code uses Python 3 and the Beautiful Soup 4 HTML parser (a personal favorite) on the Python Wikipedia article.

    from bs4 import BeautifulSoup

    root_url = "https://en.wikipedia.org"
    # retrieve_webpage is my own helper (not shown); it returns the page's HTML as a string
    html_string = retrieve_webpage(root_url + "/wiki/python_%28programming_language%29")
    soup = BeautifulSoup(html_string, "html.parser")
    output = {}
    # you can redefine soup here to parse only part of the page
    for link in soup.find_all('a'):
        linkhref = link.get('href')
        if not linkhref:
            # ignore blank hyperlinks
            continue
        if linkhref[0] == '/':
            # add the root URL to relative links
            linkhref = root_url + linkhref
        output[linkhref] = link.text
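The snippet above relies on a page-fetching helper that is not shown. Here is a minimal sketch of what such a helper might look like, using only the standard library. The name `retrieve_webpage`, the User-Agent string and the `timeout` default are my assumptions for illustration, not the original helper.

```python
from urllib.request import Request, urlopen


def retrieve_webpage(url, timeout=10):
    """Fetch url and return the response body decoded as text."""
    # Some sites reject urllib's default User-Agent, so send a custom one.
    # The string below is only a placeholder.
    request = Request(url, headers={"User-Agent": "link-dictionary-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The `with` block closes the connection once the body has been read, so there is no need for a separate `page.close()` call.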

This script will overwrite links with identical href attributes as it reads them down the page. You can learn more about Beautiful Soup in its documentation.
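If losing the earlier text for a repeated href matters, one option is to keep a list of texts per link instead of a single string. The sketch below shows that idea. Note that it swaps in the standard library's `html.parser` instead of Beautiful Soup, so it runs without extra packages, and it uses `urljoin` to resolve relative links. The names `LinkCollector` and `collect_links` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the text of every <a href=...> element, grouped by absolute URL."""

    def __init__(self, root_url):
        super().__init__()
        self.root_url = root_url
        self.links = defaultdict(list)
        self._href = None    # href of the <a> currently open, if any
        self._chunks = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Skip anchors with a missing or empty href.
            self._href = urljoin(self.root_url, href) if href else None
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links[self._href].append("".join(self._chunks).strip())
            self._href = None


def collect_links(html, root_url):
    """Return {absolute_url: [link text, ...]} for the given HTML string."""
    parser = LinkCollector(root_url)
    parser.feed(html)
    parser.close()
    return dict(parser.links)


if __name__ == "__main__":
    sample = "<a href='http://my.computer.com/some/file.html'>link text</a>"
    print(collect_links(sample, "http://my.computer.com"))
    # prints {'http://my.computer.com/some/file.html': ['link text']}
```

Two links that share an href then both show up in the same list, in page order, rather than the second replacing the first.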

Feel free to comment below if you have questions.

