encoding - Decoding UTF-8 to URL with Python


I have the following URL, a Python 2 unicode string containing the pound sign (\xa3):

url_input = u'https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-\xa3250pw-all-bills-included-/1174092955' 

I need to scrape the webpage, so I need the following url_output (the unicode version cannot be read as is):

url_output=https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-£250pw-all-bills-included-/1174092955 

When I print url_input, I get url_output:

    print(url_input)
    https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-£250pw-all-bills-included-/1174092955

However, I cannot find a way to transform url_input into url_output. According to some forums, print on Python 2.7 encodes with ASCII, and ASCII cannot represent \xa3, so url_input.encode('ascii') does not work either (it raises a UnicodeEncodeError).
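A minimal Python 3 sketch of the failure described above (the question itself uses Python 2.7, where this string type is `unicode` rather than `str`): the pound sign U+00A3 has no ASCII representation, so encoding to ASCII raises, while UTF-8 turns it into the two bytes 0xC2 0xA3.

```python
url_input = 'https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-\xa3250pw-all-bills-included-/1174092955'

# U+00A3 (POUND SIGN) is outside range(128), so ASCII encoding raises.
try:
    url_input.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print('ascii failed:', exc.reason)

# UTF-8 represents U+00A3 as the two bytes 0xC2 0xA3.
utf8_bytes = url_input.encode('utf-8')
print(utf8_bytes)
```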

Does anyone know how I can solve this problem? Thanks in advance!

After some tests, I can confirm that the server accepts the URL in several formats:

  • raw UTF-8 encoded URL:

    url_output = url_input.encode('utf8') 
  • %-encoded Latin-1 URL:

    url_output = urllib.quote_plus(url_input.encode('latin1'), '/:') 
  • %-encoded UTF-8 URL:

    url_output = urllib.quote_plus(url_input.encode('utf8'), '/:') 
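For readers on Python 3, the three formats above can be sketched with `urllib.parse.quote`, which takes an `encoding` argument instead of needing a manual `.encode()` first. This is an illustrative translation, not the original Python 2 code. It uses `quote` rather than `quote_plus`; that makes no difference for this URL because it contains no spaces (`quote_plus` would turn spaces into `+`, which is only meaningful in query strings).

```python
from urllib.parse import quote

url_input = 'https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-\xa3250pw-all-bills-included-/1174092955'

# 1. Raw UTF-8 bytes: the non-ASCII bytes 0xC2 0xA3 are kept as-is.
raw_utf8 = url_input.encode('utf-8')

# 2. Percent-encoded Latin-1: U+00A3 is the single byte 0xA3 -> "%A3".
pct_latin1 = quote(url_input, safe='/:', encoding='latin-1')

# 3. Percent-encoded UTF-8: U+00A3 is the bytes 0xC2 0xA3 -> "%C2%A3".
pct_utf8 = quote(url_input, safe='/:', encoding='utf-8')

print(raw_utf8)
print(pct_latin1)
print(pct_utf8)
```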

Since raw Latin-1 is not accepted (it leads to an incorrect URL error), and passing non-ASCII characters in a URL may not be safe, I advise using the third way. It gives:

    print url_output
    https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-%C2%A3250pw-all-bills-included-/1174092955
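Quoting the whole URL with `safe='/:'` works here, but it would also percent-encode characters such as `?`, `=` and `&` if the URL had a query string. A somewhat more careful variant, assuming Python 3 and not tested against the Gumtree server, is to split the URL and quote only its path. `quote_url_path` below is a hypothetical helper built on `urllib.parse.urlsplit`/`urlunsplit`, and the final `unquote` check confirms the original string round-trips.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def quote_url_path(url):
    """Hypothetical helper: percent-encode only the path of `url` as UTF-8."""
    parts = urlsplit(url)
    # Scheme, host, query and fragment are left untouched;
    # quote() keeps '/' unescaped by default (safe='/').
    return urlunsplit(parts._replace(path=quote(parts.path, encoding='utf-8')))


url_input = 'https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-\xa3250pw-all-bills-included-/1174092955'
url_output = quote_url_path(url_input)
print(url_output)

# Decoding the percent-escapes as UTF-8 recovers the original string.
assert unquote(url_output) == url_input
```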
