encoding - Decoding UTF-8 to URL with Python
I have the following URL encoded in UTF-8.
url_input = u'https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-\xa3250pw-all-bills-included-/1174092955'
I need to scrape the web page, and for that I need the following url_output (the unicode escape is not read).
url_output=https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-£250pw-all-bills-included-/1174092955
When I print url_input, I get url_output:
print(url_input)
https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-£250pw-all-bills-included-/1174092955
However, I cannot find a way to transform url_input into url_output. According to forums, the print function uses ASCII decoding on Python 2.7, but ASCII is not supposed to read \xa3, and url_input.encode('ascii') does not work.
Does anyone know how I can solve this problem? Thanks in advance!
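A minimal sketch of why the ASCII attempt fails, assuming Python 3 (where the u'' prefix is still accepted and every str is unicode). In a unicode literal, \xa3 is the code point U+00A3 (the pound sign), which lies outside the 7-bit ASCII range, so the encode call raises UnicodeEncodeError. The names pound_code_point, ascii_failed and pound_utf8 are only illustrative.

```python
url_input = u'https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-\xa3250pw-all-bills-included-/1174092955'

# \xa3 in a unicode literal is the code point U+00A3 (the pound sign),
# not a UTF-8 byte sequence.
pound_code_point = ord(u'\xa3')  # above the ASCII limit of 127

try:
    url_input.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    # ASCII cannot represent any code point above 127.
    ascii_failed = True

# UTF-8 can represent the same character, using two bytes.
pound_utf8 = u'\xa3'.encode('utf8')
```

So the string is already the unicode text that print shows; what the request needs is a byte or ASCII-safe representation of it.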
After some tests, I can confirm that the server accepts the URL in different formats:
Raw UTF-8 encoded URL:
url_output = url_input.encode('utf8')
Percent-encoded latin1 URL:
import urllib  # Python 2.7
url_output = urllib.quote_plus(url_input.encode('latin1'), '/:')
Percent-encoded UTF-8 URL:
url_output = urllib.quote_plus(url_input.encode('utf8'), '/:')
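The three forms can be compared side by side. This is a sketch that assumes Python 3, where quote_plus lives in urllib.parse rather than directly in urllib as in Python 2.7; the names raw_utf8, pct_latin1 and pct_utf8 are made up for the comparison.

```python
from urllib.parse import quote_plus

url_input = u'https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-\xa3250pw-all-bills-included-/1174092955'

# 1. Raw UTF-8: a byte string that still contains non-ASCII bytes.
raw_utf8 = url_input.encode('utf8')

# 2. Percent-encoded latin1: the pound sign is the single byte 0xA3.
#    '/' and ':' are listed as safe so the URL structure is left alone.
pct_latin1 = quote_plus(url_input.encode('latin1'), safe='/:')

# 3. Percent-encoded UTF-8: the pound sign is the two bytes 0xC2 0xA3.
pct_utf8 = quote_plus(url_input.encode('utf8'), safe='/:')

for label, value in [('raw utf8', raw_utf8),
                     ('%latin1', pct_latin1),
                     ('%utf8', pct_utf8)]:
    print(label, value)
```

Only the two percent-encoded forms are pure ASCII; they differ in whether the pound sign becomes one escaped byte or two.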
As raw latin1 is not accepted and leads to an incorrect URL error, and as passing non-ASCII characters in a URL may not be safe, my advice is to use the third way. It gives:
print url_output
https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-%C2%A3250pw-all-bills-included-/1174092955
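One way to check that the percent-encoded UTF-8 form loses nothing is a round trip. The sketch below assumes Python 3 (urllib.parse instead of the Python 2.7 urllib module); is_ascii_only and round_trip are illustrative names.

```python
from urllib.parse import quote_plus, unquote

url_input = u'https://www.gumtree.com//p/uk-holiday-rentals/1bedroon-flat-\xa3250pw-all-bills-included-/1174092955'

# Third way: percent-encode the UTF-8 bytes, keeping '/' and ':' as-is.
url_output = quote_plus(url_input.encode('utf8'), safe='/:')

# The result contains only ASCII characters, so it is safe to send.
is_ascii_only = all(ord(ch) < 128 for ch in url_output)

# Decoding the escapes as UTF-8 recovers the original unicode URL.
round_trip = unquote(url_output, encoding='utf-8')

print(url_output)
print(round_trip == url_input)
```

If the URL could contain spaces or a query string, the safe characters would probably need adjusting; this sketch only covers the URL from the question.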