Python line.replace returns UnicodeEncodeError
I have a tex file generated from rst source using Sphinx. It is encoded in UTF-8 without BOM (according to Notepad++), is named final_report.tex, and has the following content:
    % generated sphinx.
    \documentclass[letterpaper,11pt,english]{sphinxmanual}
    \usepackage[utf8]{inputenc}
    \begin{document}
    \chapter{preface}
    krimson4 nice programming language.
    umlauts äöüßÅö.
    “double quotation mark” problem.
    johnny’s apostrophe allows connecting multiple ports.
    components include data describe how ellipsis … software interoperability – dash – not ok.
    \end{document}
Now, before I compile the tex source to PDF, I want to replace some lines in the tex file to get nicer results. My script was inspired by another question.
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import os

    newfil = os.path.join("build", "latex", "final_report.tex-new")
    oldfil = os.path.join("build", "latex", "final_report.tex")

    def freplace(old, new):
        with open(newfil, "wt", encoding="utf-8") as fout:
            with open(oldfil, "rt", encoding="utf-8") as fin:
                for line in fin:
                    print(line)
                    fout.write(line.replace(old, new))
        os.remove(oldfil)
        os.rename(newfil, oldfil)

    freplace('\documentclass[letterpaper,11pt,english]{sphinxmanual}', '\documentclass[letterpaper, 11pt, english]{book}')
This works on Ubuntu 16.04 with Python 2.7 and Python 3.5, but fails on Windows with Python 3.4. The error message is:
      File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u201c' in position 11: character maps to <undefined>
where 201c stands for the left double quotation mark. If I remove the problematic character, the script proceeds until it finds the next problematic character.
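The failure can be reproduced without the tex file. The following is a minimal sketch, assuming only the cp850 codec named in the traceback; the helper name can_encode is made up for illustration. The traceback excerpt does not show which call triggered the encode step, so this only demonstrates that cp850 has no mapping for U+201C while UTF-8 does.

```python
# cp850 is the codec named in the traceback; U+201C is the character it rejects.
LEFT_QUOTE = u"\u201c"
UMLAUTS = u"\xe4\xf6\xfc\xdf"  # the characters ä, ö, ü, ß, written as escapes


def can_encode(text, codec):
    """Return True if `text` can be encoded with `codec`, else False."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(LEFT_QUOTE, "cp850"))  # False: cp850 has no left double quote
print(can_encode(LEFT_QUOTE, "utf-8"))  # True
print(can_encode(UMLAUTS, "cp850"))     # True: the umlauts exist in cp850
```

This matches the observation that the umlauts pass but the typographic quotes, apostrophe, ellipsis and dash each stop the script in turn.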
In the end, I need a solution that works on Linux and Windows with Python 2.7 and 3.x. I have tried quite a lot of the solutions suggested here on SO, but have not yet found one that works for me...
You need to specify the correct encoding with encoding="the_encoding":

    with open(oldfil, "rt", encoding="utf-8") as fin, open(newfil, "wt", encoding="utf-8") as fout:

If you don't, the preferred encoding will be used.
As the documentation for open() puts it: in text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
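For the Python 2.7 and 3.x requirement, one option is io.open, which accepts an encoding argument on both versions (the built-in open on 2.7 does not). What follows is a sketch under assumptions rather than a tested drop-in: the signature freplace(path, old, new, encoding) and the "-new" temporary suffix are illustrative choices modelled on the script in the question. It also leaves out the debugging print(line), because writing to a console is another place where text gets encoded with a platform-dependent codec; the traceback excerpt does not confirm that this was the failing call.

```python
# -*- coding: utf-8 -*-
import io
import locale
import os


def freplace(path, old, new, encoding="utf-8"):
    """Rewrite `path` in place, replacing `old` with `new` on every line.

    `old` and `new` should be text (unicode) strings, e.g. u"..." literals.
    """
    tmp_path = path + "-new"
    # io.open takes `encoding` on Python 2.7 and 3.x; newline="" keeps the
    # file's existing line endings instead of translating them.
    with io.open(path, "r", encoding=encoding, newline="") as fin, \
         io.open(tmp_path, "w", encoding=encoding, newline="") as fout:
        for line in fin:
            fout.write(line.replace(old, new))
    os.remove(path)
    os.rename(tmp_path, path)


# The encoding that would be used if `encoding` were omitted; it differs
# between machines, which is why the explicit argument matters.
print(locale.getpreferredencoding(False))
```

A call might look like freplace(os.path.join("build", "latex", "final_report.tex"), u"{sphinxmanual}", u"{book}"). When the search string contains a LaTeX backslash, doubling it (u"\\documentclass") keeps the literal valid on both Python versions.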