unicode - Avoid non-printable character in HTML file written by Python

May 15, 2015

I'm trying to convert SPSS syntax files to readable HTML. It's working, except for a (single) non-printable character inserted into the HTML file. It doesn't seem to have an ASCII code and looks like a tiny dot, and it's causing trouble.

It occurs (only) in the second line of the HTML file, which corresponds to the first line of the original file. Any hints on which line(s) of the Python code cause the problem? (Please see the comments in the code.)

The code that seems to cause it is:

    rfil = open(fil, "r")  # rfil = read file, original syntax
    wfil = open(txtfil, "w")  # wfil = write file, HTML output
    # line below causes problem??
    wfil.write("<ol class='code'>\n<li>")
    cnt = 0
    for line in rfil:
        if cnt == 0:
            # line below causes problem??
            wfil.write(line.rstrip("\n").replace("'", '&#39;').replace('"', '&#34;'))
        elif len(line) > 1:
            wfil.write("</li>\n<li>" + line.strip("\n").replace("'", '&#39;').replace('"', '&#34;'))
        else:
            wfil.write("<br /><br />")
        cnt += 1
    wfil.write("</li>\n</ol>")
    wfil.close()
    rfil.close()

[Screenshot of the result]
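One way to check whether the stray character comes from the input file rather than from the writing code is to look at the file's first raw bytes. A minimal sketch, assuming Python 3; has_utf8_bom and the sample file contents are hypothetical:

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:  # binary mode: raw bytes, no decoding
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Usage with a hypothetical one-line syntax file saved with a BOM.
fd, path = tempfile.mkstemp(suffix=".sps")
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"GET FILE='data.sav'.\n")

print(has_utf8_bom(path))  # True

# Decoded as plain UTF-8, the BOM survives as the invisible character U+FEFF,
# so the first line (and only the first line) carries one extra character.
with open(path, encoding="utf-8") as f:
    first_line = f.readline()
print(repr(first_line[0]))  # '\ufeff'
```

This would explain why only the first line is affected: the extra character is part of the input data, not something the loop adds.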

The input file seems to begin with a byte order mark (BOM), which indicates UTF-8 encoding. You can decode the file to Unicode strings by opening it with:

    import codecs
    rfil = codecs.open(fil, "r", "utf_8_sig")

The utf_8_sig encoding skips the BOM at the beginning of the file.
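To see what utf_8_sig changes, here is a minimal sketch (Python 3) that decodes the same bytes with both codecs; the sample line is hypothetical:

```python
import codecs

# Hypothetical first line of a syntax file, saved with a UTF-8 BOM in front.
raw = codecs.BOM_UTF8 + "GET FILE='data.sav'.\n".encode("utf-8")

plain = raw.decode("utf_8")    # the BOM is kept as U+FEFF at index 0
sig = raw.decode("utf_8_sig")  # the BOM is stripped

print(repr(plain[0]))  # '\ufeff'
print(repr(sig[0]))    # 'G'

# utf_8_sig also decodes input that has no BOM, so it is safe to use either way.
print(b"GET".decode("utf_8_sig"))  # GET
```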

Some programs recognize the BOM and some don't. To write the output file without a BOM, use:

    wfil = codecs.open(txtfil, "w", "utf_8")
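Putting both changes together, the converter from the question could look roughly like the sketch below. It assumes Python 3, where the built-in open() accepts an encoding argument directly; sps_to_html and the file names are hypothetical. It also swaps the manual .replace() calls for html.escape(), which additionally escapes &, < and > (and writes ' as &#x27;, equivalent to &#39;), and treats any empty line as a paragraph break.

```python
import codecs
import html
import os
import tempfile


def sps_to_html(src_path, dst_path):
    """Convert a syntax file to an HTML ordered list, one <li> per non-empty line.

    Reads with "utf-8-sig" (drops a leading BOM if present) and writes plain
    "utf-8", which never adds a BOM to the output.
    """
    with open(src_path, "r", encoding="utf-8-sig") as rfil, \
         open(dst_path, "w", encoding="utf-8") as wfil:
        wfil.write("<ol class='code'>\n<li>")
        for cnt, line in enumerate(rfil):
            text = html.escape(line.rstrip("\n"), quote=True)  # escapes & < > " '
            if cnt == 0:
                wfil.write(text)  # first line opens the first <li>
            elif text:
                wfil.write("</li>\n<li>" + text)  # new list item per line
            else:
                wfil.write("<br /><br />")  # empty line becomes a break
        wfil.write("</li>\n</ol>")


# Usage: a small input file that starts with a BOM, like the one in the question.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "example.sps")
dst = os.path.join(tmp, "example.html")
with open(src, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"GET FILE='a.sav'.\n\nFREQ x.\n")

sps_to_html(src, dst)
with open(dst, "rb") as f:
    print(f.read(3) == codecs.BOM_UTF8)  # False: no BOM in the output
```

Using with blocks also closes both files even if an exception is raised halfway through.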
