unicode - Avoid non-printable character in HTML file written by Python


I'm trying to convert SPSS syntax files to readable HTML. It's working, except that a (single) non-printable character is inserted into the HTML file. It doesn't seem to have an ASCII code, and it looks like a tiny dot. And it's causing trouble.

It occurs (only) in the second line of the HTML file, which corresponds to the first line of the original file. That hints at the line(s) of Python that cause the problem (please see the comments in the code).

The code that seems to cause it is:

    rfil = open(fil, "r")     # rfil = read file, the original syntax
    wfil = open(txtfil, "w")  # wfil = write file, the HTML output
    # line below causes problem??
    wfil.write("<ol class='code'>\n<li>")

    cnt = 0
    for line in rfil:
        if cnt == 0:
            # line below causes problem??
            wfil.write(line.rstrip("\n").replace("'", '&#39;').replace('"', '&#34;'))
        elif len(line) > 1:
            wfil.write("</li>\n<li>" + line.strip("\n").replace("'", '&#39;').replace('"', '&#34;'))
        else:
            wfil.write("<br /><br />")
        cnt += 1
    wfil.write("</li>\n</ol>")
    wfil.close()
    rfil.close()

[Screenshot of the result]

The input file seems to begin with a byte order mark (BOM), which indicates UTF-8 encoding. You can decode the file to Unicode strings by opening it with

    import codecs
    rfil = codecs.open(fil, "r", "utf_8_sig")

The utf_8_sig encoding skips the BOM at the beginning of the file.
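To check whether a BOM really is the culprit, something along these lines may help. It is only a sketch: the sample bytes and the helper name `starts_with_utf8_bom` are made up for illustration and are not part of the original script. It compares decoding with `utf_8` against decoding with `utf_8_sig`.

```python
import codecs

# Hypothetical sample: a UTF-8 BOM followed by one line of SPSS syntax.
raw = codecs.BOM_UTF8 + b"GET FILE='data.sav'.\n"


def starts_with_utf8_bom(data):
    """Return True if the byte string begins with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


plain = raw.decode("utf_8")      # the BOM survives as the character U+FEFF
clean = raw.decode("utf_8_sig")  # the BOM is dropped while decoding

print(starts_with_utf8_bom(raw))
print(repr(plain[0]))
print(repr(clean[0]))
```

With plain `utf_8` the first decoded character is U+FEFF, which is invisible in most editors and would fit the "tiny dot" described in the question. For a real file, reading its first three bytes in binary mode and passing them to the helper gives the same check.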

Some programs recognize the BOM, others don't. To write the file out without a BOM, use

    wfil = codecs.open(txtfil, "w", "utf_8")
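Putting the two pieces together, the conversion loop from the question might look roughly like this. It is a sketch under assumptions, not a drop-in replacement: it assumes Python 3 and uses the built-in `open` with an `encoding` argument in place of `codecs.open`, and the names `escape_quotes` and `syntax_to_html` are invented for this example.

```python
def escape_quotes(text):
    """Replace single and double quotes with HTML character references."""
    return text.replace("'", "&#39;").replace('"', "&#34;")


def syntax_to_html(src_path, dst_path):
    """Write the lines of src_path to dst_path as an HTML ordered list.

    The input is decoded with utf_8_sig, so a leading BOM (if any) is
    skipped; the output is written as utf_8, which adds no BOM.
    """
    with open(src_path, "r", encoding="utf_8_sig") as rfil, \
         open(dst_path, "w", encoding="utf_8") as wfil:
        wfil.write("<ol class='code'>\n<li>")
        for cnt, line in enumerate(rfil):
            text = escape_quotes(line.rstrip("\n"))
            if cnt == 0:
                # First line goes straight into the already opened <li>.
                wfil.write(text)
            elif text:
                wfil.write("</li>\n<li>" + text)
            else:
                # Empty source line: keep it as vertical space.
                wfil.write("<br /><br />")
        wfil.write("</li>\n</ol>")
```

Calling `syntax_to_html(fil, txtfil)` with the two paths from the question should then produce the list without the stray character at the start of the first item. The `with` block closes both files even if an error occurs partway through.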
