Is it possible to read Word files (.doc/.docx) in Python?
I want to create a validation tool.
Can anyone help me read .doc/.docx documents in Python in order to search and compare file contents?
Yes, it is possible. LibreOffice (at least) has a command-line option to convert files, and it works a treat. Use it to convert the file to text, then load the text file into Python as per routine manoeuvres.
This worked for me on LibreOffice 4.2 / Linux:
soffice --headless --convert-to txt:text /path_to/document_to_convert.doc
I've tried a few methods (including odt2txt, antiword, zipfile, lpod, uno). The soffice command above was the first one that worked, and it worked without error. This question on ask.libreoffice.org about using filters with soffice helped me.
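To go from there to Python, here is a minimal sketch of the "convert, then load the text file" step. It assumes soffice is on your PATH and that the converted file is named after the input with a .txt extension. The helper names (build_convert_command, read_word_text) are my own, not part of any library, and the output encoding may depend on your locale, so the utf-8 default below is a guess you may need to change.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Return the argument list for the soffice conversion shown above."""
    return [
        soffice, "--headless",
        "--convert-to", "txt:text",
        "--outdir", str(out_dir),  # write the .txt here instead of the cwd
        str(doc_path),
    ]


def read_word_text(doc_path, out_dir, soffice="soffice", encoding="utf-8"):
    """Convert a .doc/.docx file to text with LibreOffice and return the text."""
    doc_path, out_dir = Path(doc_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if soffice exits with an error.
    subprocess.run(build_convert_command(doc_path, out_dir, soffice), check=True)
    # Assumption: the output keeps the input's base name with a .txt suffix.
    txt_path = out_dir / (doc_path.stem + ".txt")
    # errors="replace" keeps going if the encoding guess is wrong.
    return txt_path.read_text(encoding=encoding, errors="replace")
```

Something like text = read_word_text("/path_to/document_to_convert.doc", "/tmp/converted") should then give you a plain string that you can search with the in operator or compare against another document with difflib.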