[sword-devel] Re: [sword-support] An awesome Bible

Leon Brooks sword-devel@crosswire.org
Sun, 4 May 2003 13:13:10 +0800


On Fri, 2 May 2003 10:24, Keith Ralston wrote:
> PDF has a plain text version imbedded.  You can use the PDF API to
> extract text from the documents.

Not exactly true. PDF is just a fancified version of PostScript. You 
might think that this makes it easy to extract the text, but in real 
life many PDF generators do amazingly silly things like storing the 
text as a long list of decimal numbers instead of as raw strings.

pdftotext may do the conversion you're looking for, but it entirely 
depends on how silly the PDF creation software was.
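A minimal sketch of "give pdftotext a try, then check whether the output is any good", assuming the pdftotext binary (from Xpdf or Poppler) is on PATH. The `-enc UTF-8` option and the "-" output name (write to stdout) are standard pdftotext usage. The letters-vs-everything-else check in `looks_like_text` and its 0.5 threshold are a hypothetical heuristic of mine, not part of pdftotext. The idea is that when the creator software stored text oddly, the output tends to come back empty or mostly digits and symbols.

```python
import shutil
import subprocess


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on pdf_path and return whatever text it recovers.

    Assumes pdftotext is installed and on PATH. "-" as the output file
    makes pdftotext write to stdout; -enc UTF-8 fixes the output encoding.
    """
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails outright
    )
    return result.stdout.decode("utf-8", errors="replace")


def looks_like_text(text: str, min_ratio: float = 0.5) -> bool:
    """Hypothetical heuristic: usable output is mostly letters.

    Ignores whitespace, then checks what fraction of the remaining
    characters are alphabetic. Empty output counts as unusable.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    letters = sum(c.isalpha() for c in chars)
    return letters / len(chars) >= min_ratio
```

Typical use would be `text = extract_text("bible.pdf")` (a hypothetical file name) followed by `looks_like_text(text)`. If that check fails, the PDF probably falls into the "silly creation software" category and needs a different approach.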

In short: worth a try, but don't bet your life on it.

Cheers; Leon