[sword-devel] Yiddish New Testament available as a free PDF

fred smith fredex at fcshome.stoneham.ma.us
Thu Jun 26 15:17:35 MST 2008


On Thu, Jun 26, 2008 at 05:03:23PM +0100, Peter von Kaehne wrote:
> 
> > Can anyone in the forum read Yiddish?
> 
> No,  but I would probably understand (so-so) it if it was read to me. I
> probably also could make sense if it was latin transliterated.
> 
> There is though a problem with PDFs - I know of no way of scraping a non-
> ASCII PDF. Usually "copy" turns up garbage.

I've had modest success with PDF files that contain scanned images
of documents by using Ghostscript to render the pages to TIFF files, then
running those through the Tesseract OCR engine. It does a fairly good job,
but unfortunately the results still need quite a bit of cleanup. The worst
problem is that Tesseract doesn't understand page layout; it just outputs
text in whatever order it encounters it. Just this week I learned of a tool
called "unpaper", which can do a lot of cleanup on the page images between
their creation and the OCR step, but I haven't tried it yet.
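A minimal Python sketch of the Ghostscript-then-Tesseract pipeline described above might look like the following. It is only an illustration under stated assumptions: the function names, the 300 dpi resolution, the `tiffgray` output device and the `yid` (Yiddish) Tesseract language pack are choices made here, not something tested on this particular PDF. It also assumes `gs` and `tesseract` are installed and on the PATH. The unpaper step is left out because it was not tried.

```python
"""Sketch: rasterize a scanned PDF with Ghostscript, then OCR each page with Tesseract.

Assumptions (not from the original post): 300 dpi, grayscale TIFF output,
and an installed Yiddish ("yid") Tesseract language pack.
"""
import glob
import os
import subprocess


def gs_command(pdf_path, out_dir, dpi=300):
    """Build a Ghostscript command that renders every PDF page to a grayscale TIFF."""
    pattern = os.path.join(out_dir, "page-%03d.tif")  # gs expands %03d per page
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=tiffgray",          # 8-bit grayscale TIFF output device
        f"-r{dpi}",                   # render resolution in dots per inch
        f"-sOutputFile={pattern}",
        pdf_path,
    ]


def tesseract_command(tif_path, lang="yid"):
    """Build a Tesseract command; Tesseract writes its text to <out_base>.txt."""
    out_base = os.path.splitext(tif_path)[0]  # tesseract appends ".txt" itself
    return ["tesseract", tif_path, out_base, "-l", lang]


def ocr_pdf(pdf_path, out_dir, lang="yid"):
    """Run both steps and return the raw OCR text of each page, in page order."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(gs_command(pdf_path, out_dir), check=True)
    texts = []
    for tif in sorted(glob.glob(os.path.join(out_dir, "page-*.tif"))):
        subprocess.run(tesseract_command(tif, lang), check=True)
        with open(os.path.splitext(tif)[0] + ".txt", encoding="utf-8") as f:
            texts.append(f.read())
    return texts
```

For example, `ocr_pdf("yiddish-nt.pdf", "pages")` (a hypothetical file name) would return one string per page. As noted above, the text would still need manual cleanup, and nothing here fixes Tesseract's reading order on multi-column pages.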

-- 
---- Fred Smith -- fredex at fcshome.stoneham.ma.us -----------------------------
               But God demonstrates his own love for us in this: 
                         While we were still sinners, 
                              Christ died for us.
------------------------------- Romans 5:8 (niv) ------------------------------

