[sword-devel] Project Gutenberg Etexts

Steve Tang sword-devel@crosswire.org
Tue, 25 Jun 2002 06:14:57 -0600 (MDT)

> The problem with Project Gutenberg is that all the books are in plain 
> ASCII, with NO markup.  So you will need to insert paragraph breaks, 
> minimally.  You may wish to insert scripture reference tags, if present.  
> And many pieces of markup like emphasized text have been lost thanks to 
> Project Gutenberg.

Parsing natural language is difficult, certainly beyond our reach for the
moment. But just parsing chapters or paragraphs should be relatively
straightforward and therefore Perl-able.
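
As a rough illustration of the chapter/paragraph idea, here is a minimal
sketch (in Python rather than Perl). It rests on assumptions that are mine,
not anything Project Gutenberg guarantees: paragraphs are separated by blank
lines, and chapter headings sit in their own block and look like "CHAPTER I"
or "BOOK 2". The names split_paragraphs, split_chapters and HEADING_RE are
made up for this example, and the heading pattern is only a heuristic.

```python
import re

# Assumption: a heading is a block starting with "Chapter"/"Book" followed by
# a roman or arabic numeral. Real etexts vary, so expect to tune this.
HEADING_RE = re.compile(r"^(chapter|book)\s+([ivxlcdm]+|\d+)\b.*$", re.IGNORECASE)


def split_paragraphs(text):
    """Split plain text on blank lines and rejoin hard-wrapped lines."""
    text = text.replace("\r\n", "\n")
    blocks = re.split(r"\n\s*\n", text)
    return [
        " ".join(line.strip() for line in block.splitlines() if line.strip())
        for block in blocks
        if block.strip()
    ]


def split_chapters(text):
    """Group paragraphs under the most recent heading.

    Returns a list of (heading, paragraphs) tuples. Text that appears before
    the first heading is stored with a heading of None.
    """
    chapters = []
    heading, paragraphs = None, []
    for para in split_paragraphs(text):
        if HEADING_RE.match(para):
            # Close the previous group before starting a new chapter.
            if heading is not None or paragraphs:
                chapters.append((heading, paragraphs))
            heading, paragraphs = para, []
        else:
            paragraphs.append(para)
    if heading is not None or paragraphs:
        chapters.append((heading, paragraphs))
    return chapters
```

The output of split_chapters could then be written out with whatever
paragraph and chapter markup the target format wants. Lost emphasis and
scripture references would still need to be handled separately.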

> There's no (reasonable) possibility of an automatic converter like thml2gbs 
> for Gutenberg works since they lack markup and any kind of organization.
> Good luck though!  I'm sure CCEL would be happy to take books like City of 
> God if you put them into a good XML format.
> --Chris

I thought Sword had a 'general book tool', but I don't know how to use it.

Steve Tang...