[sword-devel] Converting files

dnr at cscholar.com
Fri Nov 26 11:52:05 MST 2004


It depends on the structure of the HTML files. If the books are split into
separate HTML files, it probably would not be much trouble to write a small
program that creates a ThML file, which could then be converted to other
formats. The HTML files would have to be named in a predictable way so that
they can be processed in the proper order.

Something like 1_1.htm, 1_2.htm, 2_1.htm, 2_2.htm would work for a text with
divisions like a Bible's, where the first number represents the book and the
second number represents the chapter.
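A minimal sketch of putting such files in order, assuming the hypothetical
"<book>_<chapter>.htm" naming above. The numbers are compared as integers, so
2_1.htm sorts before 10_1.htm, which a plain string sort would get wrong:

```python
import re
from pathlib import Path

# Assumed naming scheme: "<book>_<chapter>.htm" (or .html), e.g. 1_2.htm
NAME_RE = re.compile(r"^(\d+)_(\d+)\.html?$", re.IGNORECASE)

def division_key(filename):
    """Return (book, chapter) as integers, or None if the name doesn't match."""
    m = NAME_RE.match(Path(filename).name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

def ordered_files(filenames):
    """Drop files that don't follow the scheme and sort the rest numerically."""
    matching = [f for f in filenames if division_key(f) is not None]
    return sorted(matching, key=division_key)
```

For example, ordered_files(["10_1.htm", "2_1.htm", "1_2.htm", "1_1.htm",
"index.htm"]) gives ["1_1.htm", "1_2.htm", "2_1.htm", "10_1.htm"]; files like
index.htm are simply skipped.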

Then just output a ThML header followed by the Dublin Core metadata, read in
the HTML files, strip their HTML headers and footers, and write the body text
of each file within the correct division elements:
<div1 title="">
Text
<div2 title="">
Text
</div2>
<div2 title="">
Text
</div2>
</div1>
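Roughly, the program could look like the sketch below. It treats everything
outside <body>...</body> as the header and footer to strip, and nests each
file's body inside <div1>/<div2> as in the skeleton above. The head section
(ThML.head, electronicEdInfo, DC, DC.Title) is a simplified assumption and
should be checked against the ThML spec; the code also does not try to turn
sloppy HTML into well-formed XML:

```python
import re
from html import escape

BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)

def body_text(html):
    """Keep only what is inside <body>...</body>; fall back to the whole text."""
    m = BODY_RE.search(html)
    return (m.group(1) if m else html).strip()

def build_thml(title, books):
    """books: [(book_title, [(chapter_title, chapter_html), ...]), ...] in order."""
    out = [
        "<ThML>",
        "<ThML.head>",
        "<electronicEdInfo>",
        "<DC>",
        f"<DC.Title>{escape(title)}</DC.Title>",  # assumed minimal Dublin Core
        "</DC>",
        "</electronicEdInfo>",
        "</ThML.head>",
        "<ThML.body>",
    ]
    for book_title, chapters in books:
        out.append(f'<div1 title="{escape(book_title)}">')
        for chapter_title, html in chapters:
            out.append(f'<div2 title="{escape(chapter_title)}">')
            out.append(body_text(html))
            out.append("</div2>")
        out.append("</div1>")
    out += ["</ThML.body>", "</ThML>"]
    return "\n".join(out)
```

The book and chapter titles would typically come from the ordered file list;
escape() is used so quotes or ampersands in titles don't break the attributes.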

For Bibles, IDs need to be generated for books, chapters, and verses. This
should not be hard if the files are named in a structured way: the IDs can be
determined from the file names and from the numbers that I would expect to be
placed before each verse in the HTML files.
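A sketch of deriving those IDs, under two hypothetical assumptions: the file
is named "<book>_<chapter>.htm", and each verse number appears in the HTML as
<sup>N</sup>. The "b1.c1.v1" ID pattern is made up for illustration; use
whatever reference form the target importer actually expects:

```python
import re

# Assumption: verse numbers are marked up as <sup>N</sup> in the chapter HTML.
VERSE_RE = re.compile(r"<sup>\s*(\d+)\s*</sup>", re.IGNORECASE)
FILE_RE = re.compile(r"(\d+)_(\d+)")

def verse_ids(book, chapter, html):
    """Split chapter HTML into (id, verse_text) pairs."""
    # With one capture group, re.split returns [before, n1, text1, n2, text2, ...]
    parts = VERSE_RE.split(html)
    verses = []
    for i in range(1, len(parts), 2):
        verse = int(parts[i])
        # Hypothetical ID pattern: b<book>.c<chapter>.v<verse>
        verses.append((f"b{book}.c{chapter}.v{verse}", parts[i + 1].strip()))
    return verses

def ids_for_file(filename, html):
    """Take book and chapter numbers from a name like 1_1.htm."""
    m = FILE_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return verse_ids(int(m.group(1)), int(m.group(2)), html)
```

For example, ids_for_file("1_1.htm", "<sup>1</sup>In the beginning.
<sup>2</sup>And the earth") yields [("b1.c1.v1", "In the beginning."),
("b1.c1.v2", "And the earth")].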

If the divisions of the books are not in separate HTML files, you would need
to look at the HTML itself; there will probably be something in it that can
be used as a 'landmark' for recognizing where a division starts.
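As an illustration, suppose (hypothetically) that every chapter in a single
large HTML file begins with an <h2> heading. The file could then be cut at
each landmark like this; the pattern would need to be changed to whatever the
real files use:

```python
import re

# Hypothetical landmark: each division starts with an <h2> heading.
LANDMARK_RE = re.compile(r"<h2[^>]*>(.*?)</h2>", re.IGNORECASE | re.DOTALL)

def split_on_landmarks(html):
    """Return [(heading_text, body_html), ...], one pair per landmark.
    Anything before the first landmark is ignored."""
    matches = list(LANDMARK_RE.finditer(html))
    divisions = []
    for i, m in enumerate(matches):
        # A division runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        divisions.append((m.group(1).strip(), html[m.end():end].strip()))
    return divisions
```

Each (heading, body) pair can then be treated the same way as one of the
separate per-chapter files.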

If needed, I could write the program for you, but I need more information on
the structure of the books. It may be a few days before I can get to it.

DNR

----- Original Message ----- 
From: <chrislit at crosswire.org>
To: "SWORD Developers' Collaboration Forum" <sword-devel at crosswire.org>
Sent: Friday, November 26, 2004 6:10 AM
Subject: Re: [sword-devel] Converting files


> There is no "easy" way to convert HTML to GBF, ThML, or OSIS in a useful
> and meaningful way. HTML is presentational markup. GBF & OSIS deal with
> structural markup. ThML mixes the two, but to use it for module import
> would require some structural markup.
>
> It is impossible to generalize a method for converting HTML to any
> structural markup language.
>
> --Chris
>
>
> On Wed, 24 Nov 2004, Luiz Augusto Pereira Fernandes wrote:
>
> > Please, does anyone know an easy way to convert
> > .html files to GBF, ThML or OSIS? I need it for
> > converting some copyrighted files for my personal use.
> >
> > The information at
> > http://www.crosswire.org/sword/develop/swordmodule/
> > is very difficult...
> >
> > _______________________________________________
> > sword-devel mailing list
> > sword-devel at crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> >


