[sword-devel] EMTV text source URL is now unrelated

David Haslam dfhmch at googlemail.com
Wed Oct 12 12:18:23 MST 2011


Hi Troy,

Yes - you're probably right about the lack of a readily available tool for
direct conversion.

Had I been tackling the task, I might have considered these steps:

1. Open each HTML file using MS Word, save each file as RTF.
2. Open each RTF file using WordPad, save again as RTF (smaller and simpler
file structure).
3. Create & run a script to process the RTF tags for italics attribute and
for red font colour.
4. Open the processed RTF files using WordPad, save as Unicode text 
(encoded as UTF-16 LE).
5. Use a suitable editor to open the Unicode text files and change encoding
to UTF-8 (without BOM).
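
For anyone who prefers to script step 5 rather than use an editor, here is a
minimal Python sketch. It assumes the input is the UTF-16 text saved at
step 4 (with or without a BOM); the function name is only illustrative and
is not part of any existing tool.

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as UTF-8 without a BOM.

    Assumption: if no BOM is present, the data is little-endian,
    which is the byte order named in step 4.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM, picks the byte
        # order from it, and drops it from the decoded text.
        text = data.decode("utf-16")
    else:
        text = data.decode("utf-16-le")
    # Python's "utf-8" codec never writes a BOM ("utf-8-sig" would).
    return text.encode("utf-8")
```

To use it on files, read the input in binary mode, pass the bytes through
the function, and write the result in binary mode.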

After step 5 you'd have something similar to where you began converting
plain text to OSIS, but with some ingenuity at step 3, you'd also have some
elementary markup for italics and red letters that survives the complete
loss of formatting attributes at step 4.
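
As an illustration of what the step 3 script might look like, here is a
rough Python sketch. It relies on two standard RTF control words: \i turns
italics on (\i0 turns it off), and \cfN selects entry N of the colour
table. The marker strings (<i>, </i>, <red>, </red>) and the function names
are hypothetical, chosen only for this example, and the sketch assumes the
colour table contains a pure red entry (\red255\green0\blue0). The markers
are inserted as ordinary text next to the control words, so they remain
when the file is saved as plain text.

```python
import re

# One RTF control word (letters, optional numeric parameter, optional
# single delimiting space), or one control symbol such as \\ \{ \}.
TOKEN = re.compile(r"\\([a-z]+)(-?\d+)? ?|\\[^a-z]")


def find_red_index(rtf):
    """Return the colour-table index of pure red, or None if absent."""
    table = re.search(r"\{\\colortbl\s*([^}]*)\}", rtf)
    if table is None:
        return None
    for index, entry in enumerate(table.group(1).split(";")):
        if re.sub(r"\s+", "", entry) == r"\red255\green0\blue0":
            return index
    return None


def mark_up(rtf):
    """Insert plain-text markers beside italic and red-font control words."""
    red_index = find_red_index(rtf)
    in_red = False

    def replace(match):
        nonlocal in_red
        token = match.group(0)
        word, param = match.group(1), match.group(2)
        if word == "i":
            # \i (or \i1) starts italics; \i0 ends it.
            return "</i>" + token if param == "0" else token + "<i>"
        if word == "cf" and red_index is not None:
            if param == str(red_index) and not in_red:
                in_red = True
                return token + "<red>"
            if param != str(red_index) and in_red:
                # Any other colour selection ends the red run.
                in_red = False
                return "</red>" + token
        return token  # every other control word or symbol is untouched

    return TOKEN.sub(replace, rtf)
```

Matching whole control words, rather than searching for the bare string
"\i", avoids false hits on longer words such as \info. Italics or colour
that end only because an RTF group closes (a "}" with no explicit \i0 or
\cf0) are not handled here, so real files may need extra care.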

During my Go Bible activities, I've used this approach more times than I can
recall.

/The steepest part of the learning curve is getting used to the format of
RTF files when viewed by an ordinary text editor/.

After step 5, it's often simpler to do the next conversion to USFM, and then
use usfm2osis.pl to produce the OSIS.

Best regards,
David
