[sword-devel] Portuguese translation - request for help
Peter von Kaehne
refdoc at gmx.net
Thu Sep 9 15:52:31 MST 2010
Several questions re XSLT:
How can I select nodes based on features of the content of the text
node itself, e.g. whether the text is capitalised?
How can I print out an attribute?
How can I stop a node's text content from being printed while still
processing its child nodes?
I think I have made decent progress on this text, but there were long
periods of inactivity until I understood the next steps.
FWIW, my process:
I received PDF files; no better source exists. I used pdf2xml to create
an XML representation of the underlying PostScript. I then analysed,
fairly painstakingly, the font size and other characteristics to decide
which bit represents which structure.
A Perl script now produces an XML file based on the above.
This XML file is still ordered by page, following the print layout,
without any deeper hierarchy, so there is no actual textual structure
yet. But at least the structure becomes perceivable in the way I have
named the tags.
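The post does the font-size classification in Perl. The following is a minimal Python sketch of the same idea, using the standard library's xml.etree.ElementTree.

Everything specific in it is hypothetical:
- the input format (one <text> element per run, with a size attribute);
- the tag names;
- the size thresholds.

Real pdf2xml output uses its own element and attribute names, and the thresholds would come from the font analysis described above. The output deliberately stays flat and page-ordered, as in the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for pdf2xml output: one <text> element
# per run of text, with its font size in points. Real pdf2xml output differs.
SAMPLE = """<doc>
  <page number="1">
    <text size="20">1</text>
    <text size="12">A CRIAÇÃO</text>
    <text size="7">1</text>
    <text size="10">Texto do primeiro versículo.</text>
  </page>
</doc>"""

# Hypothetical thresholds, largest first; real values would come from
# inspecting the fonts used in the PDF.
RULES = [(18.0, "chapter"), (11.0, "heading"), (9.0, "body")]


def classify(size):
    """Map a font size to a structural tag name."""
    for threshold, tag in RULES:
        if size >= threshold:
            return tag
    return "verse"  # anything smaller: assumed superscript verse numbers


def to_flat_xml(source):
    """Rename each text run after its guessed role; keep page order, no nesting."""
    root = ET.fromstring(source)
    book = ET.Element("book")
    for page in root.iter("page"):
        flat_page = ET.SubElement(book, "page", number=page.get("number", ""))
        for run in page.iter("text"):
            tag = classify(float(run.get("size", "0")))
            ET.SubElement(flat_page, tag).text = (run.text or "").strip()
    return book


print(ET.tostring(to_flat_xml(SAMPLE), encoding="unicode"))
```

Keeping the rules in an ordered table makes it easy to adjust thresholds as the analysis of the PDF is refined.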
I then use an XSLT stylesheet to create USFM from the text. USFM is
closer than OSIS to this structureless text.
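The post does this step in XSLT. The following is a rough Python stand-in using xml.etree.ElementTree; it is a swapped-in technique, not the author's stylesheet. Its comments point to the XSLT constructs that, as far as I know, correspond to the three questions at the top:
- Printing an attribute: xsl:value-of with select="@number".
- Skipping text while processing children: apply-templates with select="*", which visits element children only. Alternatively, an empty template matching text() stops the built-in rule from copying text.
- Testing the content itself: in XPath 1.0, a predicate such as [. = translate(., 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')]. Accented letters must be added to both strings. In XPath 2.0, upper-case(.) = . does the same.

Some choices in the sketch are assumptions:
- the flat tag names;
- the all-caps heading rule;
- using \rem for page numbers.

The markers \c, \v, \p, \s1, \s2 and \rem themselves are standard USFM.

```python
import xml.etree.ElementTree as ET

# Flat, page-ordered XML of the kind described above (tag names hypothetical).
FLAT = """<book>
  <page number="1">
    <chapter>1</chapter>
    <heading>A CRIAÇÃO</heading>
    <verse>1</verse>
    <body>Texto do primeiro versículo.</body>
    <verse>2</verse>
    <body>Texto do segundo versículo.</body>
  </page>
</book>"""


def to_usfm(source):
    root = ET.fromstring(source)
    lines = []
    for page in root.iter("page"):
        # Printing an attribute; XSLT: <xsl:value-of select="@number"/>
        lines.append(f"\\rem page {page.get('number')}")
        # Iterating over element children only: page.text (the whitespace
        # between tags) is never emitted, like <xsl:apply-templates select="*"/>
        for el in page:
            text = (el.text or "").strip()
            if el.tag == "chapter":
                lines.append(f"\\c {text}")
            elif el.tag == "heading":
                # Selecting on the content itself: all-caps headings get
                # level 1, others level 2 (a hypothetical rule).
                marker = "\\s1" if text.isupper() else "\\s2"
                lines.append(f"{marker} {text}")
                lines.append("\\p")
            elif el.tag == "verse":
                lines.append(f"\\v {text}")
            elif el.tag == "body":
                # Body text continues the preceding line (usually a verse).
                lines[-1] += " " + text
    return "\n".join(lines) + "\n"


print(to_usfm(FLAT))
```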
Finally, I need another Perl script (not yet written) to clean things
up a bit, but that will be straightforward.
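The cleanup script is not written yet, so the following is only a guess at what such a pass might do. It is written in Python rather than Perl. It assumes the debris is typical PDF-extraction debris; none of these rules come from problems observed in this particular text:
- non-breaking spaces;
- repeated blanks;
- a space left before punctuation;
- empty lines.

```python
import re


def clean_usfm(text):
    """Hypothetical tidy-up pass over generated USFM, one line at a time."""
    text = text.replace("\u00a0", " ")  # assumed: non-breaking spaces from the PDF
    cleaned = []
    for line in text.splitlines():
        line = re.sub(r"[ \t]+", " ", line)          # collapse runs of blanks
        line = re.sub(r" +([,.;:!?])", r"\1", line)  # no space before punctuation
        line = line.strip()
        if line:  # drop lines that are now empty
            cleaned.append(line)
    return "\n".join(cleaned) + "\n" if cleaned else ""


raw = "\\c 1\n\n\\v 1  Texto\u00a0do versículo ,  com espaços .  \n"
print(clean_usfm(raw))
```

Working line by line keeps each rule simple and never merges separate USFM lines.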