Matěj Cepl mcepl at redhat.com
Wed Jan 12 03:06:00 MST 2011


I have here the original XML texts for CzeKMS (the module at
http://crosswire.org/sword/modules/ModInfo.jsp?modName=CzeKMS is from
earlier drafts and lacks the very extensive footnotes, IIUC), and the
copyright holders have promised that I will be able to release them
under a free license (details are under negotiation ATM, so I won't
point to the current repository of texts; the current state of the
script is attached). I now have an XSLT script which translates the
original non-standard XML to OSIS that validates against the XSD schema
(osisCore.2.1.1.xsd).
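
For reference, the two steps (transform, then validate) amount to
roughly the following. This is only a sketch under assumptions: it uses
xsltproc and xmllint as the tools, and the file names kms2osis.xsl,
source.xml and bible.osis.xml are placeholders, not the real ones.

```python
import subprocess


def build_commands(stylesheet, source_xml, osis_out,
                   schema="osisCore.2.1.1.xsd"):
    """Return the (transform, validate) command lines as argument lists.

    All file names are hypothetical placeholders.
    """
    # xsltproc writes the transformed document to the --output file.
    transform = ["xsltproc", "--output", osis_out, stylesheet, source_xml]
    # xmllint validates against the XSD; --noout suppresses the echoed tree.
    validate = ["xmllint", "--noout", "--schema", schema, osis_out]
    return transform, validate


def convert_and_validate(stylesheet, source_xml, osis_out):
    """Run both steps; raises CalledProcessError if either one fails."""
    for cmd in build_commands(stylesheet, source_xml, osis_out):
        subprocess.run(cmd, check=True)
```

Calling convert_and_validate("kms2osis.xsl", "source.xml",
"bible.osis.xml") would then stop at the first step that fails.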

I don't know when (if ever) the copyright holders will switch to OSIS
as their canonical format, so I want to limit myself to machine
translation using just the script, while trying to preserve (and
properly annotate with OSIS elements) as much information from the
original source as possible (notes, variants, cross-references, etc.).
Certainly that opens a lot of opportunities for later improvements via
hand-editing, but that will have to wait, I guess.

However, many questions remain, so I would like to ask here for a bit of
advice:

 * First, about the license: are there any limitations on which
licenses are encouraged or allowed (when the text is not public domain,
that is)? Are Creative Commons licenses normal for biblical texts (I am
suggesting CC-BY-SA 3.0 CZ, but would there be any problems if we went
all the way to CC-BY-ND-NC)?
 * Next, about notes ... textual notes and cross-references are easy
(although I will have to mark all textual notes as type="study",
because the script cannot distinguish what is an allusion, an
alternative, or another type of note). However, there are also slightly
fewer than a hundred types of (mostly milestoned) elements: either
separate elements or, where the authors of the DTD in the end fell back
on it, the universal element <index/> with the attribute n
distinguishing different marks in the text. These either mark different
verbal forms untranslatable into Czech, or distinguish different
Hebrew/Greek words underlying the translation (soma/sarx both give the
same word "tělo" in Czech; chronos/kairos is another such pair). So,
the questions are:
  - could I mark Strong's numbers for just some (few) Greek/Hebrew
words and not for others? Is it the same with morphological/syntax
marks? Could I mark just some verbal constructs and not others? Is that
correct OSIS? Won't it break sword (and its frontends)?
  - is <w/> milestoneable?
  - I couldn't find how to mark various levels of originality of the
text (all those "majority/minority original text doesn't contain this
text"). What's the proper way?
 * Is it better for conversion to Sword .mod files to generate one huge
bible.xml file or separate files for each biblical book?
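
To make the question about partial Strong's marking concrete, this is
the kind of output I have in mind: only the words carrying a mark in
the source get a <w> with a lemma attribute, everything else stays
plain text. It is just an illustrative sketch; the mapping from source
marks to Strong's numbers and the (text, mark) token format are
hypothetical and do not reflect the real source files.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"

# Hypothetical mapping from marks in the source to Strong's numbers.
MARK_TO_STRONG = {"soma": "G4983", "sarx": "G4561"}


def make_verse(osis_id, tokens):
    """Build an OSIS <verse> from (text, mark) pairs.

    mark is None (or unknown) for words that stay unannotated.
    """
    verse = ET.Element(f"{{{OSIS_NS}}}verse", {"osisID": osis_id})
    last = None  # most recently added <w> element, if any
    for text, mark in tokens:
        strong = MARK_TO_STRONG.get(mark)
        if strong is None:
            # Unmarked text goes before the first <w> or after the last one.
            if last is None:
                verse.text = (verse.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        else:
            last = ET.SubElement(verse, f"{{{OSIS_NS}}}w",
                                 {"lemma": f"strong:{strong}"})
            last.text = text
    return verse


if __name__ == "__main__":
    ET.register_namespace("", OSIS_NS)
    sample = [("Slovo se stalo ", None), ("tělem", "sarx"), (".", None)]
    print(ET.tostring(make_verse("John.1.14", sample), encoding="unicode"))
```

Here only "tělem" ends up inside a <w> element; the rest of the verse
carries no lemma information at all, which is exactly the case I am
unsure about.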

Thanks for any advice.


Matěj Cepl

