[osis-users] Validation related OSIS questions

Chris Little chrislit at crosswire.org
Thu Nov 8 20:47:09 MST 2012


Hi Markku,

On 11/8/2012 5:08 AM, Markku Pihlaja wrote:
>
> Our project, the Finnish OSIS Bible, seems to be very much an on-off
> project. Now we're back in the "on" state and getting quite close to the
> finish line. A few questions again, this time mostly related to validity
> / validation.
>
>
> 1) What XML validators (online or for Windows, preferably free) are you
> using for OSIS? I've used http://www.validome.org/xml/validate/, but it
> chokes on larger files - and my bible.xml is about 8 MB :). If I split
> the file into smaller chunks, it does work.

I use Oxygen XML myself, which uses Xerces for validation. You should 
also be able to use Xerces directly, if you want to avoid licensing 
costs for one of the commercial XML editors. I have also had good 
success with xmllint, and I know others have used Sun's Multi-Schema 
Validator.

> 2) The Durusau OSIS User Manual doesn't give any directions for
> specifying the doctype or charset of the documents. And at least the
> w3.org validator refuses to validate the file without
> them. How do I do that?

The encoding is usually indicated in the XML declaration, which must be 
the very first line of the file:
<?xml version="1.0" encoding="UTF-8"?>

The doctype shouldn't be necessary since you'll generally want to 
indicate the schema itself, but I think you can add a doctype 
declaration like the following if you want to:
<!DOCTYPE osis>

This won't help you with the W3C validator, though, since that's just 
for HTML, XHTML, and other web formats (unless there's a validator I 
haven't found).

> 3) How should I code en or em dashes in OSIS? The (for an HTML expert)
> obvious solutions, &ndash; and &mdash; seem to be HTML specific and
> invalid in XML. Or at least I get this error message from the validator:
> "Entity 'ndash' was referenced, but not declared"
> I'd like to be able to use some code or entity instead of an actual dash
> characters (– or —), at least in some places, since we have two
> different semantics for the dashes and I'd like to keep them separate in
> the code.

If you don't want to type the literal characters, you can use numeric 
character references: &#x2013; for the en dash and &#x2014; for the em 
dash. I believe you could also declare your own entities in an internal 
DTD subset:

<!DOCTYPE osis [
	<!ENTITY ndash   "&#x2013;">
	<!ENTITY mdash   "&#x2014;">
]>
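
As a quick sanity check that declarations like these resolve as 
expected, here is a minimal sketch using only Python's standard library. 
The miniature document is made up for illustration and is not a complete 
OSIS file (no namespace, no header); it only shows that a parser expands 
the declared entities and rejects an undeclared one.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document, NOT a complete OSIS file.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE osis [
  <!ENTITY ndash "&#x2013;">
  <!ENTITY mdash "&#x2014;">
]>
<osis><p>1&ndash;2 &mdash; &#x2013;</p></osis>"""

# Same content, but without the DOCTYPE that declares the entities.
UNDECLARED = "<osis><p>1&ndash;2</p></osis>"


def paragraph_text(xml_source):
    """Parse the document and return the text of its first <p> element."""
    root = ET.fromstring(xml_source.encode("utf-8"))
    return root.find("p").text


def parses(xml_source):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_source.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


text = paragraph_text(DOC)
# Show the code points of the non-ASCII characters the parser produced.
print([hex(ord(ch)) for ch in text if ord(ch) > 127])
print(parses(UNDECLARED))
```

Once parsed, the named entities and the numeric references are 
indistinguishable, so if the two kinds of dash need to stay separate 
downstream, that distinction has to live in the source file (or in 
markup), not in the parsed text.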

> 4) Finally, a question not related to validation. In our translation,
> there are two paragraphs that span over chapter borders. In those
> places, the translation committee requires an inline chapter number
> instead of one that starts a new line (or paragraph). Obviously, this
> can't be handled by regular OSIS, since it would result in something
> like this:
>
> <chapter>
>    ...
>    <p>
>    ...
> </chapter>
>
> <chapter>
>    ...
>    </p>
>    ...
> </chapter>
>
> which is of course invalid.
>
> How would you suggest that an exception like this should be coded? Add
> some custom type attribute value to indicate special handling in layout?

This was exactly the case for which <chapter> was made milestonable. You 
can switch all of your chapter elements to milestones:

<chapter osisID="Rev.12" sID="Rev.12"/>
...
<p>
...
<chapter eID="Rev.12"/>
<chapter osisID="Rev.13" sID="Rev.13"/>
...
</p>
...
<chapter eID="Rev.13"/>
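
To see why the milestone form helps, here is a rough sketch (Python 
standard library only) that checks both variants for well-formedness. 
The wrapping <div> and the placeholder text are my own assumptions for 
the sake of a self-contained example, and this tests well-formedness 
only, not validity against the OSIS schema.

```python
import xml.etree.ElementTree as ET

# Container form from the question: <p> opens in one chapter, closes in the next.
CONTAINERS = """<div>
<chapter osisID="Rev.12"><p>end of chapter 12 ...</chapter>
<chapter osisID="Rev.13">... start of chapter 13</p></chapter>
</div>"""

# Milestone form: chapters are empty start/end markers, so <p> nests cleanly.
MILESTONES = """<div>
<chapter osisID="Rev.12" sID="Rev.12"/>
<p>end of chapter 12 ...
<chapter eID="Rev.12"/>
<chapter osisID="Rev.13" sID="Rev.13"/>
... start of chapter 13</p>
<chapter eID="Rev.13"/>
</div>"""


def is_well_formed(xml_source):
    """Return True if the XML parses, False on a well-formedness error."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


def unbalanced_milestones(xml_source):
    """Return chapter IDs that have an sID without an eID, or the reverse."""
    root = ET.fromstring(xml_source)
    starts = {c.get("sID") for c in root.iter("chapter") if c.get("sID")}
    ends = {c.get("eID") for c in root.iter("chapter") if c.get("eID")}
    return starts ^ ends


print(is_well_formed(CONTAINERS))          # False: <p> and <chapter> overlap
print(is_well_formed(MILESTONES))          # True
print(unbalanced_milestones(MILESTONES))   # set()
```

The second helper is there because a parser will not notice a missing 
end milestone for you; with milestones, pairing each sID with its eID 
becomes something you have to check yourself.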


--Chris



