[osis-core] Schema: type on language

Chris Little osis-core@bibletechnologieswg.org
Sat, 11 Oct 2003 21:22:45 -0700


Patrick,

Sorry I couldn't get this out before the new Schema beta.  I've been too 
busy cursing at my cell phone every time it drops my call as I work on 
the reply.  Oh for a land line...

Patrick Durusau wrote:

> Chris,
> 
> Sanity check:
> 
> So the attributes are: ISO-639-1, ISO-639-2, SIL, LINGUIST LIST, but the 
> content of the element is your x-SIL-ENG? In other words, no regex to 
> validate the content of the <language> element?

My suggestions would be to:
1) Change "LINGUIST List" to just "LINGUIST".  "LINGUIST List" usually 
refers to just the list itself, whereas "LINGUIST" frequently refers to 
things associated with the list, such as their code list.  Anyone who is 
likely to use a LINGUIST code will recognize & understand the meaning of 
"LINGUIST".  Sorry about that, I was kind of misleading in my last reply.

2) Add "other" back to the enumeration.  I think this was a good idea. 
Or would we prefer people to name their own private schemes for language 
codes and use an "x-" type value?

3) Your question about a regex made me think... "x-SIL-ENG" was just an 
example of how SIL suggests using their codes if you need an RFC 
3066-compliant code.  I think people would expect to use just the 
Ethnologue code itself, e.g. "ENG" if they set their <language> type to 
"SIL".  So, yes, I think it should just be an xs:string, not a pattern.

4) However, in thinking about it, it did seem like it would be advantageous 
to provide a mechanism for identifying codes that would be identical to 
the codes in xml:lang values in the document itself, which are RFC 
3066-compliant (in theory).  So, I would recommend we also add the 
values "IETF" (for RFC 3066, or whatever supersedes it) and "IANA" (for 
IANA registered values, such as the IETF RFCs refer to).  The contents 
of <language type="IETF"> should be constrained to RFC 
3066/xml:lang/[A-Za-z]{1,8}(\-[A-Za-z]{1,8})*, but only in prose (since 
I assume that's all that's possible if we want all other types to be 
unconstrained xs:string).
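
As a rough illustration of what such a prose-only constraint would ask of a 
processing application, here is a minimal Python sketch. The function name 
and the exact spellings of the type values are assumptions made for the 
example, not part of the schema. The pattern is the one quoted in point 4, 
which allows letters only; RFC 3066's own grammar also permits digits in 
subtags after the first, so a real check might need to be looser.

```python
import re

# Type values proposed in this message (spellings are assumed for illustration).
LANGUAGE_TYPES = {
    "ISO-639-1", "ISO-639-2", "SIL", "LINGUIST", "IANA", "IETF", "other",
}

# Pattern quoted in point 4: 1-8 letters, then zero or more "-" + 1-8 letters.
IETF_PATTERN = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def check_language(type_value, content):
    """Return True if `content` is acceptable for the given <language> type.

    Hypothetical helper: only type="IETF" is checked against a pattern;
    every other enumerated type is treated as an unconstrained string.
    """
    if type_value not in LANGUAGE_TYPES:
        return False
    if type_value == "IETF":
        # fullmatch so that trailing garbage such as "en_US" is rejected
        return IETF_PATTERN.fullmatch(content) is not None
    return True
```

With this sketch, check_language("IETF", "x-SIL-ENG") and 
check_language("SIL", "ENG") both pass, while check_language("IETF", "en_US") 
does not.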

Cliff's notes version:

Change "LINGUIST List" to "LINGUIST".
Add "other", "IANA", "IETF".

Comments/objections welcome, but I think the "IETF" value would be 
invaluable down the road.

> Works for me, just wanted to check.
> 
> Assume role is just xs:string? We don't try to enumerate?

I say enumerate whenever possible.  All the values I could think of 
were: original, translation, interlinear, quotation, didactic, source & 
target.  I make no claim to those being exhaustive, but there's always x-.
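
To make the "enumerate, but leave room for x-" idea concrete, here is a 
small Python sketch of how an application might accept role values. The 
helper name is hypothetical, and the requirement that something follow the 
"x-" prefix is an assumption for the example.

```python
# Role values listed above; not claimed to be exhaustive.
ROLES = {
    "original", "translation", "interlinear", "quotation",
    "didactic", "source", "target",
}


def is_valid_role(value):
    """Accept an enumerated role, or a private "x-" extension value."""
    if value in ROLES:
        return True
    # Assumed rule: an extension needs at least one character after "x-".
    return value.startswith("x-") and len(value) > 2
```

So is_valid_role("interlinear") and is_valid_role("x-paraphrase") would be 
accepted, while is_valid_role("paraphrase") would not.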

--Chris