[osis-core] Schema: type on language

Chris Little osis-core@bibletechnologieswg.org
Sun, 19 Oct 2003 01:25:22 -0700 (MST)


Todd,

For one, it's questionable whether we can really say any language can be 
unambiguously identified.  But let's suppose we really know what English 
is and we really know that 'en' identifies it.  ISO 639 does a better job 
of unambiguously identifying some languages than it does for others.  
There are a bunch of codes that describe groups of languages, such as "North 
American Indian" and "Austronesian (Other)".

So, it's not quite true that Javanese has no ISO code, it's just a very, 
very ambiguous code shared with hundreds of other languages.  (The code
would be 'map' -- "Austronesian (Other)".)

I think it is valuable to keep type="...", since some organizations use 
those codes themselves for various sorting purposes (e.g. the Library of 
Congress uses ISO 639-2/B and SIL uses Ethnologue codes).  If they need to 
use such data, I think we should provide a place to hold it.

But for interoperability, IETF/xml:lang is probably best.

What are your thoughts on also adding "English", "French", and "native" to 
the types enumeration?  Is that unnecessary or inappropriate?


--Chris


On Fri, 17 Oct 2003, Todd Tillinghast wrote:

> Chris,
> 
> If there is a way to unambiguously express ALL of the various language
> values using xml:lang in an IETF-compliant string, then it would seem to
> make sense to use that same structure for the value of <language> and
> for xml:lang AND not have a type="..." set of enumerated types.
> 
> Ex: 
> Javanese, for which there is no ISO code:
> <osisText xml:lang="x-SIL-JVN">
> and 
> <work>
>    <language>x-SIL-JVN</language>
> </work>
> 
> Albanian:
> <osisText xml:lang="sq">
> and 
> <work>
>    <language>sq</language>
>    <language>x-ISO-639-1-sq</language>
>    <language>x-ISO-639-2-T-sqi</language>
>    <language>x-ISO-639-2-B-alb</language>
>    <language>x-SIL-ALS</language>
> </work>
> 
> This would keep the xml:lang and <language> values consistent.  It would
> seem that we will have to enumerate the "x-" alternatives for xml:lang
> in the documentation so we might as well use the same structure both
> places.  
> 
> I believe that "x-" is allowed in the w3c's xml.xsd schema so the above
> options should work.  (Naturally if there is already an established
> syntax for ISO values within xml:lang we should use it rather than my x-
> values above.)