[sword-devel] support for locale codes with region/script subtags

Chris Burrell chris at burrell.me.uk
Sun Feb 10 13:26:02 MST 2013


Hi DM/Chris

The standard is defined in BCP 47 (http://tools.ietf.org/html/bcp47), which
only supports '-' as the subtag separator, as also documented by Java here:
http://docs.oracle.com/javase/7/docs/api/java/util/Locale.html#def_variant

Java itself seems to accept both a dash and an underscore.

DM, we should ideally be using the Java functionality, which supports both,
rather than implementing our own decoding scheme. I'm not sure what we
currently do or don't do here.
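A minimal sketch of leaning on java.util.Locale, assuming the front end receives the raw Lang= string from the .conf. Locale.forLanguageTag expects BCP 47's '-' separators, so the sketch first turns any '_' into '-' and then lets Locale split out the language and script subtags. The class and method names (ConfLang, parseLang) are hypothetical, not existing JSword API.

```java
import java.util.Locale;

/** Hypothetical helper (not JSword API): parse a .conf Lang= value. */
final class ConfLang {

    /**
     * Accepts "ur-Deva" or "ur_Deva". BCP 47 only allows '-', so any '_'
     * is normalized first and java.util.Locale does the actual parsing.
     */
    static Locale parseLang(String lang) {
        String tag = lang.trim().replace('_', '-');
        return Locale.forLanguageTag(tag);
    }

    public static void main(String[] args) {
        for (String lang : new String[] {"ur", "ur-Deva", "ur_Deva"}) {
            Locale loc = parseLang(lang);
            // getScript() is empty when the tag has no script subtag.
            System.out.println(lang + " -> language=" + loc.getLanguage()
                    + ", script=" + loc.getScript()
                    + ", tag=" + loc.toLanguageTag());
        }
    }
}
```

With this, ur_Deva and ur-Deva both come out as language "ur" with script "Deva", so it no longer matters which separator a conf file uses.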
Chris



On 10 February 2013 20:09, DM Smith <dmsmith at crosswire.org> wrote:

> Chris,
> We've had this in JSword for a while now (not sure it works), for the next
> release. We used the codes as you've given here, but in the conf file you
> have ur_Deva. We're expecting a '-', not an '_'. We can change the code.
> Please advise.
>
> In Him,
>         DM
>
> On Feb 10, 2013, at 5:56 AM, Chris Little <chrislit at crosswire.org> wrote:
>
> > Just a quick heads up:
> >
> > In general, locale codes (the Lang= field of .confs) can have subtags
> that indicate region, script, etc. Ideally these should be dealt with in
> some fashion by front ends since they identify important distinctions (in
> the eyes of the module maker or publisher at least).
> >
> > When unknown subtags are encountered, it's probably best to recursively
> fall back to the tag minus its right-most subtag. For example, if
> 'en-Latn-US' is unknown, fall back to 'en-Latn'. If that is unknown, fall
> back to 'en'. (Hopefully nearly all language subtags are known.)
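The fallback described above can be sketched as follows, assuming tags have already been normalized to '-' separators and consistent case, and that the front end has a set of tags it knows how to display. The recursion is written as a loop. One addition not in the message: like the Lookup scheme in RFC 4647, the sketch also drops a dangling single-letter (extension) subtag left at the end, so that "en-u-ca" falls back straight to "en". The names LangFallback, fallbackChain and resolve are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of right-most-subtag fallback for Lang= codes. */
final class LangFallback {

    /** Candidate tags, most specific first: "en-Latn-US", "en-Latn", "en". */
    static List<String> fallbackChain(String tag) {
        List<String> chain = new ArrayList<>();
        String current = tag;
        while (!current.isEmpty()) {
            chain.add(current);
            int cut = current.lastIndexOf('-');
            if (cut < 0) {
                break; // only the primary language subtag is left
            }
            current = current.substring(0, cut); // drop the right-most subtag
            // Also drop a trailing single-letter subtag (extension singleton).
            int prev = current.lastIndexOf('-');
            if (prev >= 0 && current.length() - prev - 1 == 1) {
                current = current.substring(0, prev);
            }
        }
        return chain;
    }

    /** First tag in the fallback chain that the front end knows, or null. */
    static String resolve(String tag, Set<String> known) {
        for (String candidate : fallbackChain(tag)) {
            if (known.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("en", "ur", "ur-Deva");
        System.out.println(fallbackChain("en-Latn-US"));
        System.out.println(resolve("en-Latn-US", known));
        System.out.println(resolve("ur-Deva", known));
    }
}
```

Returning null when even the bare language subtag is unknown leaves it to the caller to decide how to show a module in an unknown language.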
> >
> > We should handle this in the library, but currently don't. :(
> >
> >
> > As a specific case in point:
> > We now have two Urdu translations. They're the same translation and
> differ in their script (one is Arabic, the other Devanagari). Their
> language codes (as of the 1.2.1 release just made, which corrected the code
> for the Devanagari version) are: ur (Urdu in Arabic script--the usual
> script for Urdu) and ur-Deva (Urdu in Devanagari script).
> >
> > Possible behaviors are to categorize the ur-Deva module as belonging to
> an unknown language (bad), to fall back and categorize it as simply Urdu
> (better, but certainly confusing if the language name is written in Arabic
> and the module is itself written in Devanagari), or to categorize it
> separately as Urdu written in Devanagari (best).
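The "best" option above can be sketched by grouping modules under a key made of language plus script, so that ur and ur-Deva form separate categories. Dropping the region and variant subtags from the key is an assumption of this sketch, not something the message specifies, as are the class name ModuleCategories and the module names used in main.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: group modules by language plus script. */
final class ModuleCategories {

    /** "ur" -> "ur", "ur_Deva" -> "ur-Deva", "en-Latn-US" -> "en-Latn". */
    static String categoryKey(String lang) {
        Locale loc = Locale.forLanguageTag(lang.trim().replace('_', '-'));
        // Keep language and script; region and variant are dropped.
        Locale key = new Locale.Builder()
                .setLanguage(loc.getLanguage())
                .setScript(loc.getScript()) // empty script clears the field
                .build();
        return key.toLanguageTag();
    }

    /** Maps category key to module names, sorted by key. */
    static Map<String, List<String>> categorize(Map<String, String> langByModule) {
        Map<String, List<String>> byCategory = new TreeMap<>();
        for (Map.Entry<String, String> entry : langByModule.entrySet()) {
            byCategory
                .computeIfAbsent(categoryKey(entry.getValue()), k -> new ArrayList<>())
                .add(entry.getKey());
        }
        return byCategory;
    }

    public static void main(String[] args) {
        // Module names are made up for illustration.
        Map<String, String> langByModule = new TreeMap<>();
        langByModule.put("UrduArabicModule", "ur");
        langByModule.put("UrduDevanagariModule", "ur-Deva");
        System.out.println(categorize(langByModule));
    }
}
```

One limitation: a module tagged explicitly ur-Arab would get its own ur-Arab category instead of joining ur, because the sketch does not know that Arabic is the usual script for Urdu. Folding those together would need likely-subtags data, for example ICU4J's ULocale.addLikelySubtags.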
> >
> > For implementers who localize the language name, Urdu written in Arabic
> is written "اردو". Urdu written in Devanagari is written "उर्दू".
> >
> > --Chris
> >
> > _______________________________________________
> > sword-devel mailing list: sword-devel at crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page
>