[sword-devel] LANG values in sword?

Chris Little sword-devel@crosswire.org
Sat, 06 Dec 2003 17:24:55 -0600


Hugo van der Kooij wrote:
> Hi,
> 
> I know I reported that sword is not handling longer versions of the LANG 
> environment variable.
> 
> Could someone point me to the correct URL where the usage of the LANG 
> variable is defined as only two characters?

The system for assigning lang values used by Sword files was essentially 
designed by me and is more or less what we adopted for OSIS.  (There are 
a few differences that will be fixed, but they only affect minority 
languages that none of you can speak or read.)  I need to do a write-up 
on assigning them, but basically the system is this:

Any language should be represented by a single unique code.
Its format should match that described by IETF RFC 3066. (So ISO 639-1 & 
ISO 639-2 codes and IANA registered codes are all valid.  Plus you can 
use SIL Ethnologue codes or LINGUIST List codes if you preface them with 
"x-SIL-" and "x-LINGUIST-" respectively.  Also, country codes are 
permissible, when they are applicable.)
Since these code systems have considerable overlap, you should choose 
the shortest code that describes the language with the greatest 
specificity.  (Hence, Ancient Greek would be "grc", not "el", which is 
Modern Greek.  And there might be instances where a group of languages 
is covered by a single ISO 639-2 code, in which case a more specific SIL 
code would probably be better.)
Country codes are almost never necessary.  The only instances where they 
are relevant are distinguishing English as spoken in the US, UK, etc., 
and Chinese as written in Taiwan and in mainland China.
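
As a rough illustration of the format rule above (a sketch only, not code 
from Sword), the RFC 3066 shape of a tag can be checked with a short 
pattern.  The function names are hypothetical, and "XXX" is a placeholder 
rather than a real Ethnologue code:

```python
import re

# RFC 3066 grammar: a primary subtag of 1-8 letters, followed by zero or
# more hyphen-separated subtags of 1-8 letters or digits.
_RFC3066 = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_rfc3066_tag(tag: str) -> bool:
    """Return True if `tag` has the shape RFC 3066 allows."""
    return _RFC3066.fullmatch(tag) is not None


def private_tag(scheme: str, code: str) -> str:
    """Build an "x-SIL-..." or "x-LINGUIST-..." private-use tag.

    Hypothetical helper: it only applies the prefixes named above and
    does not check that `code` exists in either code list.
    """
    if scheme not in ("SIL", "LINGUIST"):
        raise ValueError("scheme must be 'SIL' or 'LINGUIST'")
    tag = f"x-{scheme}-{code}"
    if not is_rfc3066_tag(tag):
        raise ValueError(f"not a well-formed tag: {tag!r}")
    return tag


print(is_rfc3066_tag("grc"))          # ISO 639-2 code for Ancient Greek
print(is_rfc3066_tag("en-US"))        # language plus country code
print(is_rfc3066_tag("nl_NL.UTF-8"))  # POSIX locale, not an RFC 3066 tag
print(private_tag("SIL", "XXX"))      # placeholder code
```

This checks form only.  Picking the shortest, most specific code still has 
to be done by a person.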

> From my reading on 
> http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html
> 
> I can only conclude that nl_NL.UTF-8 is a valid variable and should be 
> handled by sword in such a way that it would point me to the Dutch names 
> as would nl_NL or just nl.

It's a valid value according to another standard, but not under IETF RFC 
3066.  The format described in the page you cite is specific to POSIX 
locales.  Our language codes are used in all books and on non-POSIX systems.

I think we're in agreement that Sword should convert POSIX locales to 
IETF format and then match the most similar available locale, which is 
why I put this feature request into our database the first time you 
brought it up.  But I don't know of anyone who has had time to work on 
it since then.  If anyone has the time, patches are always welcome.
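
For anyone who wants to pick this up, here is a minimal sketch of the idea 
in Python.  It is not existing Sword code: the function names are made up, 
and falling back by dropping trailing subtags is just one assumed way of 
finding the "most similar" locale.

```python
from typing import Iterable, Optional


def posix_to_ietf(locale_name: str) -> str:
    """Turn a POSIX locale such as "nl_NL.UTF-8" into "nl-NL"."""
    # Drop the codeset (".UTF-8") and modifier ("@euro") parts, if any.
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    language, _, territory = base.partition("_")
    return f"{language}-{territory}" if territory else language


def best_match(locale_name: str, available: Iterable[str]) -> Optional[str]:
    """Return the closest available tag, or None if nothing matches.

    Assumed strategy: try the full tag first, then keep removing the
    last subtag ("nl-NL" -> "nl") until something matches.
    """
    lookup = {name.lower(): name for name in available}
    tag = posix_to_ietf(locale_name)
    while tag:
        hit = lookup.get(tag.lower())
        if hit is not None:
            return hit
        tag = tag.rpartition("-")[0]  # becomes "" once no hyphen is left
    return None


print(posix_to_ietf("nl_NL.UTF-8"))             # nl-NL
print(best_match("nl_NL.UTF-8", ["en", "nl"]))  # nl
```

With this, nl_NL.UTF-8, nl_NL and nl would all end up at the same "nl" 
locale when only "nl" is installed.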

--Chris