[sword-devel] Adding abbreviated names to the module conf file (was Re: isalnum(3) for i18n)

Wed Dec 17 22:15:04 MST 2008

DM Smith wrote:
> Chris,
> 
> I would like to lobby for a separator between the language code and the 
> field name. I don't much care whether it is a prefix or suffix. While I 
> understand that you are suggesting that we don't have a deAbbr or 
> xxAbbr, I could see that it might be added some time in the future and 
> with 3-letter codes and with differences in script (e.g. 
> Traditional/Simplified Chinese), a separator makes it much easier to 
> code for today.

Okay. While I find the arguments in favor of localizing all string 
fields in .confs entirely unconvincing, it doesn't drastically harm 
anything to permit module makers to add more data to their .conf files.

So I think we can go with the suggestion of suffixing the names of 
localized string attributes in .confs with _ plus a locale (which would 
generally mean a BCP 47 value, but we may have to alter that based on 
the constraints on attribute names in .confs). If we add Abbr, Author, 
Translator, & Publisher fields, the complete list of localizable 
attributes would be:

Abbr, Author, Translator, Publisher,
About, Description, History_x.x,
Copyright, CopyrightHolder, CopyrightNotes, CopyrightContactName, 
CopyrightContactNotes, CopyrightContactAddress, ShortPromo, 
ShortCopyright, DistributionNotes

In the absence of a localized form, the un-suffixed version will always 
be the default. In general, the title-like attributes will follow the 
actual title of the book, generally in the language of the text. 
Names of people/organizations will generally be language-neutral. The 
rest will generally be language-neutral or in English.

That said, nothing will *work* unless someone writes the code to take 
advantage of it.
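
As a rough illustration of what that code might look like, here is a 
minimal Python sketch of the lookup with fallback to the un-suffixed 
attribute. Everything in it is an assumption for illustration: the 
function name, the dict representation of a parsed .conf, the sample 
values, and the choice to turn BCP 47 hyphens into underscores and to 
try progressively shorter tags. None of it is existing SWORD API.

```python
def localized_value(conf, key, locale):
    """Return the best match for `key` in `locale`, else the un-suffixed value.

    `conf` is assumed to be a plain dict of attribute name -> string.
    """
    # Assumption: hyphens in the locale tag are written as underscores
    # in attribute names, e.g. "zh-Hant" -> "Description_zh_Hant".
    parts = [p for p in locale.replace("-", "_").split("_") if p]
    while parts:
        candidate = key + "_" + "_".join(parts)
        if candidate in conf:
            return conf[candidate]
        parts.pop()  # drop the most specific subtag and retry
    # No localized form found: the un-suffixed attribute is the default.
    return conf.get(key)


# Hypothetical sample data, not taken from a real module.
sample_conf = {
    "Abbr": "Elb1905",
    "Description": "Elberfelder (1905)",
    "Description_en": "Elberfelder Bible (1905)",
}
```

With this sample, a request for Description in "en-US" falls back to 
Description_en, and a request for Abbr in any locale falls back to the 
un-suffixed Abbr.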

> I like that the default should be in the language of the module. I'm 
> assuming as well in the encoding of the module (e.g. UTF-8 for UTF-8 
> modules).

Yes, I mentioned the latter point in an aside. We already have the 
standard of interpreting fields in a module with Encoding=UTF-8 as 
UTF-8; otherwise they should be interpreted as cp1252. No need to change 
that.
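
For reference, that rule can be sketched in Python roughly as follows. 
This is a simplified illustration, not the engine's parser: the function 
name is made up, it ignores continuation lines, and replacing 
undecodable bytes rather than failing is my own assumption.

```python
def decode_conf(raw):
    """Parse raw .conf bytes into a dict of decoded strings.

    The encoding is declared inside the file, so the lines are split as
    bytes first and decoded only once the Encoding value is known.
    """
    entries = {}
    for line in raw.splitlines():
        # Skip the [ModuleName] header and anything without key=value.
        if line.startswith(b"[") or b"=" not in line:
            continue
        key, _, value = line.partition(b"=")
        entries[key.strip()] = value.strip()

    # Encoding=UTF-8 means UTF-8; anything else is read as cp1252.
    encoding = "utf-8" if entries.get(b"Encoding") == b"UTF-8" else "cp1252"

    return {
        key.decode("ascii", errors="replace"): value.decode(encoding, errors="replace")
        for key, value in entries.items()
    }
```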

> WRT the length of Abbr, I'd like to see it be much shorter than 16 or 
> that 16 be the upper limit w/ a much smaller number being the 
> recommended maximum, say 6?, with the knowledge that anything longer 
> than 6 (or whatever is the recommended max) may be truncated by some 
> frontends (e.g. MacSword and BibleDesktop have dropdowns for a parallel 
> view which have a severe limit of 4. I imagine that small devices, such 
> as phones and PDAs would also have a real estate problem.)

So, for reference, the width of 16 characters would be:
xxxxxxxxxxxxxxxx
Six would be:
xxxxxx

I suspected there would be disagreement with my suggested number, but I 
had assumed that it would seem too low. So... some of my reasoning:

Many Bibles will include a year, which eats up 4 characters in itself.
Bibles with standard abbreviations aren't a big issue (WEB, NIV, NASB, 
NRSV, etc.) but many others incorporate a translator/place/organization 
name, which can be longish (Elberfelder, Webster, Grünewald, Rotherham, 
Delitzsch, Tischendorf, Cornilescu, etc.)

So, we could make the limit lower, but I worry that we would limit the 
meaningfulness of these strings. Maybe we could cut it down to 12?:
xxxxxxxxxxxx
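
To make the two-tier idea concrete (a hard upper limit plus a smaller 
recommended maximum beyond which frontends may truncate), here is a 
small Python sketch. The function names are hypothetical, and the 
default numbers are just the 16 and 6 floated in this thread, not 
settled values.

```python
def check_abbr(abbr, hard_max=16, recommended_max=6):
    """Classify an Abbr value against a hard and a recommended limit."""
    length = len(abbr)  # counts codepoints, not display columns
    if length > hard_max:
        return "too long"
    if length > recommended_max:
        return "may be truncated"
    return "ok"


def truncate_for_frontend(abbr, width):
    """What a frontend with only `width` characters of room might show."""
    return abbr[:width]
```

Under these defaults, a name plus a year such as "Elberfelder1905" (15 
characters) fits the hard limit but lands in the "may be truncated" 
range, and a 4-character dropdown would show only "Elbe".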

I18n isn't much of a concern here. Western European languages have the 
highest sign-to-phoneme ratio that I can think of. And non-alphabetic 
scripts will generally be far more economical in terms of 
codepoints, though this will often be lost due to physically wider 
characters.
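
The codepoints-versus-width point can be illustrated with Python's 
standard unicodedata module. This is only an approximation under my own 
assumptions: wide and fullwidth East Asian characters are counted as two 
columns, combining marks as zero, and everything else as one. Real 
rendering depends on the font and the frontend.

```python
import unicodedata


def display_width(text):
    """Estimate the column width of `text` in a fixed-width layout."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks sit on the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide / fullwidth characters take two columns
        else:
            width += 1
    return width
```

A three-codepoint CJK abbreviation would pass a six-codepoint limit 
easily while still occupying roughly six columns on screen, so a limit 
expressed in codepoints is not the same thing as a limit on screen 
space.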

--Chris