[sword-devel] TEI formatting, duplicated key (BDB Glosses)

Mon Apr 30 10:27:21 MST 2012

On Mon, Apr 30, 2012 at 12:25 PM, DM Smith <dmsmith at crosswire.org> wrote:
> On 04/30/2012 10:36 AM, Jonathan Morgan wrote:
>
> Hi DM,
>
> On Tue, May 1, 2012 at 12:00 AM, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>
>> On 04/30/2012 09:37 AM, Daniel Owens wrote:
>>>
>>>
>>>
>>> On 04/30/2012 06:54 AM, Chris Little wrote:
>>>>
>>>> On 4/30/2012 4:39 AM, David Troidl wrote:
>>>>>
>>>>> Hi Chris,
>>>>>
>>>>> I'm certainly no expert on your TEI dictionaries, but wouldn't it make
>>>>> sense to have the first key be one that would sort properly, and
>>>>> present
>>>>> the dictionary in true alphabetical order? I'm thinking of Middle
>>>>> Liddell, as well as the Hebrew. This key wouldn't even necessarily have
>>>>> to be shown to the user. The second key, the title, could then maintain
>>>>> the proper accents for display, without hindering sorting, searching or
>>>>> navigation.
>>>>
>>>>
>>>> I confess, I don't understand what you're proposing this as an
>>>> alternative to.
>>>>
>>>> In the example Karl cites, there's just one actual key per entry. It is
>>>> an uppercased version of the entryFree's n attribute. This is the key that
>>>> is sorted.
>>>>
>>>> The un-uppercased version from the n attribute is being rendered as part
>>>> of the entry text via the TEI filters. This is the part I'm proposing we
>>>> retain, but render somewhere else, e.g. right-justified at the bottom of the
>>>> entry.
>>>>
>>>> We also render all the text of the entry, which in these cases includes
>>>> the text from a title element.
>>>>
>>>> I don't know what 'true alphabetical order' means, but if you mean
>>>> localized sort order, it's not possible with the current implementation of
>>>> this module type.
>>>>
>>>> --Chris
>>>>
>>>
>>> I think David's concern is something that needs to be dealt with. A
>>> number of possibilities could be pursued, some of them together:
>>>
>>>    1. The current implementation is to sort by Unicode code points. This
>>> works particularly well with numeric keys. A quick solution for languages
>>> for which such sorting is not alphabetical would be to follow David's
>>> suggestion of using keys that the user does not even see. This has the
>>> advantage of providing a workable solution right away, but there are some
>>> problems with this. First, we could create a new "strongs" standard because
>>> the current implementation does not actually hide keys. That could be solved
>>> by making the keys so obscure that no one would remember them. Second, any
>>> future, more robust solution would require reworking all modules keyed to
>>> it. I have toyed with this solution, and it might be the pragmatic way
>>> forward, but it is not ideal.
>>>
>>>    2. A localized sort order, which I think is what David means by
>>> true alphabetical order, would be a better long-term solution.
>>>
>>>    3. In addition, using genbooks for lexica would work for lexica that
>>> are sorted by root, with subentries nested in a hierarchy, just like in the
>>> Hesychius module and BDB. I have been working with Troy on this.
>>> Unfortunately, front-ends do not recognize the Feature=HebrewDef option in
>>> the conf file and allow genbooks as lexica. I can send anyone an example
>>> lexicon if you are interested in working on this. In that case, instead of
>>> @n as the key, */x-entry/@osisID would be the key.
>>>
>>> Any thoughts?
>>
>>
>> I think there is a problem with the sorting of entries in dictionaries
>> where the keys are not ASCII. I don't remember the details, but I seem to
>> remember it having been discussed here.
>>
>> For JSword, we'll be building a Lucene search index for the key, the term
>> and the whole entry. A user lookup will be normalized and the search will
>> return the key with which lookup will proceed internally as it does today.
>> ICU provides the ability to create a localized sort key (not at all suitable
>> for display) that can be used to sort dictionary entries for the end user's
>> locale. I'm thinking that for TEI dictionaries the representation of the key
>> should not be shown at all.
>
>
> BPBible, and I believe some other frontends as well, use binary search on the
> original module order to locate a key in a virtual list.  This provides very
> noticeable speedups on large dictionaries like ISBE.  I think this would
> require the original module creation to place a module in localised key
> order if we really wanted to order by that, not just have a lookup which as
> I understand it would only be done when actually looking for a key?  It also
> really means that a module can be sorted in one and only one way.
>
> Then again, I'm not even sure we can guarantee any kind of binary search on
> localised keys.
>
> A related issue for English dictionaries is allowing mixed-case dictionary
> keys (and I think I have heard similar comments about Greek and maybe other
> languages).  At the moment I think SWORD requires dictionary keys to be
> upper-case to ensure that they sort correctly, but really "Aaron's Rod"
> looks much better than "AARON'S ROD".  BPBible now attempts to automatically
> and heuristically turn keys to mixed case, which I think looks a lot better,
> but ideally this would be done in the same way as for other languages:
> separating sort order from codepoint order in some way.
>
>
> The idea given above is to have an index to the SWORD index. It can be built
> to be ordered and accessed in whatever way is needed to solve the problems.

Last time I checked, this is what BibleTime does - creates a cache of
the entries in a dictionary or such and updates them when it detects a
version change in the installed module. I could be wrong, but that's
how it used to work.
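
To make the idea concrete, here is a minimal sketch of such a cache in
Python. It is not BibleTime's code and uses no SWORD API: KeyIndex,
sort_key, refresh and lookup are hypothetical names, and sort_key
(strip accents, fold case) is only a crude stand-in for a real localized
collation key such as the ICU one DM mentions. The index is rebuilt only
when the module version changes, and lookups use binary search on the
sort keys while returning the module's original key.

```python
import bisect
import unicodedata


def sort_key(key):
    """Crude stand-in for a localized collation key: drop accents, fold case."""
    decomposed = unicodedata.normalize("NFD", key)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


class KeyIndex:
    """Sorted index over a module's keys, rebuilt when the version changes."""

    def __init__(self):
        self.version = None
        self.entries = []  # sorted list of (sort_key, original_key)

    def refresh(self, module_version, module_keys):
        """Rebuild only if the module version differs from the cached one."""
        if module_version == self.version:
            return False
        self.entries = sorted((sort_key(k), k) for k in module_keys)
        self.version = module_version
        return True

    def lookup(self, text):
        """Binary-search for the first entry at or after the user's input."""
        if not self.entries:
            return None
        pos = bisect.bisect_left(self.entries, (sort_key(text),))
        pos = min(pos, len(self.entries) - 1)  # clamp past-the-end lookups
        return self.entries[pos][1]
```

Because the index stores the original key next to its sort key, the
module itself can keep whatever order and casing it was built with; only
the index decides the order the user sees.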

--Greg

>
> As you note, the problem is that SWORD makes severe assumptions about the
> order and nature of the keys. Unless care is taken, uppercasing is not always
> appropriate. For example, in Turkish the uppercase of 'i' is 'İ', not 'I'.
>
> In Him,
>     DM
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page