[sword-devel] Proper sorting of pointed Hebrew

Aaron Christianson ninjaaron at gmail.com
Sat Jan 16 00:56:54 MST 2016


In a previous thread (something about sorting various languages), I
mentioned that there is no proper way to sort vocalized Hebrew using
either Unicode code points or locale-based collation. This is
because each vowel and accent is its own code point, but they are not
considered to be "letters", and therefore do not factor into the
correct sort order. (Unvocalized Hebrew, on the other hand, is sorted
correctly by both code point and locale-based sorting.)
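
To make the problem concrete, here is a minimal sketch (not taken from
my script; the function name consonants_only is made up for this
example). It uses only the standard unicodedata module and treats
every combining mark as a non-letter, which is an assumption that
suits plain pointed Hebrew:

```python
import unicodedata

def consonants_only(word):
    """Return the word with all combining marks (vowels, accents, dagesh) removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Two pointed words, written as escapes so the code points are explicit.
ben = "\u05d1\u05b5\u05bc\u05df"  # bet, tsere, dagesh, final nun
bad = "\u05d1\u05b7\u05bc\u05d3"  # bet, patach, dagesh, dalet

# Plain code point order compares the vowel under the first letter
# (tsere U+05B5 < patach U+05B7) before it ever sees the second consonant.
by_code_point = sorted([bad, ben])

# Sorting on the consonants alone compares dalet with final nun instead.
by_consonants = sorted([bad, ben], key=consonants_only)
```

With these two words, by_code_point puts ben first, while
by_consonants puts bad first, which is what a lexicon would do.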

I mentioned to David Haslam that I had written a Python script that
handles this situation correctly, asking if I should share it, and he
suggested that I should, so I've documented it, refined it a little,
and put it on GitHub: https://github.com/ninjaaron/ivsort.py

While the vowels are not "officially" part of the sorting order, there
are many Hebrew words which are identical except for vowels, so they
are used to create a secondary order for sorting. This order varies
among the different sources, so I asked for a recommendation about
this order from the Academy for the Hebrew Language in Israel, which
they gave me. Here is a translation of our correspondence:

> Sorry for my Hebrew! (I'm a gentile from the US that studied at HU.) I
> wrote because I'm writing a program that's able to put Hebrew words
> with niqqud in order according to the lexical order — according to the
> consonants first and then the vowels if needed. However, I'm not sure
> which order to use for the vowels. In BDB and HALOT, it's not clear to
> me what the order is, and it doesn't seem to me as if it always
> depends on the niqqud (rather it appears that it's related to a
> morphological classification). I looked for other orders, and I found
> everything and nothing. The closest thing I found to a standard is the
> Unicode order, which is also found on the Hebrew keyboard and so also
> on Wikipedia in English. I also saw this order in several places:
> qamets, patach, segol, tsere, qibbuts, holem. I'm interested to know
> what the Academy recommends. Thanks!

and their response:

> It's difficult to say that there's a canonical order. There isn't an
> obvious order in Even Shoshan either. In the production of the
> Historical Dictionary, this order is followed:
>
> [a list of Hebrew vowels, same as the one below]
>
> By coincidence or not, this order is adopted both by Microsoft and
> Unicode (codes 05B0-05BC; see the attached letter [my letter, I
> believe]). And really, it's no coincidence that the print dictionary
> "Millon Hahoveh" uses this order, and so it's written at the end of
> the introduction (p. 8):
>
> "For words which are the same in their written form, this order of
> precedence for vowels is followed: schwa, hataf segol, hataf patach,
> hataf qamets, hireq, tsere, segol, patach, cholem, qibbuts, shureq.
> For words which are also the same in their Niqqud, Shin precedes Sin."
>
> Take note that the dagesh does not perform a role in classification,
> but we did come across a case where it was the single differentiating
> factor between words of Hebrew origin (כִּי for its directives) and a
> loan-word (kai, the name of the Greek letter).
>
> With Blessing, Ronit Gadish

(Original Hebrew available upon request. There were a couple of bits I
didn't fully understand myself.)

I have followed their advice for vowels, though I have followed the
general academic standard about Sin and Shin in Biblical Hebrew
lexicons, which is contrary to the order she mentions in one of the
Modern Hebrew dictionaries (Millon Hahoveh). I have discussed my
reasons for this in the documentation of the script.

Also note that some parts of the script may be a bit difficult to
understand unless you're familiar with the finer points of Hebrew
vocalization. Feel free to email me with any questions.


