[sword-devel] Proper sorting of pointed Hebrew

Aaron Christianson ninjaaron at gmail.com
Mon Jan 18 02:31:35 MST 2016


Hey David,

How useful it is depends on how it's implemented. Most text editors apply normalization only for display, while the characters are still stored as they were input, so it's really only useful for printing or exporting to a format like PDF. However, if Babel Pad has a way to apply the normalization permanently to Hebrew, and if the result is better than the canonical Unicode composition, that might be useful for pre-formatting sword modules before import.

I'm a Linux user, so I can't test Babel Pad myself, but I had already been toying with the idea of tackling this problem with a Python script, since I've already done half the work for it in the sorting script. Given my preference in text editors and operating systems, I'm more inclined toward a tool that can read from stdin and write to stdout or be imported as a Python module. And, I hope this doesn't come across the wrong way, but this problem could probably benefit from having a Hebrew teacher working on it directly. I'm not a very experienced programmer, but I do understand the Hebrew writing system quite well.
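To make the stdin/stdout idea concrete, here is a minimal sketch of such a filter. The names normalize_text and filter_stream are hypothetical, and it only applies the standard Unicode normalization from Python's unicodedata module as a placeholder; a Hebrew-aware reordering of the points would have to replace that step.

```python
import unicodedata


def normalize_text(text, form="NFC"):
    """Apply a standard Unicode normalization form to text.

    Placeholder only: a Hebrew-specific ordering of vowels and accents
    would go here instead of the canonical Unicode ordering.
    """
    return unicodedata.normalize(form, text)


def filter_stream(stream_in, stream_out, form="NFC"):
    """Read lines from stream_in and write the normalized lines to stream_out."""
    for line in stream_in:
        stream_out.write(normalize_text(line, form))
```

Calling filter_stream(sys.stdin, sys.stdout) under an `if __name__ == "__main__":` guard would turn this into a pipe filter, while other scripts could still import normalize_text directly. Note that the Hebrew presentation forms (such as U+FB31, bet with dagesh) are excluded from composition, so even NFC leaves them decomposed.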

Unfortunately, while it may slightly improve the sorting situation under the Hebrew locale (and make it much worse with plain code-point sorting!), the fundamental problem remains: most of the vowels and all of the accents are implemented as independent code points, and only as independent code points, which makes this something that can't be solved by locales. My sorting algorithm actually decomposes the Unicode input entirely to generate the sort keys, so I can ensure I'm dealing with the smallest possible set of code points.
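As a rough illustration of the decomposition idea (not the actual sorting script), the sketch below builds a sort key by fully decomposing the input and then separating the base letters from the combining marks. The function name sort_key and the choice to compare the consonantal skeleton first and the points second are assumptions made for the example.

```python
import unicodedata


def sort_key(word):
    """Build a sort key from the fully decomposed form of word.

    Assumption for this sketch: base letters are compared first and the
    combining marks (vowels, dagesh, accents) only break ties.
    """
    decomposed = unicodedata.normalize("NFD", word)
    base = []
    marks = []
    for ch in decomposed:
        # Hebrew vowel points and accents are nonspacing marks (category Mn).
        if unicodedata.category(ch) == "Mn":
            marks.append(ch)
        else:
            base.append(ch)
    return ("".join(base), "".join(marks))


words = [
    "\u05D3\u05BC\u05B8\u05D1\u05B8\u05E8",  # dalet, dagesh, qamats, bet, qamats, resh
    "\u05D0\u05B8\u05D1",                    # alef, qamats, bet
    "\uFB31\u05B5\u05DF",                    # precomposed bet-with-dagesh, tsere, final nun
]
in_order = sorted(words, key=sort_key)
```

With this key the word starting with the precomposed U+FB31 sorts between the alef and dalet words, whereas a plain sorted(words) would push it to the end because U+FB31 lies far above the basic Hebrew block.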

P.S. I saw your issue on my GitHub page, and I actually already have such a list untracked in my local project folder! I will add more entries and commit it some time tomorrow. I'm also working on Python 2/3 cross-compatibility. I'm not sure if it will work, since this problem involves so much, eh, "unicodery", but it's worth a shot, I suppose (though I do hate enabling those who won't upgrade to Python 3...)

