[sword-devel] better UTF-sensitive sort

David Haslam dfhmch at googlemail.com
Tue Jan 12 11:59:55 MST 2016


Thanks DM.

Before tackling sorts of any kind, I first used three different programs to
convert the UTF-8 to UTF-16LE.
Although this is a detour from where Karl wishes to go, I still thought it would
be interesting.

BabelPad and TextPipe gave identical results, which is a positive.

Notepad++ didn't cope so well in the conversion.
In fact, it cannot properly handle either encoding for the native name of
Gothic.
It seems to have no support for the Supplementary Multilingual Plane in
which the Gothic block is found.
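
To illustrate why the Gothic block trips up tools without Supplementary
Multilingual Plane support, here is a minimal Python sketch. It is not
what any of the three programs do internally; the helper name
utf16_code_units is made up for this example. It shows that one Gothic
letter needs four bytes in UTF-8 and a surrogate pair (two 16-bit code
units) in UTF-16LE.

```python
# U+10330 GOTHIC LETTER AHSA, written as an escape so this file stays ASCII-only.
gothic_ahsa = "\U00010330"

# Encode the same character both ways. "utf-16-le" writes no byte order mark.
utf8_bytes = gothic_ahsa.encode("utf-8")
utf16le_bytes = gothic_ahsa.encode("utf-16-le")


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers.

    Hypothetical helper for this illustration only.
    """
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# Converting UTF-8 to UTF-16LE is a decode followed by an encode.
converted = utf8_bytes.decode("utf-8").encode("utf-16-le")

print("UTF-8 bytes:      ", utf8_bytes.hex())
print("UTF-16LE bytes:   ", utf16le_bytes.hex())
print("UTF-16 code units:", [hex(u) for u in utf16_code_units(gothic_ahsa)])
```

A tool that treats each 16-bit unit as a whole character will see two
unpaired surrogates here instead of one letter, which would be
consistent with the kind of failure described above.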

Meanwhile, as regards sorting, with that whole table just pasted into Excel,
sorting on column B gave more or less exactly what Karl is looking for. I
think Excel stores text internally as UTF-16.
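
As a side note on why the internal encoding can matter for sorting, the
sketch below compares plain code point order with UTF-16 code unit
order. This is only an illustration: I am not claiming Excel sorts by
raw code units (it most likely applies a locale-aware collation), and
the function names are invented for the example. The two orders differ
only when supplementary characters such as Gothic meet characters in
the range U+E000 to U+FFFF.

```python
def code_point_order(strings):
    """Sort by Unicode code point, which is Python's default string order."""
    return sorted(strings)


def utf16_code_unit_order(strings):
    """Sort by UTF-16 code units.

    Comparing big-endian UTF-16 bytes is equivalent to comparing the
    16-bit code units one by one.
    """
    return sorted(strings, key=lambda s: s.encode("utf-16-be"))


sample = [
    "\U00010330",  # GOTHIC LETTER AHSA (supplementary plane)
    "\uff21",      # FULLWIDTH LATIN CAPITAL LETTER A (U+FF21)
    "A",           # LATIN CAPITAL LETTER A (U+0041)
]

by_code_point = code_point_order(sample)
by_code_unit = utf16_code_unit_order(sample)

# Print escaped forms so the output is readable on any console.
print([s.encode("unicode_escape").decode("ascii") for s in by_code_point])
print([s.encode("unicode_escape").decode("ascii") for s in by_code_unit])
```

In code point order the Gothic letter comes last; in code unit order it
moves ahead of U+FF21, because its leading surrogate 0xD800 is smaller
than 0xFF21.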

BabelPad has a multi-definition sort feature in the Edit | Columns menu.

TextPipe has a special sort filter, but it's limited to ANSI, though there may
be some underlying code that's more capable.

Notepad++ does have a sort feature in the TextFX menu, but it did not sort
in the order that Karl is looking for.

I also tried using BabelPad sort with the Sort definition: Sort type=UCA and
Sort script=Neutral.
The results were similar to, but not exactly the same as, the Excel sort.

The sort dialog in BabelPad is quite sophisticated, with further options
that I've never explored.

Best regards,

David




