[sword-devel] character encoding conversion

David Burry sword-devel@crosswire.org
Tue, 12 Jun 2001 21:32:49 -0700


OK, good answer. You can tell I don't always know what I'm talking about,
but I know some and I'm getting there... ;o)  By the way, when you do your
Unicode support, make sure you don't assume 2 bytes per character; support
the full UCS-4 (4-byte) range internally, since Unicode 3.1 includes full
support for 3 new Asian encodings (one of which is "mandatory" in mainland
China) and more than 64k characters.  The company I work for (Adobe) is a
member, so I learned some cool stuff at work today about this.
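
To illustrate the point about not assuming 2 bytes, here is a rough C++ sketch (not SWORD code; the function name encodeUtf16 is made up for this example). It keeps each code point in a 32-bit value and shows that anything above U+FFFF, such as U+20000 from the CJK block added in Unicode 3.1, needs two 16-bit units when written out as UTF-16. It assumes the input is a valid code point and does no range checking.

```cpp
#include <string>

// Hypothetical helper: encode one code point, held in 32 bits, as UTF-16.
// Assumes cp is a valid Unicode scalar value (no validation is done here).
std::u16string encodeUtf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Above U+FFFF: split into a high and a low surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

With this, encodeUtf16(0x4E2D) yields one unit, while encodeUtf16(0x20000) yields the pair 0xD840 0xDC00, so code that stores one 16-bit value per character would lose the second half.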

Dave

At 10:44 AM 6/12/2001 -0700, Chris Little wrote:
>Yeah, I had the same thought of using a hash table, but decided against
>it because I had erroneously thought it would be larger in memory than a
>giant switch.  (Don't ask me why, it was late.)  I'll try
>re-implementing as an STL map since that's what I'm familiar with.
>Other ideas are welcome still.
>
>I looked at the various Unicode libraries available, and those I saw
>were either inadequate or too large to include for our minimal needs.
>IBM's ICU looked very nice, but it's large and I don't really want to
>worry about adding IBM Public License materials to the project.  If we
>write it ourselves, we can license under our own terms.  We can also be
>assured that our code will do exactly what WE need it to do, rather than
>perhaps a more general or less efficient function.  I rewrote our Roman
>numeral functions for the same reasons (license & specificity to our
>task).
>
>Besides that, we're not going to maintain the tables ourselves.  We'll
>use the tables from Unicode, Inc. which they state are very stable.
>Once the basic mechanism is set up, doing classes for all the
>conversions they support will be a piece of cake.
>
>--Chris
>
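
For reference, the STL map approach Chris mentions might look roughly like the sketch below. This is only an illustration under assumptions: the names CodeMap, sjisToUnicodeSample and lookup are invented here, and the three entries are a tiny hand-picked sample. A real table would be generated from the published Unicode mapping files rather than typed in.

```cpp
#include <cstdint>
#include <map>

// Hypothetical alias: maps a 16-bit source code to a 16-bit Unicode value.
using CodeMap = std::map<std::uint16_t, std::uint16_t>;

// Tiny sample table (Shift-JIS to Unicode); a real one would be generated
// from the mapping files, not written by hand.
const CodeMap& sjisToUnicodeSample() {
    static const CodeMap table = {
        {0x8140, 0x3000},  // ideographic space
        {0x82A0, 0x3042},  // hiragana letter A
        {0x82A2, 0x3044},  // hiragana letter I
    };
    return table;
}

// Look up a code; return U+FFFD (replacement character) if it is unmapped.
std::uint16_t lookup(const CodeMap& table, std::uint16_t code,
                     std::uint16_t fallback = 0xFFFD) {
    auto it = table.find(code);
    return it == table.end() ? fallback : it->second;
}
```

A std::map gives logarithmic-time lookups and is easy to fill from a data file; whether it beats a switch or a plain array in memory depends on the table size and the implementation.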
> > -----Original Message-----
> > From: owner-sword-devel@crosswire.org
> > [mailto:owner-sword-devel@crosswire.org] On Behalf Of David Burry
> > Sent: Tuesday, June 12, 2001 9:36 AM
> > To: sword-devel@crosswire.org; SWORD Devel List
> > Subject: Re: [sword-devel] character encoding conversion
> >
> > Most higher level languages have some sort of hash or associative array
> > built in; perhaps there are a few libraries somewhere for C to do this
> > even more efficiently, since all keys and values are the same length
> > (two bytes) from UCS-2 to SJIS?  I assume a simple calculation and 14k
> > array will work from SJIS to UCS-2...  In addition, aren't there already
> > lots of Unicode conversion libraries out there we could link against?
> > There are literally dozens of conversions to/from Unicode; I don't know
> > if we should be maintaining all the tables ourselves...
> >
> > Dave
> >
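
The "simple calculation and array" idea for going from SJIS to 16-bit Unicode could be sketched as below. This is a guess at what was meant, not code from the project: sjisIndex and sjisToUnicode are hypothetical names, and the sketch only covers the standard two-byte JIS X 0208 range (lead bytes 0x81-0x9F and 0xE0-0xEF), ignoring single-byte codes and vendor extensions. The table itself is assumed to be filled from the published mapping files.

```cpp
#include <cstdint>
#include <vector>

// Number of slots in the dense table: 47 lead bytes x 188 trail bytes.
const int kSjisTableSize = 47 * 188;

// Hypothetical helper: turn a two-byte Shift-JIS code into a dense index
// in [0, kSjisTableSize), or -1 if the byte pair is outside the range.
int sjisIndex(std::uint8_t lead, std::uint8_t trail) {
    int row;
    if (lead >= 0x81 && lead <= 0x9F)      row = lead - 0x81;  // rows 0..30
    else if (lead >= 0xE0 && lead <= 0xEF) row = lead - 0xC1;  // rows 31..46
    else return -1;

    int col;
    if (trail >= 0x40 && trail <= 0x7E)      col = trail - 0x40;  // 0..62
    else if (trail >= 0x80 && trail <= 0xFC) col = trail - 0x41;  // 63..187
    else return -1;                        // 0x7F is not a valid trail byte

    return row * 188 + col;
}

// Look up a byte pair in a table of kSjisTableSize entries; return U+FFFD
// for invalid input or if the table is too short.
std::uint16_t sjisToUnicode(const std::vector<std::uint16_t>& table,
                            std::uint8_t lead, std::uint8_t trail) {
    int i = sjisIndex(lead, trail);
    if (i < 0 || static_cast<std::size_t>(i) >= table.size()) return 0xFFFD;
    return table[i];
}
```

The trade-off against a map is the usual one: the array lookup is constant time and has no per-entry overhead, but it reserves a slot for every possible byte pair in range whether or not it is assigned.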