[sword-devel] [Fwd: BOUNCE sword-devel@crosswire.org: Non-member submission from ["Jeremy Bettis" <jeremyb@hksys.com>]]

Troy A. Griffitts sword-devel@crosswire.org
Tue, 12 Jun 2001 13:53:43 -0700


This is a multi-part message in MIME format.
--------------FD8C4D4287D4C556A0D603E1
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Jeremy,
	Your message was sent from an unsubscribed address and bounced back to
me.  I'm forwarding it to the list.
--------------FD8C4D4287D4C556A0D603E1
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Return-Path: <owner-sword-devel@crosswire.org>
Received: (from majordomo@localhost)
	by www.crosswire.org (8.9.3/8.9.3) id IAA22012;
	Tue, 12 Jun 2001 08:41:49 -0700
Date: Tue, 12 Jun 2001 08:41:49 -0700
From: owner-sword-devel@crosswire.org
Message-Id: <200106121541.IAA22012@www.crosswire.org>
X-Authentication-Warning: www.crosswire.org: majordomo set sender to owner-sword-devel@crosswire.org using -f
To: owner-sword-devel@crosswire.org
Subject: BOUNCE sword-devel@crosswire.org:    Non-member submission from ["Jeremy Bettis" <jeremyb@hksys.com>]   

>From scribe@crosswire.org  Tue Jun 12 08:41:49 2001
Received: from hksys.com (IDENT:root@dns.hksys.com [206.222.220.16])
	by www.crosswire.org (8.9.3/8.9.3) with ESMTP id IAA22009
	for <sword-devel@crosswire.org>; Tue, 12 Jun 2001 08:41:48 -0700
Received: from silver (silver.hksys.com [10.0.1.21])
	by hksys.com (8.11.1/8.11.1) with SMTP id f5CFhiv13905
	for <sword-devel@crosswire.org>; Tue, 12 Jun 2001 10:43:44 -0500
Message-ID: <008c01c0f356$76a05550$1501000a@hksys.com>
From: "Jeremy Bettis" <jeremyb@hksys.com>
To: <sword-devel@crosswire.org>
References: <000201c0f2ff$383c92d0$b53b5140@didymus>
Subject: Re: [sword-devel] character encoding conversion
Date: Tue, 12 Jun 2001 10:43:44 -0500
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4522.1200
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
X-Scanned-By: MIMEDefang 0.7 (http://www.roaringpenguin.com/mimedefang/)

If the list is not sparse then the simplest method for conversion would be
to make an array.

// unsigned, because double-byte JIS codes and Unicode values can exceed 32767
unsigned short jis2unicode[] = {
    1,2,3,4,5,6,7,26,6,23,7,2,7, // ...etc for all 7000 mappings...
};
#define JIS_OFFSET 0 // whatever the first jis character is
// Just to make the numbers in your list smaller, you can set this to the
// smallest value and make all of the rest of them relative to it.
#define UNICODE_OFFSET 0

unsigned short convertJIS2Unicode(unsigned short jischar) {
    return jis2unicode[jischar - JIS_OFFSET] + UNICODE_OFFSET;
}


This should be much smaller than your switch statement.  If the JIS list is
sparse though, then this will be a much bigger array than 7000 shorts.
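
If the list does turn out to be sparse, one alternative is a table of
(JIS, Unicode) pairs sorted by JIS code and searched with a binary search.
The following is only a rough sketch under some assumptions: the names
sjis_map, sjis_table, sjis_table_check, sjis_to_unicode and UNMAPPED are
made up for the example, every target is assumed to fit in 16 bits, and the
six entries merely stand in for the full list of about 7000.

```c
#include <assert.h>
#include <stddef.h>

/* One mapping: a Shift-JIS code and its Unicode code point (16-bit). */
struct sjis_map {
    unsigned short sjis;
    unsigned short unicode;
};

/* Illustrative entries only, sorted by .sjis; a real table has ~7000. */
static const struct sjis_map sjis_table[] = {
    { 0x8140, 0x3000 },  /* ideographic space */
    { 0x8141, 0x3001 },  /* ideographic comma */
    { 0x8142, 0x3002 },  /* ideographic full stop */
    { 0x829F, 0x3041 },  /* hiragana small a */
    { 0x8340, 0x30A1 },  /* katakana small a */
    { 0x889F, 0x4E9C },  /* a kanji, U+4E9C */
};

#define SJIS_TABLE_LEN (sizeof sjis_table / sizeof sjis_table[0])
#define UNMAPPED 0xFFFD  /* hypothetical choice: U+FFFD for "no mapping" */

/* Debug helper: binary search is only valid if the table is sorted. */
void sjis_table_check(void)
{
    size_t i;
    for (i = 1; i < SJIS_TABLE_LEN; i++)
        assert(sjis_table[i - 1].sjis < sjis_table[i].sjis);
}

/* Look up one Shift-JIS code; returns UNMAPPED if it is not in the table. */
unsigned short sjis_to_unicode(unsigned short sjis)
{
    size_t lo = 0, hi = SJIS_TABLE_LEN;  /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sjis_table[mid].sjis == sjis)
            return sjis_table[mid].unicode;
        if (sjis_table[mid].sjis < sjis)
            lo = mid + 1;
        else
            hi = mid;
    }
    return UNMAPPED;
}
```

With 7000 pairs of two shorts each, the table takes about 28000 bytes no
matter how sparse the JIS range is, and a lookup needs at most about 13
comparisons.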
--
Jeremy Bettis -- Software Development Manager
Hickman-Kenyon Systems, Inc.
jeremyb@hksys.com
----- Original Message -----
From: "Chris Little" <chrislit@chiasma.org>
To: "SWORD Devel List" <sword-devel@crosswire.org>
Sent: Tuesday, June 12, 2001 12:19 AM
Subject: [sword-devel] character encoding conversion


> I realize new versions of BibleCS and BibleTime are just about to go out
> the door, so I'm not suggesting the following feature be added to those
> front ends before they ship.  We're always going to be in development
> and there may always be a few modules that aren't supported by the
> current released version, but that's a good sign that we're pushing
> ourselves along very quickly.
>
> As I briefly mentioned, I have a Japanese Bible encoded as Shift-JIS.  I
> think it is advantageous to keep this encoding because it is smaller
> than UTF-8 for Japanese.  When the module is read, it can either be left
> in Shift-JIS or converted to UTF-8 for presentation.
>
> I wrote a couple new functions and a new class to do this.  The new
> functions are UTF32to8 and UTF8to32 and they just convert between a
> UTF-32 long int and a UTF-8 6 char array.  The new class is derived from
> SWFilter and is called SJISUTF8.  It converts SJIS to UTF-32 and then to
> UTF-8.  I have come upon a problem, though.
>
> There is no real correlation between SJIS and Unicode, so the conversion
> between the two requires a huge lookup table.  (There are about 7000
> code points in SJIS.)  I implemented this as a single big switch.  The
> resulting class takes forever to compile and has a huge file size.
> So....
>
> Does anyone have a suggestion for a better way to store this lookup
> table, which does nothing but correlate 7000 shorts with 7000 different
> shorts?
>
> Per-character encoding filters should be a lot of fun.  We can do all
> kinds of stuff like transliteration and reducing module sizes through
> regional encoding systems while maintaining Unicode compliance of the
> end result.
>
> --Chris
>
>
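
The UTF-32 to UTF-8 step Chris describes above can be sketched roughly as
follows.  This is not the actual UTF32to8 from his patch (its signature is
not shown in the message); the name utf32_to_utf8 and its interface are
assumptions made for illustration.  It also follows the current definition
of UTF-8, which stops at 4 bytes (U+10FFFF), rather than the older 6-byte
form mentioned in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one code point as UTF-8 into out, which must hold at least
   4 bytes.  Returns the number of bytes written, or 0 if cp is a
   surrogate or lies beyond U+10FFFF. */
size_t utf32_to_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Most Japanese text falls in the 3-byte branch, which is why Shift-JIS (2
bytes per kana or kanji) is smaller on disk than UTF-8.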

--------------FD8C4D4287D4C556A0D603E1--