Tue, 11 Sep 2001 17:24:13 -0700
*sigh* I was just completely wrong in my last post and I think we need to take away my CVS write access for a week or so as punishment. ;)
Previously I'd stated that there's only one possible SCSU encoding for a given string. It turns out (now that I've read the SCSU spec) that an encoder can apply varying degrees of compression, so there are really many valid encodings of the same string. SCSU encoding and decoding also require a really complicated algorithm.
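To make the "many possible encodings" point concrete, here is a toy decoder sketch in Python. It covers only a sliver of SCSU as I read the spec: ASCII pass-through, the default dynamic window at U+0080, the SQU quote tag (0x0E) and the SCU switch to Unicode mode (0x0F). The function name and the choice of subset are mine, and this is nowhere near a full decoder.

```python
SQU = 0x0E        # quote one UTF-16 code unit, stay in single-byte mode
SCU = 0x0F        # switch to Unicode mode (two bytes per code unit)
WINDOW0 = 0x0080  # default offset of the initially active dynamic window


def toy_scsu_decode(data: bytes) -> str:
    """Decode a tiny SCSU subset (BMP only); ValueError on anything else."""
    out = []
    i = 0
    unicode_mode = False
    while i < len(data):
        b = data[i]
        if unicode_mode:
            # Big-endian 16-bit code units; Unicode-mode tags are not handled.
            if 0xE0 <= b <= 0xF2 or i + 1 >= len(data):
                raise ValueError("unsupported or truncated Unicode-mode input")
            out.append(chr((b << 8) | data[i + 1]))
            i += 2
        elif b == SQU:
            if i + 2 >= len(data):
                raise ValueError("truncated SQU sequence")
            out.append(chr((data[i + 1] << 8) | data[i + 2]))
            i += 3
        elif b == SCU:
            unicode_mode = True
            i += 1
        elif b >= 0x80:
            # Upper half maps into the active dynamic window.
            out.append(chr(WINDOW0 + (b - 0x80)))
            i += 1
        elif b >= 0x20 or b in (0x00, 0x09, 0x0A, 0x0D):
            out.append(chr(b))  # ASCII and the pass-through controls
            i += 1
        else:
            raise ValueError(f"tag 0x{b:02X} not supported in this sketch")
    return "".join(out)


# Three different byte sequences for the single character U+00E9:
candidates = [b"\xE9", b"\x0E\x00\xE9", b"\x0F\x00\xE9"]
decoded = [toy_scsu_decode(c) for c in candidates]
```

All three candidates use only features from that subset, so even a one-character string has several valid encodings of different lengths.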
I've also been looking at a Unicode compression algorithm from IBM called BOCU that DOES have only one compressed form per string and has a much simpler algorithm.
I'm really not even sure we can get worthwhile space savings from SCSU or BOCU plus ZIP/LZSS, so I'm going to retract this from the task list until I can get SCSU and BOCU encoders implemented in Perl to do some tests.
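For what it's worth, the kind of test I have in mind could look roughly like the Python sketch below. It uses zlib's deflate as a stand-in for ZIP, and the `compressed_sizes` helper and the `encoders` table are hypothetical names made up for the sketch. Only the stock UTF-8 and UTF-16 codecs are wired in; the Unicode compression encoders would be plugged in as extra entries once they exist.

```python
import zlib


def compressed_sizes(text, encoders):
    """Return {name: (raw_size, deflated_size)} in bytes for each encoder."""
    sizes = {}
    for name, encode in encoders.items():
        raw = encode(text)
        sizes[name] = (len(raw), len(zlib.compress(raw, 9)))
    return sizes


# Hypothetical encoder table: each entry maps a label to a str -> bytes function.
encoders = {
    "utf-8": lambda s: s.encode("utf-8"),
    "utf-16": lambda s: s.encode("utf-16-be"),
    # A Unicode compression encoder would be added here once implemented.
}

sample = "abc"
sizes = compressed_sizes(sample, encoders)
```

Comparing the second number in each pair across encoders, on realistic sample text rather than a three-letter string, is what would show whether the extra encoding step buys anything once a general-purpose compressor runs on top.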