[sword-devel] Chinese "words"

Daniel Glassey sword-devel@crosswire.org
Mon, 30 Jun 2003 17:54:47 +0100


Hiya :),
Could you please change your mailer settings to do some quoting so 
that people can see what you have written without having to look at 
previous mails for context. If you look below you can see how 
difficult it is as this is what I get.

Regards,
Daniel

On 30 Jun 2003 at 12:34, YTang0648@aol.com sent forth the message:

> In a message dated 6/27/2003 5:23:34 PM Pacific Daylight Time, 
> crenz-swordproject@web42.com writes:
> >I think the right thing to do is to change your layout engine to support
> >correct Chinese line wrapping, instead of adding space (which should not be
> >there) to work around the limitation in the layout engine.
> 
> I second that.
> 
> >Neither has a space after punctuation. No space, period.
> 
> Whoops... you're right. Thanks for the correction. I noticed people
> don't seem to do it the "correct" way always, though, but I guess it
> also depends on the font being used (ie. the glyphs being the correct
> width for punctuation marks).
> 
> >Second, there is no easy way to parse a word.
> 
> That's why I think it would be too complicated to build Chinese word
> splitting into Sword, unless e.g. ICU starts to come with a nice
> built-in option we can just use. It's just not worth the effort. It's
> easier to just let the user make the guess himself.
> Mozilla has a Unicode-based line breaker which can easily be ported to other
> environments. ICU also has a line breaker which is very close to the line
> breaker interface in Java.
> Those line breakers tell the app where (as a character buffer offset) the
> line break opportunities are; the app then calls the OS to find out the width
> of the window and the width of the text, and decides whether to break there
> or at the next opportunity. It basically replaces the "find me the next space
> character" operation in Western-only layout.
> 
> The gecko based line breaker interface is on
> http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/public/nsILineBreaker.h
> The implementation is on
> http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp
> (based on Japanese layout standard)
> 
> >Google implements very good Chinese search. Maybe you should look at how they
> >do the search job.
> 
> Again, I think it's overkill for Sword.
> 
> Greetings,
>    Christian
>
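
For reference, the "line break opportunity" approach described in the quoted
message can be sketched roughly as follows. This is only an illustration under
simplifying assumptions, not the Mozilla or ICU implementation: the function
names, the two small punctuation sets, and the use of East Asian width as a
stand-in for "ask the OS how wide the text is" are all made up for the example.

```python
import unicodedata

# Hypothetical, heavily simplified rule sets; real breakers (ICU, Mozilla)
# handle many more characters and cases.
NO_BREAK_BEFORE = set("，。、；：？！）」』》")   # must not start a line
NO_BREAK_AFTER = set("（「『《")                 # must not end a line


def is_cjk(ch):
    # Assumption: treat East Asian Wide/Fullwidth characters as CJK.
    return unicodedata.east_asian_width(ch) in ("W", "F")


def break_opportunities(text):
    """Return buffer offsets i where a line may end just before text[i]."""
    offsets = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur in NO_BREAK_BEFORE or prev in NO_BREAK_AFTER:
            continue
        # Western rule: break after a space. CJK rule: break next to any
        # CJK character, no space needed.
        if prev == " " or is_cjk(prev) or is_cjk(cur):
            offsets.append(i)
    offsets.append(len(text))  # end of text is always an opportunity
    return offsets


def display_width(s):
    # Stand-in for asking the OS/toolkit for the rendered width.
    return sum(2 if is_cjk(ch) else 1 for ch in s)


def wrap(text, max_width):
    """Greedy wrap: take the last opportunity that still fits on the line."""
    lines, start = [], 0
    opportunities = break_opportunities(text)
    while start < len(text):
        best = None
        for off in opportunities:
            if off <= start:
                continue
            if display_width(text[start:off].rstrip(" ")) <= max_width:
                best = off
            else:
                break
        if best is None:
            # Nothing fits: overflow up to the next opportunity.
            best = next(off for off in opportunities if off > start)
        lines.append(text[start:best].rstrip(" "))
        start = best
    return lines


if __name__ == "__main__":
    for line in wrap("我们爱神，因为神先爱我们。", 10):
        print(line)
```

The layout code only ever asks "where is the next allowed break?", so the same
wrap loop serves both space-separated and Chinese text, and no spaces need to
be added to the module text.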