[sword-devel] indexed search discrepancy

Troy A. Griffitts scribe at crosswire.org
Fri Aug 28 15:24:35 MST 2009


Thanks again, Matthew.  Writing quickly for lack of time right now.

In general, we avoid the use of wchar_t because it is defined differently
on different systems, making its intended use (as a Unicode character
holder) at worst essentially useless for anything other than UTF-16, and
at the very least confusing and ambiguous.
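
To make the portability concern concrete, here is a minimal sketch that
reports how wide wchar_t is on whatever platform compiles it. The helper
names (wchar_bits, wchar_holds_any_code_point) are made up for
illustration and are not part of SWORD or clucene. Typically this gives
16 bits with MSVC on Windows and 32 bits with GCC or Clang on Linux.

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: width of wchar_t in bits on the compiling platform.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Hypothetical helper: true when one wchar_t can hold any Unicode code
// point, i.e. when its maximum value reaches U+10FFFF.
constexpr bool wchar_holds_any_code_point() {
    return WCHAR_MAX >= 0x10FFFF;
}
```

Where wchar_holds_any_code_point() is false, a wchar_t string can only
carry the full Unicode range by using a multi-unit encoding such as UTF-16.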

I could probably look this up, but since you know where everything is in
clucene by now...

What EXACTLY is TCHAR defined as (i.e. what is sizeof(TCHAR))?  Same on
all platforms?

What does lucene_utf8towc return? TCHAR? wchar_t?

What I'm trying to determine is:

Is clucene expecting UTF-16
(which represents code points up to 0xFFFF in a single 2-byte unit,
reserving the range 0xD800-0xDFFF for surrogates, and uses a surrogate
pair, 4 bytes, for anything beyond 16 bits)?

... or is clucene just saying 16 bits of unicode glyph space is good
enough for government work; we're not gonna worry about the rest?
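
For reference, the difference between the two cases can be sketched as
follows. This is only an illustration of standard UTF-16 encoding, not
clucene code; encode_utf16 and fits_in_one_unit are hypothetical names,
and the sketch does no validation (it assumes the input is a valid code
point, not a surrogate and not above 0x10FFFF).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: true when a code point fits in a single 16-bit
// unit, which is all a "16 bits is good enough" store can represent.
bool fits_in_one_unit(std::uint32_t cp) {
    return cp < 0x10000;
}

// Hypothetical helper: encode one code point as UTF-16 code units.
// Assumes cp is a valid, non-surrogate code point <= 0x10FFFF.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    if (fits_in_one_unit(cp)) {
        return { static_cast<std::uint16_t>(cp) };
    }
    cp -= 0x10000;  // 20 bits remain, split 10/10 across the pair
    return {
        static_cast<std::uint16_t>(0xD800 + (cp >> 10)),   // high surrogate
        static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))  // low surrogate
    };
}
```

For example, U+0041 encodes as the single unit 0x0041, while U+1D11E
encodes as the pair 0xD834 0xDD1E. Code that treats every 16-bit unit as
a whole character would see that pair as two separate characters.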


