[sword-devel] Coming soon: new, improved Sword searching

Joel Mawhorter sword-devel@crosswire.org
Sun, 08 Sep 2002 19:31:23 -0700


On September 8, 2002 13:12, Chris Little wrote:
> FWIW, we need to upgrade our regexp engine.  The current one (from GNU)
> has a couple of problems that I was aware of.  First it is GPL--this is
> the last GPL component in the library.  If it were replaced with something
> else, we could license Sword under non-GPL licenses to other entities
> (e.g. Bible societies that don't want to deal with GPL's restrictions) or
> put it out publicly under a license that we write that better meets our
> needs than the GPL.  Second (and probably more immediately important) it
> doesn't handle UTF-8.

Wouldn't it make more sense to use UTF-16 rather than UTF-8 for regular
expressions? With UTF-16, in most cases (anything in the Basic Multilingual
Plane), one 16-bit code unit == one character, so regular expressions would be
more manageable. With UTF-8, for example, what does a dot mean in a regular
expression when it is matched against characters that can be encoded in 1, 2
or 3 chars (bytes)? Does ICU have regular expression support? I know the
regular expression support in Java 1.4 is very nice and uses UTF-16, but alas,
we can't really use that in Sword unless we come up with a CNNI (C non-native
interface :-).

Joel