[sword-devel] Chinese PinYin, OSIS, SWORD and front-ends

Chris Little chrislit at crosswire.org
Tue Oct 19 14:20:22 MST 2010


On 10/19/2010 1:54 PM, Matthew Talbert wrote:
> On Tue, Oct 19, 2010 at 4:19 AM, David Haslam <d.haslam at ukonline.co.uk> wrote:
>>
>> Something to ponder for the future then, maybe?
>>
>> See http://crosswire.org/wiki/Talk:Transliteration
>>
>> Thanks, Chris, for useful comments there.
>
> As Chris says there, it would require indexing both versions of the
> module, something I don't believe is currently possible. What would be
> cool (imo) is to have the transliterated text available in a different
> field, much as lemma is done now. Then a search for trans:something
> would access the transliterated data. Of course, it would be nice to
> provide this transparently to the end user.

I'm really about as ignorant of (C)Lucene as a person can be, so someone
please correct me if I'm wrong. I believe our indexing works at the
record level: each verse or dictionary entry is indexed as one unit.
So, when creating the index, you could just concatenate each record's
text with its transliterated text and index the combined string. Unless
you need to support exact string matches across record boundaries, the
concatenation shouldn't affect results.
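
A minimal sketch of the concatenation idea, written as a toy inverted index in Python rather than against (C)Lucene or the SWORD API. Everything here is an assumption made for illustration: the PINYIN table, the record keys, and the helper names transliterate, tokenize, build_index and search are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy table; a real setup would use a full transliterator.
PINYIN = {"神": "shen", "爱": "ai", "世": "shi", "人": "ren"}


def transliterate(text):
    """Romanize each known character, separated by spaces."""
    return " ".join(PINYIN[ch] for ch in text if ch in PINYIN)


def tokenize(text):
    """ASCII chunks become one lowercase token; other chunks split per character."""
    tokens = []
    for chunk in text.split():
        if chunk.isascii():
            tokens.append(chunk.lower())
        else:
            tokens.extend(chunk)
    return tokens


def build_index(records):
    """Index one unit per record: original text plus its transliteration."""
    index = defaultdict(set)
    for key, text in records.items():
        combined = text + " " + transliterate(text)
        for token in tokenize(combined):
            index[token].add(key)
    return index


def search(index, term):
    """Return the sorted record keys that contain the term."""
    return sorted(index.get(term.lower(), ()))


# Hypothetical record keys and sample text, for illustration only.
records = {"entry-1": "神爱世人", "entry-2": "神"}
index = build_index(records)
```

Because each record is indexed on its own, a query for either the original characters or the romanized form lands on the same record, and the appended text cannot create matches that span two records.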

Something I mention on the wiki, which I think you're also advocating, is
doing transliteration of the text on a word-by-word basis and placing
the result in the <w xlit="..."> attribute (all via a filter). That
partly depends on the sourcetype being OSIS (though we could do it to
plaintext too, and change its sourcetype at runtime). We could certainly
run such a filter prior to indexing, which would mean that the
transliterated text could be searched even if transliteration is turned
off in the current view.
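
A rough sketch of such a word-by-word filter, using Python's standard xml.etree.ElementTree rather than the SWORD filter classes. The PINYIN table, the sample fragment, its osisID, and the function names add_xlit and index_text are all hypothetical, and the fragment omits the OSIS namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy table; a real filter would use a full transliterator.
PINYIN = {"神": "shen", "爱": "ai", "世": "shi", "人": "ren"}


def transliterate(word):
    """Romanize known characters and leave unknown ones unchanged."""
    return "".join(PINYIN.get(ch, ch) for ch in word)


def add_xlit(osis_fragment):
    """Set xlit on every <w> element that has text and no xlit yet."""
    root = ET.fromstring(osis_fragment)
    for w in root.iter("w"):
        if w.text and "xlit" not in w.attrib:
            w.set("xlit", transliterate(w.text))
    return ET.tostring(root, encoding="unicode")


def index_text(osis_fragment):
    """Collect each word followed by its xlit value, ready for an indexer."""
    root = ET.fromstring(osis_fragment)
    parts = []
    for w in root.iter("w"):
        parts.append(w.text or "")
        parts.append(w.get("xlit", ""))
    return " ".join(p for p in parts if p)


# Hypothetical fragment; the osisID is a placeholder, not a real reference.
fragment = '<verse osisID="X.1.1"><w>神</w><w>爱</w><w>世人</w></verse>'
filtered = add_xlit(fragment)
```

Running the filter before indexing and feeding index_text(filtered) to the indexer keeps the transliteration searchable independently of whether the display layer chooses to show it.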

--Chris


