[sword-devel] Arabic Bible
Kamal Abou Mikhael
kamal.abm at gmail.com
Fri Jul 11 08:17:13 MST 2008
Some notes about the significance of the short vowels and Arabic search.
1. When short vowels are not present, the meaning of the word can be
ambiguous. The reader disambiguates by context, logic, or prior knowledge.
The difference between the verbs "to kill" and "to be killed" lies in the
short vowels. Thus, we can never overestimate their importance.
2. Un-vowelized text is highly valuable in terms of search because it
makes searching much easier and more useful. No Arabic searcher wants to
type short vowels: it is tedious and you may get them wrong. Not only
that... most queries into the text are supposed to be ambiguous. The fact
that "to kill" and "to be killed" are packed into one word would make a
vowel-free search equivalent to the same kind of search that occurs in
[...]
In addition, words that differ only in their vowelization are often
related in meaning.
3. Arabic has a root/pattern morphology that makes many search options
possible. One can search for words with a similar root or with a similar
pattern. There is even a hybrid approach, which I explored in my master's
thesis, that converts verbal nouns to their related verbs.
This kind of thing comes by default in English, because "reader",
"reading", "read", and "readable" will all show up in a search: the
suffixes "er", "ing", and "able" are not mixed inside the word.
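To make the root/pattern idea concrete, here is a toy sketch in Python (chosen only for illustration; it is not the method from the thesis mentioned above). It assumes unvowelized input and a tiny, hand-written pattern list, and it follows the traditional convention of writing patterns with ف, ع, ل as placeholders for the three root consonants. `SLOTS`, `PATTERNS`, `match_pattern`, and `candidate_roots` are hypothetical names; a real analyzer would need many more patterns, handling of weak roots, and prefix/suffix stripping.

```python
from typing import Optional, Set

# Pattern letters ف, ع, ل are the traditional placeholders for the
# first, second, and third root consonants.
SLOTS = ("ف", "ع", "ل")

# A tiny, illustrative pattern list (assumption: real coverage needs many more).
# In every pattern here the slots appear in ف, ع, ل order, so collecting
# slot letters left to right yields the root in the right order.
PATTERNS = ["فعل", "فاعل", "فعال", "مفعل", "مفعول"]


def match_pattern(word: str, pattern: str) -> Optional[str]:
    """Return the root consonants if `word` fits `pattern`, else None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in SLOTS:
            root.append(w)   # this position holds a root consonant
        elif w != p:
            return None      # a fixed letter of the pattern does not match
    return "".join(root)


def candidate_roots(word: str) -> Set[str]:
    """Collect every root the word could have under the known patterns."""
    roots = set()
    for pattern in PATTERNS:
        root = match_pattern(word, pattern)
        if root is not None:
            roots.add(root)
    return roots


if __name__ == "__main__":
    # كاتب (writer), كتاب (book), مكتب (office), مكتوب (written)
    for word in ["كاتب", "كتاب", "مكتب", "مكتوب"]:
        print(word, candidate_roots(word))   # each yields {'كتب'}
```

A root-mode search could then index each word under its candidate roots, so a query for the root كتب retrieves all four words. Because a word may fit more than one pattern, `candidate_roots` returns a set rather than a single answer; picking among candidates is the same kind of ambiguity described in point 1.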
Anyway, I bring all this up to say that it would be valuable to have [...]
of vowelized text and to have varying modes of search.
I did some work with Lucene in Java, and I'm aware that it is possible to
implement different kinds of filters and to keep track of the location of
each token in the text.
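One way to get both vowelized and un-vowelized search over the same text is to index every token twice, once as written and once with the short vowels stripped, while recording each token's character offsets in the original vowelized text. The sketch below is plain Python, not Lucene, and uses no Lucene API; `DualIndex`, `strip_vowels`, and the mode names `"exact"` and `"unvowelized"` are hypothetical. It assumes the marks to ignore are the Unicode combining characters U+064B to U+0652 plus U+0670, and it tokenizes on whitespace only (no punctuation handling).

```python
import re
from collections import defaultdict
from typing import List, Tuple

# Assumption: the marks to ignore are U+064B..U+0652 (tanwin, fatha, damma,
# kasra, shadda, sukun) plus U+0670 (superscript alef).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

Span = Tuple[int, int]


def strip_vowels(text: str) -> str:
    """Remove short vowels and related marks, keeping the consonantal skeleton."""
    return DIACRITICS.sub("", text)


class DualIndex:
    """Hypothetical index holding both vowelized and un-vowelized forms.

    Every entry records (start, end) character offsets into the ORIGINAL
    vowelized text, so hits from either mode can be shown in that text.
    """

    def __init__(self, text: str):
        self.text = text
        self.exact = defaultdict(list)
        self.unvowelized = defaultdict(list)
        for m in re.finditer(r"\S+", text):   # whitespace tokenization only
            span = (m.start(), m.end())
            self.exact[m.group()].append(span)
            self.unvowelized[strip_vowels(m.group())].append(span)

    def search(self, query: str, mode: str = "unvowelized") -> List[Span]:
        """Look up a query either exactly as typed or with vowels ignored."""
        if mode == "exact":
            return self.exact.get(query, [])
        if mode == "unvowelized":
            return self.unvowelized.get(strip_vowels(query), [])
        raise ValueError(f"unknown search mode: {mode}")


QATALA = "\u0642\u064E\u062A\u064E\u0644\u064E"  # qatala, "he killed"
QUTILA = "\u0642\u064F\u062A\u0650\u0644\u064E"  # qutila, "he was killed"

if __name__ == "__main__":
    idx = DualIndex(QATALA + " " + QUTILA)
    print(idx.search("\u0642\u062A\u0644"))   # [(0, 6), (7, 13)]
    print(idx.search(QUTILA, mode="exact"))   # [(7, 13)]
```

The un-vowelized query قتل finds both "to kill" and "to be killed", while exact mode separates them, and both report offsets into the vowelized text. In Lucene the corresponding piece would presumably be a token filter that rewrites each token's text while leaving its start and end offsets as the tokenizer set them.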
The time I can spend on this is limited, almost none. However, if someone
would like to take these insights and use them, it would be beneficial
and interesting at the same time.
If someone is interested, I can also provide my M.S. thesis, which was
about a configurable stemming engine. The implementation was evaluated
within [...] the methods used may be of more value in Bible search.
Kamal Abou Mikhael