Kamal Abou Mikhael kamal.abm at gmail.com
Fri Jul 11 08:17:13 MST 2008

Dear All,

Some notes about the significance of the short vowels and Arabic search.

1.  When short vowels are not present, the meaning of the word can be ambiguous.
The reader disambiguates by context, logic, or previous knowledge of the 
The difference between the verbs "to kill" and "to be killed" lies in the 
short vowels.
Thus, we can never overestimate their importance.

2.  Un-vowelized text is highly valuable for search because it makes
searching much easier and more useful.  No Arabic searcher wants to type
short vowels: it's tedious and easy to get wrong.  Not only that, most
queries into the text are supposed to be ambiguous.  The fact that
"to kill" and "to be killed" are packed in one word would make
a vowel-free search equivalent to the same kind of search that occurs in 
In addition, words that differ only in their vowelization are often 
related in meaning.
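
A minimal sketch of vowel-free matching in Python, assuming the short
vowels are encoded as the Unicode combining marks U+064B to U+0652.  The
names strip_vowels, QATALA and QUTILA are made up for illustration, and
stripping the shadda along with the vowels is an assumption, not a
requirement.

```python
import re

# Assumption: tanween, fatha, damma, kasra, shadda and sukun occupy
# U+064B..U+0652.  Shadda marks doubling rather than a vowel, but it is
# stripped here too so that queries typed without it still match.
HARAKAT = re.compile("[\u064B-\u0652]")

def strip_vowels(text: str) -> str:
    """Return text with Arabic short-vowel marks removed."""
    return HARAKAT.sub("", text)

# Two vowelized verbs that share the same consonants.
QATALA = "\u0642\u064E\u062A\u064E\u0644\u064E"  # "he killed"
QUTILA = "\u0642\u064F\u062A\u0650\u0644\u064E"  # "he was killed"

# Both collapse to the same un-vowelized form, so a vowel-free query
# finds either one.
same_form = strip_vowels(QATALA) == strip_vowels(QUTILA)
```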

3.  Arabic has a root/pattern morphology that makes many search options possible.
One can search for words with a similar root or with a similar pattern.  
There is even a hybrid approach, which I explored in my master's thesis,
that converts verbal nouns to their related verbs.

English gets this kind of matching almost for free: "reader", "reading",
"read", and "readable" will all show up in a search because the suffixes
"er", "ing", and "able" are attached to the end of the word rather than
mixed inside it.
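
For the root-based search mentioned in point 3, here is a deliberately
naive sketch in Python.  It only checks that the root consonants appear in
order, with any Arabic letters or marks around and between them, so it
will over-match; a real stemmer would restrict which letters may be added.
The function names are hypothetical and this is not the method from the
thesis.

```python
import re

# Assumption: any run of characters in U+0621..U+0652 (Arabic letters and
# marks) may appear before, between, or after the root consonants.
FILLER = "[\u0621-\u0652]*"

def root_regex(root: str) -> "re.Pattern[str]":
    """Build a regex for words containing the root letters in order."""
    body = FILLER.join(re.escape(letter) for letter in root)
    return re.compile(FILLER + body + FILLER)

def same_root_candidates(root: str, words: list) -> list:
    """Return the words that contain the root consonants in order."""
    pattern = root_regex(root)
    return [w for w in words if pattern.fullmatch(w)]

# Un-vowelized sample words.
KTB = "\u0643\u062A\u0628"           # root k-t-b, also "he wrote"
KATIB = "\u0643\u0627\u062A\u0628"   # "writer"
MAKTUB = "\u0645\u0643\u062A\u0648\u0628"  # "written"
QTL = "\u0642\u062A\u0644"           # root q-t-l, unrelated

matches = same_root_candidates(KTB, [KTB, KATIB, MAKTUB, QTL])
```

Here the infixed and prefixed letters in "writer" and "written" are
skipped over, which is the part English suffix matching never has to do.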

Anyway, I bring all this up to say that it would be valuable to have 
non-vowelized search
of vowelized text and to have varying modes of search.

I did some work with Lucene in Java and I'm aware that it is possible to
implement different kinds of filters and to keep track of the location of
the token in the original document.
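
The idea of filtering tokens while remembering where they came from can be
sketched without Lucene.  The Python below is only an illustration under
simple assumptions (whitespace tokenization, short vowels in U+064B to
U+0652); index_tokens and search are hypothetical names, not Lucene API.

```python
import re

HARAKAT = re.compile("[\u064B-\u0652]")  # assumed short-vowel mark range
TOKEN = re.compile(r"\S+")               # whitespace-delimited tokens

def index_tokens(text: str):
    """Yield (vowel-free token, start, end) with offsets into text."""
    for m in TOKEN.finditer(text):
        yield HARAKAT.sub("", m.group()), m.start(), m.end()

def search(text: str, query: str) -> list:
    """Return (start, end, original) for tokens matching the query
    once vowels are stripped from both sides."""
    q = HARAKAT.sub("", query)
    return [(s, e, text[s:e]) for tok, s, e in index_tokens(text) if tok == q]

# A vowelized "document" of two words: "he killed" and "he was killed".
QATALA = "\u0642\u064E\u062A\u064E\u0644\u064E"
QUTILA = "\u0642\u064F\u062A\u0650\u0644\u064E"
DOC = QATALA + " " + QUTILA

# An un-vowelized query returns both hits, each still fully vowelized
# because the offsets point back into the original text.
hits = search(DOC, "\u0642\u062A\u0644")
```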

The time I can spend on this is very limited, almost none.  However, if
someone would like to take these insights and use them, it would be
beneficial and interesting at the same time.

If someone is interested, I can also provide my M.S. thesis, which was
about a configurable stemming engine.  The implementation was evaluated
within IR.  However, the methods used may be of more value in Bible search.

God bless,
Kamal Abou Mikhael

