[sword-devel] Searching and Lucene thoughts

Stephen Denne spdlist at ihug.co.nz
Thu Mar 3 12:49:50 MST 2005


Chris,

The Boyer-Moore algorithm is only language specific in that it works better
in languages that have larger alphabets, and consequently less chance of a
single character matching a searched for character.

It is a general purpose search algorithm.

It does not look at the end of each word, that is a misunderstanding of the
algorithm. It is not based on word suffixes.

Basically, if it is looking for a 10 character long sequence, it checks the
10th character, the 20th character, the 30th character, the 40th character,
etc, until it finds a character that is in the 10 character long sequence it
is looking for. If it isn't in there, then there was no point checking any
of the previous 9 characters.

Stephen Denne.
--
Datacute - Acute Information Revelation Tools
http://www.datacute.co.nz/


> -----Original Message-----
> From: sword-devel-bounces at crosswire.org
> [mailto:sword-devel-bounces at crosswire.org]On Behalf Of Chris Little
> Sent: Thursday, March 03, 2005 8:54 AM
> To: SWORD Developers' Collaboration Forum
> Subject: Re: [sword-devel] Searching and Lucene thoughts
>
>
> No. Standard Sword searches just start at the beginning and search to
> the end, byte by byte.
>
> Just on the basis of the abstract you link to, I don't see how this
> would be of any benefit. The Boyer-Moore algorithm is very
> language-specific. It benefits from the fact that English is a
> predominantly suffixing language, as are most European languages, I
> would say. Personally, I have difficulty imagining how this actually
> speeds search times, but I assume they've done testing and that their
> claims are accurate.
>
> The standard linear search is the most general purpose search algorithm,
> and I think general purpose is what we need to maintain. For people who
> want faster searches, there is indexed searching available.
>
> --Chris
>
>
> Lynn Allan wrote:
> > <alert comment="iwnacsmndipootv ... i was not a computer science major
> > ... ">
> >
> > Just curious ... does non-indexed sword-api searching use c.s.
> > algorithms like Boyer-Moore searching?
> >
> http://portal.acm.org/citation.cfm?id=359859&coll=ACM&dl=ACM&CFID=
13545783&CFTOKEN=93236524
>
> Something I tried to read once (and it was waaaaaay over my head)
> concerned very smart "state machine" searching when there is more than
> one word being searched for. Seems like it involved Bell Lab
> researchers? From one of the A or W or K dudes?
>
http://portal.acm.org/citation.cfm?id=360855&coll=ACM&dl=ACM&CFID=13626066&C
FTOKEN=93658335
>
> Does D. Knuth discuss string matching optimizations?
>
> Would that be applicable to the sword-api?
>
> </alert>
>
>
> _______________________________________________
> sword-devel mailing list
> sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel

_______________________________________________
sword-devel mailing list
sword-devel at crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel



More information about the sword-devel mailing list