[sword-devel] Coming soon: new improved sword searching

Geoffrey W Hastings sword-devel@crosswire.org
Sun, 8 Sep 2002 20:27:20 -0700


If this program is to be used by pastors, Sunday school teachers,
etc., I think they will mostly fall into the category of the uninitiated
to regex, myself included.
> The REAL reason to keep it is because of geek appeal.  What kind of
> free software project would we be if we didn't support regex?  :)

And a loud cheer goes up for Joel from the users gallery :-)
On Sat, 07 Sep 2002 22:42:37 -0700 Joel Mawhorter <joel-ml@mawhorter.org>
writes:
> 
> [The first area that] I will be working on is adding a new type of
> search to Sword.
I have used just about all of the searches mentioned here in OLB and have
found them very useful. You can also search for "all verses containing
two or more of God" by inputting "god ... god", though I can't remember
a time when I had a use for such a search. But I could cite numerous
times I have used the search methods below. That's one reason I still
keep OLB loaded.
> The new search type will be based on typical boolean search
> operations (AND, OR, NOT, and maybe XOR, using the operators &, |, !,
> and ^ respectively). Grouping with parentheses will be supported. For
> example, (God & (Father | Son | Spirit)) will give you all of the
> verses that have the word "God" and at least one of the words
> "Father", "Son" and "Spirit". Both word and phrase search terms will
> be supported within the same search expression. For example,
> (Jesus & "son of God") will find all verses with both the word and
> the phrase in them. I will also be adding a specialized AND operator
> that considers verse proximity. For example,
> ("lamb of God", Jesus, "take away", sins @3) will find all
> combinations of verses within 3 of each other that have all the
> search terms in them. This could be one verse that has all the search
> terms or any set of n verses (where n <= the number of search terms),
> each with one or more of the search terms, such that the two verses
> in the set that are farthest apart do not have more than two verses
> in between. I will also allow simple wildcards. I'm not sure how
> simple or complex that will be yet, but at a minimum it will allow
> something like (Jesus & lov*), which will find love, loving, etc. All
> of the above functions will be useable within one search expression.
> For example: ((one*,"a phrase",two@2) ^ (three & !(four | five))).
> I'm not certain anyone would ever need a search expression of that
> complexity, but it just gives an example of what will be possible. I
> intend this search functionality to be a practical superset of the
> existing search types. It won't be exactly a superset since it won't
> have full regular expression support. However, I think that with the
> functionality available, regular expressions won't be necessary. If
> any of you can think of an example of something that you do with the
> current regular expression searching that won't be possible with what
> I described above, please let me know.
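For anyone who wants to picture how the operators described above could
behave, here is a rough Python sketch. It is not Sword code and not
Joel's design: the toy VERSES list, the helper names term(), NOT() and
near(), and the way a proximity match is reported (as a (first, last)
span of verse positions) are all assumptions made for illustration.
Each search term becomes a set of verse positions, so &, | and ^ are
plain set operations.

```python
import re
from fnmatch import fnmatchcase

# Toy corpus (made-up text); a verse is addressed by its position in the list.
VERSES = [
    "God the Father loves the Son",
    "In the beginning God created",
    "Jesus is the Son of God",
    "a loving Father",
    "Jesus loved them",
    "the Spirit moved",
]
ALL = set(range(len(VERSES)))

def words(text):
    """Lowercase a verse and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def term(pattern):
    """Verses matching a word ('god'), a wildcard ('lov*') or a phrase."""
    pattern = pattern.lower()
    hits = set()
    for i, verse in enumerate(VERSES):
        ws = words(verse)
        if " " in pattern:
            # Phrase: the words must appear in sequence, on word boundaries.
            if f" {pattern} " in f" {' '.join(ws)} ":
                hits.add(i)
        elif any(fnmatchcase(w, pattern) for w in ws):
            hits.add(i)
    return hits

def NOT(hits):
    """Complement of a result set (the ! operator)."""
    return ALL - hits

def near(term_sets, distance):
    """Spans (first, last), last - first <= distance, that cover every term.

    One possible reading of the proximity operator: for each verse that
    contains some term, report the tightest span starting there.
    """
    spans = []
    for first in sorted(set().union(*term_sets)):
        ends = []
        for hits in term_sets:
            in_window = [v for v in hits if first <= v <= first + distance]
            if not in_window:
                break  # this term is missing from the window
            ends.append(min(in_window))
        else:
            spans.append((first, max(ends)))
    return spans

# (God & (Father | Son | Spirit))
print(term("god") & (term("father") | term("son") | term("spirit")))
# (Jesus & "son of God")
print(term("jesus") & term("son of god"))
# (Jesus, lov* @1)
print(near([term("jesus"), term("lov*")], 1))
```

A real implementation would also need a parser for the expression
syntax; the sketch skips that and calls the set operations directly.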
> 
> The second area that I will be working on is adding indexed
> searching, where searching can be done on a precomputed index of
> search terms rather than the current mechanism, where the whole Bible
> has to be read in from disk and searched in a brute force manner.
> This should decrease the search time to a very small fraction of what
> it currently is. One downside of indexed searching is that full
> regular expression searching isn't very feasible. I'll leave it as an
> exercise for the reader to verify that searching for /a.*b/ would be
> neither very easy to implement nor very fast using an index (grin).
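To make the trade-off above concrete, here is a small Python sketch of
a word-level inverted index next to a brute-force regex scan. It only
illustrates the general technique and says nothing about how Sword will
store its index; the toy VERSES list and the names build_index(),
lookup(), lookup_prefix() and brute_force_regex() are hypothetical.

```python
import re
from bisect import bisect_left
from collections import defaultdict

# Toy corpus (made-up text); a verse is addressed by its position in the list.
VERSES = [
    "God the Father loves the Son",
    "In the beginning God created",
    "Jesus is the Son of God",
    "a loving Father",
    "Jesus loved them",
    "the Spirit moved",
]

def build_index(verses):
    """Precompute word -> set of verse positions (done once, ahead of time)."""
    index = defaultdict(set)
    for i, verse in enumerate(verses):
        for word in re.findall(r"[a-z']+", verse.lower()):
            index[word].add(i)
    return dict(index)

INDEX = build_index(VERSES)
VOCAB = sorted(INDEX)  # sorted word list, used for prefix wildcards

def lookup(word):
    """Exact word search: one dictionary access, no verse text is read."""
    return INDEX.get(word.lower(), set())

def lookup_prefix(prefix):
    """Wildcard like lov*: words sharing a prefix are adjacent in VOCAB."""
    prefix = prefix.lower()
    hits = set()
    for word in VOCAB[bisect_left(VOCAB, prefix):]:
        if not word.startswith(prefix):
            break
        hits |= INDEX[word]
    return hits

def brute_force_regex(pattern):
    """A pattern such as a.*b can span several words, so a word index
    cannot answer it; this falls back to scanning every verse."""
    rx = re.compile(pattern)
    return {i for i, verse in enumerate(VERSES) if rx.search(verse.lower())}

print(lookup("God"))
print(lookup_prefix("lov"))
print(brute_force_regex(r"a.*v"))
```

The word and prefix lookups never touch the verse text, while the regex
search still has to read all of it, which is the cost the index was
supposed to avoid.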
>
> In Christ,
>
> Joel Mawhorter

