<br>

Having a SearchSyntax sounds like a good idea to me.<br>

<br>

It would be good if we could implement it using Lucene; we've talked about using its query parser in the past.<br>

<br>

The problems with the search query parser probably come down to the way it has evolved, which is a common pitfall for parser code: the usual pattern is that the parser grows until bug fixes become too frequent, and then someone sits down and writes a grammar for it. I noticed that Groovy has just been through this.<br>

I've dabbled with JavaCC successfully on a couple of projects, and I once tried to write a COBOL grammar, very unsuccessfully, so I know it can be hard. Might a full grammar be overkill for our simple syntax?<br><br>
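For a syntax this small, a hand-written lexer that follows a written-down grammar may be enough without JavaCC. The sketch below is hypothetical: the grammar, the `SearchLexer`, `Kind` and `Token` names, and the assumption that `+[...]` marks a passage-range restriction are illustrations, not JSword's actual classes or syntax. It shows how the `+[mat-rev]"bread of life"` case from DM's message (quoted below) could come out as a range token followed by a phrase token.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical hand-written lexer for an assumed search grammar:
 *
 *   query  := token*
 *   token  := range | phrase | word
 *   range  := "+[" chars "]"      (assumed: restrict to a passage range)
 *   phrase := '"' chars '"'
 *   word   := any run of characters up to whitespace, '"' or "+["
 */
class SearchLexer {
    enum Kind { RANGE, PHRASE, WORD }

    record Token(Kind kind, String text) {}

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace only separates tokens
            } else if (input.startsWith("+[", i)) {
                int end = input.indexOf(']', i);
                if (end < 0) {
                    throw new IllegalArgumentException("Unclosed range at " + i);
                }
                tokens.add(new Token(Kind.RANGE, input.substring(i + 2, end)));
                i = end + 1;                           // a phrase may follow with no space
            } else if (c == '"') {
                int end = input.indexOf('"', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unclosed quote at " + i);
                }
                tokens.add(new Token(Kind.PHRASE, input.substring(i + 1, end)));
                i = end + 1;
            } else {
                int start = i;                         // first char is never a delimiter here
                while (i < n && !Character.isWhitespace(input.charAt(i))
                        && input.charAt(i) != '"' && !input.startsWith("+[", i)) {
                    i++;
                }
                tokens.add(new Token(Kind.WORD, input.substring(start, i)));
            }
        }
        return tokens;
    }
}
```

Because each grammar rule maps onto one branch of the loop, the grammar stays readable without a generator; if the syntax grows, the same rules could later be moved into a JavaCC grammar file.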

Other than that, go for it!<br>

<br>

Joe.<br>

<br>

<br><div><span class="gmail_quote">On Apr 8, 2005 12:52 PM, <b class="gmail_sendername">DM Smith</b> &lt;<a href="mailto:dmsmith555@gmail.com">dmsmith555@gmail.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">I've narrowed down some of the search bugs. It seems that the tokenizer is not producing the correct stream of tokens.<br>Specifically, the algorithm that uses the tokens goes something like this:<br><br>while there are command tokens at the beginning of the stream, get the next one<br>do<br>&nbsp;&nbsp;&nbsp;&nbsp;have that command consume word tokens until it reaches a terminating condition<br>done<br><br>The problem with +[mat-rev]&quot;bread of life&quot; is that it produces a token stream in which +[mat-rev] is not followed by a command token.<br><br>While looking at this, I noticed what looks like a design problem. Everywhere else in JSword, an interface defines a wall that BibleDesktop and JSword do not look behind. Searching is the exception.<br><br>jsword.book.search<br>&nbsp;&nbsp;&nbsp;&nbsp;provides the interfaces for Search and Index, and factories to get implementations<br>jsword.book.search.basic<br>&nbsp;&nbsp;&nbsp;&nbsp;provides abstract/partial implementations of the interfaces<br>jsword.book.search.parse<br>&nbsp;&nbsp;&nbsp;&nbsp;provides an implementation of Searcher<br>jsword.book.search.lucene<br>&nbsp;&nbsp;&nbsp;&nbsp;provides an implementation of Indexer<br><br>Given this layout, I would have expected that no code outside the package would use jsword.book.search.parse directly.<br><br>I noticed this because I wanted to create another searcher and get it from the search factory
(starting with a copy and fixing the bugs, while retaining the ability to use BibleDesktop by changing the factory properties).<br><br>What the outside code actually uses are the syntax elements, which it needs to construct a search programmatically. I'm thinking that we need YAI (yet another interface) for SearchSyntax. This would be able to:<br>1) decorate individual words and phrases with the appropriate syntax elements.<br>&nbsp;&nbsp;&nbsp;&nbsp;SearchSyntax ss = SearchSyntaxFactory.getSearchSyntax();<br>&nbsp;&nbsp;&nbsp;&nbsp;String decorated = ss.decorate(SyntaxType.STARTS_WITH, &quot;bread of life&quot;);<br>&nbsp;&nbsp;&nbsp;&nbsp;decorated = ss.decorate(SyntaxType.FIND_ALL_WORDS, &quot;son of man&quot;);<br>&nbsp;&nbsp;&nbsp;&nbsp;decorated = ss.decorate(SyntaxType.FIND_STRONG_NUMBERS, &quot;1234 5678&quot;);<br>&nbsp;&nbsp;&nbsp;&nbsp;decorated = ss.decorate(SyntaxType.BEST_MATCH, &quot;....&quot;);<br>&nbsp;&nbsp;&nbsp;&nbsp;decorated = ss.decorate(SyntaxType.PHRASE_SEARCH, &quot;....&quot;);<br>&nbsp;&nbsp;&nbsp;&nbsp;...<br><br>2) create a token stream from a string.<br>&nbsp;&nbsp;&nbsp;&nbsp;Token[] tokens = ss.tokenize(&quot;search string&quot;);<br>&nbsp;&nbsp;&nbsp;&nbsp;or<br>&nbsp;&nbsp;&nbsp;&nbsp;TokenStream tokens = ss.tokenize(&quot;search string&quot;);<br>&nbsp;&nbsp;&nbsp;&nbsp;or<br>&nbsp;&nbsp;&nbsp;&nbsp;...<br><br>3) serialize a token stream to a string.<br><br>Input desired!<br><br></blockquote></div><br>
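A minimal sketch of how the proposed SearchSyntax interface and factory could fit together. Only `SearchSyntax`, `SearchSyntaxFactory.getSearchSyntax()`, `SyntaxType`, `decorate` and `tokenize` come from the message above; `SimpleSearchSyntax`, the use of `List<String>` as the token stream, and the notation produced by `decorate` (`+word` for required words, `word*` for prefixes, `strong:` for Strong's numbers, quotes for phrases) are assumptions chosen for illustration, loosely modelled on Lucene's query syntax. Only a subset of the `SyntaxType` values is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Subset of the search types named in the message (BEST_MATCH etc. omitted).
enum SyntaxType { PHRASE_SEARCH, FIND_ALL_WORDS, STARTS_WITH, FIND_STRONG_NUMBERS }

// The wall: callers such as BibleDesktop would see only this interface.
interface SearchSyntax {
    /** Wraps plain user text in the syntax elements for the given search type. */
    String decorate(SyntaxType type, String text);

    /** Splits a search string into tokens; a quoted phrase is a single token. */
    List<String> tokenize(String search);

    /** Joins tokens back into a search string. */
    String serialize(List<String> tokens);
}

// Hypothetical implementation using an assumed, Lucene-like notation.
class SimpleSearchSyntax implements SearchSyntax {
    @Override
    public String decorate(SyntaxType type, String text) {
        String[] words = text.trim().split("\\s+");
        return switch (type) {
            case PHRASE_SEARCH -> "\"" + text.trim() + "\"";
            case FIND_ALL_WORDS -> wrapEach(words, "+", "");
            case STARTS_WITH -> wrapEach(words, "", "*");
            case FIND_STRONG_NUMBERS -> wrapEach(words, "strong:", "");
        };
    }

    @Override
    public List<String> tokenize(String search) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : search.toCharArray()) {
            if (c == '"') {
                if (!inQuotes && current.length() > 0) {
                    tokens.add(current.toString());   // an opening quote starts a new token
                    current.setLength(0);
                }
                current.append(c);
                if (inQuotes) {                       // a closing quote ends the phrase token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());           // an unterminated quote is kept as-is
        }
        return tokens;
    }

    @Override
    public String serialize(List<String> tokens) {
        return String.join(" ", tokens);
    }

    private static String wrapEach(String[] words, String prefix, String suffix) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(prefix).append(w).append(suffix);
        }
        return sb.toString();
    }
}

// Hard-wired here; a real factory could read the implementation from properties.
final class SearchSyntaxFactory {
    private SearchSyntaxFactory() {}

    static SearchSyntax getSearchSyntax() {
        return new SimpleSearchSyntax();
    }
}
```

With this split, `serialize(tokenize(s))` normalises spacing: `+[mat-rev]"bread of life"` tokenizes to `+[mat-rev]` and `"bread of life"`, and comes back as `+[mat-rev] "bread of life"`.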