[sword-devel] Fwd: [sword-svn] r2045 - trunk/src/modules

DM Smith dmsmith555 at yahoo.com
Thu May 3 12:28:22 MST 2007


Everyone,

I need to apologize for the change from StandardAnalyzer to 
SimpleAnalyzer. I should not have made the change without an open 
conversation here first. I did not think through the impact of such a 
change. Troy was gracious enough to point it out to me. To revert the 
change, just replace SimpleAnalyzer with standard::StandardAnalyzer 
globally. I can supply a patch if need be.

In His Service,
    DM

DM Smith wrote:
> Martin Gruner wrote:
>> Hi Chris,
>>
>> are you sure you want to move from StandardAnalyzer to SimpleAnalyzer? IIRC 
>> searches won't find English stop words like "for", "then", "and"...
>
> The SimpleAnalyzer does no stop word analysis. All it does is lowercase 
> everything and split the text into tokens, where a token is a sequence 
> of letters bounded by non-letters.
>
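
As a rough illustration of the behavior described above, here is a minimal sketch that assumes ASCII input only. The function name simpleTokens is made up for this example; it is not CLucene's implementation, which operates on wide-character strings.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of CLucene: splits ASCII text into
// lowercase tokens, each a run of letters bounded by non-letters.
std::vector<std::string> simpleTokens(const std::string &text) {
	std::vector<std::string> tokens;
	std::string current;
	for (unsigned char c : text) {
		if (std::isalpha(c)) {
			// Letters extend the current token, lowercased.
			current += static_cast<char>(std::tolower(c));
		} else if (!current.empty()) {
			// Any non-letter ends the current token.
			tokens.push_back(current);
			current.clear();
		}
	}
	if (!current.empty()) {
		tokens.push_back(current);
	}
	return tokens;
}
```

With this sketch, simpleTokens("For God so loved") yields for, god, so, loved. Note that "for" survives, because nothing here filters stop words.
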
> The StandardAnalyzer does a whole boatload of stuff, in addition to what 
> SimpleAnalyzer does:
> * Splits words at punctuation characters, removing punctuation. However, 
> a dot that's not followed by whitespace is considered part of a token 
> (and is eliminated later as part of an acronym).
> * Splits words at hyphens, unless there's a number in the token, in 
> which case the whole token is interpreted as a product number and is not 
> split.
> * Recognizes email addresses and internet hostnames as one token.
> * Removes a trailing 's or 'S from words.
> * Removes . from things it considers acronyms.
> * Eliminates the following English stop words:
>     a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, 
> not, of, on, or, such, that, the, their, then, there, these, they, this, 
> to, was, will, with
>
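
The stop-word step is the one that matters for Martin's question. A minimal sketch of it follows, again with made-up names (kEnglishStopWords, dropStopWords); it only mimics the idea and is not CLucene's actual filter. It expects tokens that have already been lowercased.

```cpp
#include <set>
#include <string>
#include <vector>

// The English stop words listed above.
static const std::set<std::string> kEnglishStopWords = {
	"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
	"in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
	"that", "the", "their", "then", "there", "these", "they", "this",
	"to", "was", "will", "with"
};

// Hypothetical helper: returns the tokens that are not stop words,
// preserving their original order.
std::vector<std::string> dropStopWords(const std::vector<std::string> &tokens) {
	std::vector<std::string> kept;
	for (const std::string &token : tokens) {
		if (kEnglishStopWords.count(token) == 0) {
			kept.push_back(token);
		}
	}
	return kept;
}
```

Given the tokens for, god, so, loved, the, world, this keeps god, so, loved, world. Since "for" and "the" never reach the index, a search for either of them would come up empty.
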
> The SimpleAnalyzer is correct.
>
>> mg
>>
>> ----------  Forwarded Message  ----------
>>
>> Subject: [sword-svn] r2045 - trunk/src/modules
>> Date: Thursday, 3 May 2007
>> From: chrislit at www.crosswire.org
>> To: sword-cvs at crosswire.org
>>
>> Author: chrislit
>> Date: 2007-05-03 03:41:07 -0700 (Thu, 03 May 2007)
>> New Revision: 2045
>>
>> Modified:
>>    trunk/src/modules/swmodule.cpp
>> Log:
>> DM's RAMDirectory patch for CLucene indexing
>>
>> Modified: trunk/src/modules/swmodule.cpp
>> ===================================================================
>> --- trunk/src/modules/swmodule.cpp	2007-05-01 17:35:31 UTC (rev 2044)
>> +++ trunk/src/modules/swmodule.cpp	2007-05-03 10:41:07 UTC (rev 2045)
>> @@ -515,7 +515,7 @@
>>  			is = new IndexSearcher(ir);
>>  			(*percent)(10, percentUserData);
>>  
>> -			standard::StandardAnalyzer analyzer;
>> +			SimpleAnalyzer analyzer;
>>  			lucene_utf8towcs(wcharBuffer, istr, MAX_CONV_SIZE); //TODO Is istr always utf8?
>>  			q = QueryParser::parse(wcharBuffer, _T("content"), &analyzer);
>>  			(*percent)(20, percentUserData);
>> @@ -960,10 +960,12 @@
>>  		setKey(*searchKey);
>>  	}
>>  
>> -	IndexWriter *writer = NULL;
>> +	RAMDirectory *ramDir = NULL;
>> +	IndexWriter *coreWriter = NULL;
>> +	IndexWriter *fsWriter = NULL;
>>  	Directory *d = NULL;
>>   
>> -	standard::StandardAnalyzer *an = new standard::StandardAnalyzer();
>> +	SimpleAnalyzer *an = new SimpleAnalyzer();
>>  	SWBuf target = getConfigEntry("AbsoluteDataPath");
>>  	bool includeKeyInSearch = getConfig().has("SearchOption", "IncludeKeyInSearch");
>>  	char ch = target.c_str()[strlen(target.c_str())-1];
>> @@ -972,19 +974,10 @@
>>  	target.append("lucene");
>>  	FileMgr::createParent(target+"/dummy");
>>  
>> -	if (IndexReader::indexExists(target.c_str())) {
>> -		d = FSDirectory::getDirectory(target.c_str(), false);
>> -		if (IndexReader::isLocked(d)) {
>> -			IndexReader::unlock(d);
>> -		}
>> -																		   
>> -		writer = new IndexWriter( d, an, false);
>> -	} else {
>> -		d = FSDirectory::getDirectory(target.c_str(), true);
>> -		writer = new IndexWriter( d ,an, true);
>> -	}
>> +	ramDir = new RAMDirectory();
>> +	coreWriter = new IndexWriter(ramDir, an, true);
>> +	
>>  
>> -
>>   
>>  	char perc = 1;
>>  	VerseKey *vkcheck = 0;
>> @@ -1222,7 +1215,7 @@
>>  		if (good) {
>>  //printf("writing (%s).\n", (const char *)*key);
>>  //fflush(stdout);
>> -			writer->addDocument(doc);
>> +			coreWriter->addDocument(doc);
>>  		}
>>  		delete doc;
>>  
>> @@ -1230,9 +1223,29 @@
>>  		err = Error();
>>  	}
>>  
>> -	writer->optimize();
>> -	writer->close();
>> -	delete writer;
>> +	// Optimizing automatically happens with the call to addIndexes
>> +	//coreWriter->optimize();
>> +	coreWriter->close();
>> +
>> +	if (IndexReader::indexExists(target.c_str())) {
>> +		d = FSDirectory::getDirectory(target.c_str(), false);
>> +		if (IndexReader::isLocked(d)) {
>> +			IndexReader::unlock(d);
>> +		}
>> + 
>> +		fsWriter = new IndexWriter( d, an, false);
>> +	} else {
>> +		d = FSDirectory::getDirectory(target.c_str(), true);
>> +		fsWriter = new IndexWriter( d ,an, true);
>> +	}
>> +
>> +	Directory *dirs[] = { ramDir, 0 };
>> +	fsWriter->addIndexes(dirs);
>> +	fsWriter->close();
>> +
>> +	delete ramDir;
>> +	delete coreWriter;
>> +	delete fsWriter;
>>  	delete an;
>>  
>>  	// reposition module back to where it was before we were called



