<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Please remember,<br>
      <br>
      SWORD already supports a search normalization layer.&nbsp; We have
      normalizers for many things like accents, diacritics, etc., that
      we run on the text before passing it to Lucene (or using our
      own search mechanism).<br>
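      <br>
      To make the idea concrete, here is a minimal Python sketch of
      this kind of normalization. It is not SWORD code and does not
      reproduce what any SWORD filter actually does; the function name
      normalize_for_search and the EXTRA_FOLDS table are hypothetical
      and exist only for illustration. It assumes that stripping
      combining marks and folding case is the desired behaviour.<br>
      <pre>
```python
import unicodedata

# Hypothetical table for characters that Unicode decomposition leaves alone;
# the ligature U+0153 has no decomposition, so it is mapped by hand.
EXTRA_FOLDS = {"\u0153": "oe", "\u0152": "OE"}


def normalize_for_search(text):
    """Fold text to a plain form used only for indexing and matching."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (accents, breathings, other diacritics).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Apply the hand-written folds, then ignore case.
    for src, dst in EXTRA_FOLDS.items():
        stripped = stripped.replace(src, dst)
    return stripped.casefold()


if __name__ == "__main__":
    # Both French spellings reach the index in the same form.
    print(normalize_for_search("c\u0153ur"), normalize_for_search("coeur"))
    # An accented Greek word is folded to its unaccented form.
    print(normalize_for_search("\u03c0\u03b1\u03c1\u03b1\u03b3\u03b3\u03ad\u03bb\u03bb\u03c9"))
```
</pre>
      In a sketch like this, the same function would be applied both to
      the text being indexed and to the search term, and its output
      would never be shown to the user.<br>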
      <br>
      SWORD has distinct stages where it applies filters.&nbsp; The two most
      obvious are the render stage and the search stage (named Render
      and Strip in the engine).&nbsp; We have many filters that do many
      different things, and any of them can be applied to a module for
      normalizing during search by including a
      LocalStripFilter=FilterName line in the module's .conf file.<br>
      <br>
      Here are the filters currently available:<br>
      <a class="moz-txt-link-freetext" href="http://www.crosswire.org/svn/sword/trunk/src/modules/filters/">http://www.crosswire.org/svn/sword/trunk/src/modules/filters/</a><br>
      <br>
      <br>
      So, for example, we have:<br>
      <br>
      LocalStripFilter=UTF8GreekAccents<br>
      LocalStripFilter=PapyriPlain<br>
      <br>
      To normalize papyrological searches on the Duke Databank of
      Papyri:<br>
<a class="moz-txt-link-freetext" href="http://crosswire.org/study/wordsearchresults.jsp?mod=DDP&amp;searchTerm=%CF%80%CE%B1%CF%81%CE%B1%CE%B3%CE%B3%CE%B5%CE%BB%CE%BB*">http://crosswire.org/study/wordsearchresults.jsp?mod=DDP&amp;searchTerm=%CF%80%CE%B1%CF%81%CE%B1%CE%B3%CE%B3%CE%B5%CE%BB%CE%BB*</a><br>
      <br>
      The normalizations under discussion certainly need to be
      considered, but SWORD already has a mechanism in place for
      applying them.<br>
      <br>
      Troy<br>
      <br>
      <br>
      <br>
      On 03/03/2013 05:57 PM, DM Smith wrote:<br>
    </div>
    <blockquote
      cite="mid:DD0A06DB-1C52-4A58-9224-4639E42CC988@crosswire.org"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html;
        charset=ISO-8859-1">
      <br>
      <div>
        <div>On Mar 3, 2013, at 11:53 AM, Chris Burrell &lt;<a
            moz-do-not-send="true" href="mailto:chris@burrell.me.uk">chris@burrell.me.uk</a>&gt;
          wrote:</div>
        <br class="Apple-interchange-newline">
        <blockquote type="cite">
          <p dir="ltr">Yes, although in French only the contracted form is
            correct</p>
          <div><br>
          </div>
        </blockquote>
        <div><br>
        </div>
        WRT indexing and searching, it really doesn't matter which is
        correct. The normalization is not visible to the user.
        Normalization often goes to forms that are ugly for the
        end-user.</div>
      <div><br>
      </div>
      <div>-- DM</div>
      <div><br>
        <blockquote type="cite">
          <div class="gmail_quote">On 3 Mar 2013 16:10, "David Haslam"
            &lt;<a moz-do-not-send="true"
              href="mailto:dfhmch@googlemail.com">dfhmch@googlemail.com</a>&gt;
            wrote:<br type="attribution">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              There are similar issues in French modules.<br>
              <br>
              e.g. Some French Bibles have "coeur", some have "c&#339;ur",
              and some even use<br>
              both!<br>
              <br>
              etc., etc.<br>
              <br>
              David<br>
              <br>
              <br>
              <br>
              --<br>
              View this message in context: <a moz-do-not-send="true"
href="http://sword-dev.350566.n4.nabble.com/Searching-for-hyphenated-words-tp4652016p4652042.html"
                target="_blank">http://sword-dev.350566.n4.nabble.com/Searching-for-hyphenated-words-tp4652016p4652042.html</a><br>
              Sent from the SWORD Dev mailing list archive at <a
                moz-do-not-send="true" href="http://Nabble.com">Nabble.com</a>.<br>
              <br>
              _______________________________________________<br>
              sword-devel mailing list: <a moz-do-not-send="true"
                href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a><br>
              <a moz-do-not-send="true"
                href="http://www.crosswire.org/mailman/listinfo/sword-devel"
                target="_blank">http://www.crosswire.org/mailman/listinfo/sword-devel</a><br>
              Instructions to unsubscribe/change your settings at above
              page</blockquote>
          </div>
        </blockquote>
      </div>
      <br>
      <br>
    </blockquote>
    <br>
  </body>
</html>