<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>So, as a side note to this thread,</p>
    <p>The Sahidic Bible is maintained at coptot.manuscriptroom.com:</p>
    <p><a class="moz-txt-link-freetext" href="http://coptot.manuscriptroom.com/transcribing?docID=1620025&amp;userName=PUBLISHED">http://coptot.manuscriptroom.com/transcribing?docID=1620025&amp;userName=PUBLISHED</a></p>
    <p> and we regularly export from there and import into swordweb,
      which is used for their browser plugin (first link on Christian
      Askeland's wonderful resource list for Coptic):</p>
    <p><a class="moz-txt-link-freetext" href="https://sites.google.com/site/askelandchristian/copticlinks">https://sites.google.com/site/askelandchristian/copticlinks</a></p>
    <p>We don't index the text.  They typically search with regex (and
      yes, they know about the {byte_count} anomaly with our regex
      search).</p>
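    <p>For anyone who hasn't run into it, here is a rough sketch of how a
      byte-versus-character counting mismatch plays out, assuming the
      anomaly means that a repetition count such as {5} is applied to
      UTF-8 bytes rather than to letters. It uses Python's re module
      purely as a stand-in, not SWORD's regex engine, and the sample word
      is only an illustration:</p>
    <pre>
```python
import re

# Hypothetical sample word: five Coptic letters (NI, O, UA, TAU, EIE).
word = "\u2c9b\u2c9f\u2ca9\u2ca7\u2c89"
utf8 = word.encode("utf-8")

# Each letter in the Coptic block U+2C80..U+2CFF takes three bytes in UTF-8.
char_count = len(word)
byte_count = len(utf8)

# Character-aware matching: {5} means five letters.
matches_as_text = re.fullmatch(r".{5}", word) is not None

# Byte-oriented matching: the same {5} fails, and {15} is needed instead.
matches_as_bytes_5 = re.fullmatch(rb".{5}", utf8) is not None
matches_as_bytes_15 = re.fullmatch(rb".{15}", utf8) is not None

print(char_count, byte_count)
print(matches_as_text, matches_as_bytes_5, matches_as_bytes_15)
```
</pre>
    <p>Under that assumption, a count in a regex search against the
      Coptic text would have to be tripled to match the intended number
      of letters.</p>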
    <p>-Troy</p>
    <p><br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 04/26/2017 03:21 PM, DM Smith wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:9E0CC3C8-CA45-4C81-A1C0-1962CD77ECA8@crosswire.org">
      Consider using Luke to analyze the constructed Lucene index. See: <a
        href="https://code.google.com/archive/p/luke/" class=""
        moz-do-not-send="true">https://code.google.com/archive/p/luke/</a>
      <div class="">I think you’ll need one that matches Lucene 1.9.1.
        Maybe 1.4.x.</div>
      <div class=""><br class="">
      </div>
      <div class="">DM</div>
      <div class=""><br class="">
      </div>
      <div class=""><br class="">
        <div>
          <blockquote type="cite" class="">
            <div class="">On Apr 26, 2017, at 3:48 PM, David Haslam &lt;<a
                href="mailto:dfhmch@googlemail.com" class=""
                moz-do-not-send="true">dfhmch@googlemail.com</a>&gt;
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div class="">If you examine the result preview pane in
                the Xiphos Advanced Search dialog,<br class="">
                the problem becomes apparent.<br class="">
                <br class="">
                Most Coptic Unicode characters are not displayed
                correctly.<br class="">
                <br class="">
                <br class="">
                <br class="">
                The remainder seem to have been converted to U+FFFD
                REPLACEMENT CHARACTER.<br class="">
                <br class="">
                i.e. All these Coptic letters are basically not handled
                aright by this part<br class="">
                of the software:<br class="">
                <br class="">
                U+2C81<span class="Apple-tab-span" style="white-space:pre">        </span>ⲁ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER ALFA<br class="">
                U+2C83<span class="Apple-tab-span" style="white-space:pre">        </span>ⲃ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER VIDA<br class="">
                U+2C85<span class="Apple-tab-span" style="white-space:pre">        </span>ⲅ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER GAMMA<br class="">
                U+2C87<span class="Apple-tab-span" style="white-space:pre">        </span>ⲇ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER DALDA<br class="">
                U+2C89<span class="Apple-tab-span" style="white-space:pre">        </span>ⲉ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER EIE<br class="">
                U+2C8B<span class="Apple-tab-span" style="white-space:pre">        </span>ⲋ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER SOU<br class="">
                U+2C8D<span class="Apple-tab-span" style="white-space:pre">        </span>ⲍ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER ZATA<br class="">
                U+2C8F<span class="Apple-tab-span" style="white-space:pre">        </span>ⲏ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER HATE<br class="">
                U+2C91<span class="Apple-tab-span" style="white-space:pre">        </span>ⲑ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER THETHE<br class="">
                U+2C93<span class="Apple-tab-span" style="white-space:pre">        </span>ⲓ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER IAUDA<br class="">
                U+2C95<span class="Apple-tab-span" style="white-space:pre">        </span>ⲕ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER KAPA<br class="">
                U+2C97<span class="Apple-tab-span" style="white-space:pre">        </span>ⲗ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER LAULA<br class="">
                U+2C99<span class="Apple-tab-span" style="white-space:pre">        </span>ⲙ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER MI<br class="">
                U+2C9B<span class="Apple-tab-span" style="white-space:pre">        </span>ⲛ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER NI<br class="">
                U+2C9D<span class="Apple-tab-span" style="white-space:pre">        </span>ⲝ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER KSI<br class="">
                U+2C9F<span class="Apple-tab-span" style="white-space:pre">        </span>ⲟ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER O<br class="">
                U+2CA1<span class="Apple-tab-span" style="white-space:pre">        </span>ⲡ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER PI<br class="">
                U+2CA3<span class="Apple-tab-span" style="white-space:pre">        </span>ⲣ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER RO<br class="">
                U+2CA5<span class="Apple-tab-span" style="white-space:pre">        </span>ⲥ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER SIMA<br class="">
                U+2CA7<span class="Apple-tab-span" style="white-space:pre">        </span>ⲧ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER TAU<br class="">
                U+2CA9<span class="Apple-tab-span" style="white-space:pre">        </span>ⲩ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER UA<br class="">
                U+2CAB<span class="Apple-tab-span" style="white-space:pre">        </span>ⲫ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER FI<br class="">
                U+2CAD<span class="Apple-tab-span" style="white-space:pre">        </span>ⲭ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER KHI<br class="">
                U+2CAF<span class="Apple-tab-span" style="white-space:pre">        </span>ⲯ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER PSI<br class="">
                U+2CB1<span class="Apple-tab-span" style="white-space:pre">        </span>ⲱ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER OOU<br class="">
                U+2CC1<span class="Apple-tab-span" style="white-space:pre">        </span>ⳁ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SMALL LETTER SAMPI<br class="">
                U+2CE8<span class="Apple-tab-span" style="white-space:pre">        </span>⳨<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC
                SYMBOL TAU RO<br class="">
                <br class="">
                Only the few Coptic letters in the block U+03E2 to
                U+03EF are displayed<br class="">
                aright.<br class="">
                <br class="">
                It's no wonder that a search has so many spurious
                results if most of the<br class="">
                search space has been squashed into Unicode replacement
                characters.<br class="">
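                <br class="">
                As an illustration only, and not a diagnosis of Xiphos or
                of the Lucene index: the letters in U+03E2 to U+03EF
                take two bytes each in UTF-8, whereas the letters listed
                above take three. The sketch below assumes a
                hypothetical failure in which one byte of a three-byte
                sequence is lost, and shows a decoder substituting U+FFFD
                for the damaged letter:<br class="">
                <pre>
```python
# One letter from each of the two ranges discussed above.
shei = "\u03e3"   # COPTIC SMALL LETTER SHEI, in the U+03E2..U+03EF range
alfa = "\u2c81"   # COPTIC SMALL LETTER ALFA, in the U+2C80..U+2CFF block

shei_bytes = shei.encode("utf-8")
alfa_bytes = alfa.encode("utf-8")

# The older range encodes to two bytes, the newer block to three.
print(len(shei_bytes), len(alfa_bytes))

# Hypothetical failure: the last byte of the three-byte sequence is lost.
damaged = alfa_bytes[:2]

# A lenient decoder substitutes U+FFFD for the incomplete sequence.
recovered = damaged.decode("utf-8", errors="replace")
print(repr(recovered))
```
</pre>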
                <br class="">
                I'm a Windows user, as most of you know already.<br
                  class="">
                Does the same thing happen in Xiphos under Linux?<br
                  class="">
                <br class="">
                Is this an issue common to all SWORD based front-ends?<br
                  class="">
                The fact that we see similar results in PocketSword
                strongly suggests it is.<br class="">
                <br class="">
                Best regards,<br class="">
                <br class="">
                David<br class="">
                <br class="">
                <br class="">
                <br class="">
                --<br class="">
                View this message in context: <a
href="http://sword-dev.350566.n4.nabble.com/Lucene-search-index-and-Coptic-tp4657103p4657106.html"
                  class="" moz-do-not-send="true">http://sword-dev.350566.n4.nabble.com/Lucene-search-index-and-Coptic-tp4657103p4657106.html</a><br
                  class="">
                Sent from the SWORD Dev mailing list archive at <a
                  href="http://Nabble.com" class=""
                  moz-do-not-send="true">Nabble.com</a>.<br class="">
                <br class="">
                _______________________________________________<br
                  class="">
                sword-devel mailing list: <a
                  href="mailto:sword-devel@crosswire.org" class=""
                  moz-do-not-send="true">sword-devel@crosswire.org</a><br
                  class="">
                <a
                  href="http://www.crosswire.org/mailman/listinfo/sword-devel"
                  class="" moz-do-not-send="true">http://www.crosswire.org/mailman/listinfo/sword-devel</a><br
                  class="">
                Instructions to unsubscribe/change your settings at
                above page</div>
            </div>
          </blockquote>
        </div>
        <br class="">
      </div>
    </blockquote>
    <br>
  </body>
</html>