JSword / JS-226

Robinson's morphology is not indexed in JSword modules

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.6
    • Fix Version/s: 1.7
    • Component/s: o.c.jsword.index
    • Labels:
      None

      Description

      Lucene is not told to index the morphology information, which makes morphology searches impossible.

      1. Encoding.doc
        59 kB
        David Instone-Brewer
      2. Encoding.doc
        65 kB
        David Instone-Brewer

        Activity

        DM Smith added a comment -

        Regarding using regular expressions to do a search:
        Lucene's search syntax is not regular-expression syntax; it is closer to Unix command-line globbing. I haven't seen regular expression support in a Lucene contrib, but that doesn't mean it isn't there.

        If it isn't, then to support regular expressions we'll need to intercept the query, pick out the regular expression, and use it to run our own search over our own store or the term dictionary.
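
As a rough illustration of the "search the term dictionary ourselves" idea, the sketch below matches a regular expression against a set of indexed terms and turns the hits into an ordinary OR query. It is only a sketch under stated assumptions: the class `RegexTermScan`, the field name `morph`, and the in-memory `SortedSet` standing in for the term dictionary are all hypothetical, and no Lucene API is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical helper: expands a regular expression against a term dictionary. */
final class RegexTermScan {

    /** Returns every term the pattern matches in full, in dictionary order. */
    static List<String> matchTerms(SortedSet<String> termDictionary, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String term : termDictionary) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    /** Joins the matched terms into an OR query over a single field. */
    static String toOrQuery(String field, List<String> terms) {
        StringBuilder query = new StringBuilder();
        for (String term : terms) {
            if (query.length() > 0) {
                query.append(" OR ");
            }
            query.append(field).append(":\"").append(term).append('"');
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the index's term dictionary (assumption, not a Lucene structure).
        SortedSet<String> dict = new TreeSet<>(
                Arrays.asList("N-NSM", "V-AAI-3S", "V-PAI-3S", "V-PAP-NSM"));
        List<String> hits = matchTerms(dict, "V-.AI-3S");
        System.out.println(toOrQuery("morph", hits));
    }
}
```

The appeal of expanding the pattern into plain terms is that the rewritten query can then go through the normal search path; the cost is a full scan of the dictionary for each regular expression.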

        David Instone-Brewer added a comment -

        The regular expressions were more complicated than I had thought they would be.
        Is it time to redesign the Robinson codes?
        They aren't particularly human-friendly or machine-friendly.
        I think the latter is more important, because ideally people won't see the actual coding.

        Chris Burrell added a comment -

        Agreed. Showing the codes to the user should be a last resort, as it implies that they need to learn the new system.

        Chris Burrell added a comment -

        It seems Lucene has some support for regular expressions anyway: http://lucene.apache.org/core/old_versioned_docs/versions/3_0_3/api/contrib-regex/org/apache/lucene/search/regex/package-summary.html

        DM Smith added a comment -

        If we can create a mapping from Robinson codes to something better (human-readable and easy to search), then we can use the mapping within JSword to provide a better user experience.

        The basic thought: the user would see the new codes, or a decoding of these codes into their language (or the default, if there's no such translation). They could search these codes either directly or via a wizard (which one is a front-end choice).

        It may be that the underlying module uses the old codes. That'd be OK, though not ideal. The search would reverse the mapping, going from the new codes to the old codes, and use the result to search the module. Likewise, when presenting the module, the old codes would be replaced with the new codes. This would be a process of normalization, which we currently do for Strong's numbers.
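
The two directions of that normalization could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class `MorphCodeMapper`, its method names, and the dotted "new" codes are hypothetical, since no replacement coding has been designed yet. Only the Robinson codes on the left-hand side are real.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-way mapping between Robinson codes and a friendlier coding. */
final class MorphCodeMapper {
    private final Map<String, String> oldToNew = new HashMap<>();
    private final Map<String, String> newToOld = new HashMap<>();

    /** Registers one pair so it can be looked up in either direction. */
    void register(String oldCode, String newCode) {
        oldToNew.put(oldCode, newCode);
        newToOld.put(newCode, oldCode);
    }

    /** Presentation direction: old code in the module, new code shown to the user. */
    String toDisplay(String oldCode) {
        // Unmapped codes pass through unchanged rather than being dropped.
        return oldToNew.getOrDefault(oldCode, oldCode);
    }

    /** Search direction: new code typed by the user, old code stored in the module. */
    String toStored(String newCode) {
        return newToOld.getOrDefault(newCode, newCode);
    }

    /** A tiny sample table; the right-hand values are invented for this sketch. */
    static MorphCodeMapper sample() {
        MorphCodeMapper mapper = new MorphCodeMapper();
        mapper.register("V-PAI-3S", "verb.present.active.indicative.3s");
        mapper.register("N-NSM", "noun.nominative.singular.masculine");
        return mapper;
    }

    public static void main(String[] args) {
        MorphCodeMapper mapper = sample();
        System.out.println(mapper.toStored("verb.present.active.indicative.3s"));
        System.out.println(mapper.toDisplay("N-NSM"));
    }
}
```

Letting unknown codes pass through unchanged means a module with codes outside the table still displays and searches as it does today.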

        We may want to explore the idea of a module sidecar. On various occasions, I've wanted finer-grained information about a module. Basically, we'd maintain a separate conf for the modules. It'd contain information such as user-provided font info, unlock keys, type of Strong's numbers per testament, type of morphology per testament, and so on. Any program can set a value in the sidecar. This info would be read into BookMetadata and would be available to all programs. If a program doesn't know what to do with a value, it'd ignore it. It would be good to communicate and document these new values. Automatic behavior that's added to JSword would need to be discussed.
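
One way to picture the sidecar is as a second key/value file layered over the module's own conf. The sketch below assumes a simple `key=value` format readable by `java.util.Properties` and assumes that sidecar values win on conflict; neither is decided in this issue. The class `SidecarConf` and the key names in `main` are hypothetical, and a plain `Map` stands in for BookMetadata.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sidecar reader: overlays sidecar values on a module's conf values. */
final class SidecarConf {

    /** Returns the module values with the sidecar's key=value pairs applied on top. */
    static Map<String, String> merge(Map<String, String> moduleConf, String sidecarText) {
        Properties sidecar = new Properties();
        try {
            sidecar.load(new StringReader(sidecarText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> merged = new HashMap<>(moduleConf);
        for (String key : sidecar.stringPropertyNames()) {
            // Assumption: a sidecar value replaces the module's value for the same key.
            merged.put(key, sidecar.getProperty(key));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> moduleConf = new HashMap<>();
        moduleConf.put("Description", "Sample module");
        moduleConf.put("Font", "Default");
        // Illustrative key names only; real names would need to be agreed and documented.
        String sidecarText = "Font=Gentium\nMorphologyNT=Robinson\nStrongsOT=Hebrew\n";
        System.out.println(merge(moduleConf, sidecarText));
    }
}
```

A program that doesn't recognize a key such as `MorphologyNT` simply never reads it from the merged map, which matches the "ignore what you don't understand" rule.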


          People

          • Assignee:
            DM Smith
            Reporter:
            Chris Burrell
          • Votes:
            0
            Watchers:
            3
