JSword / JS-229

Stackoverflow when indexing modules

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7
    • Component/s: None
    • Labels:
      None

      Description

      I'm getting a stack overflow in JSword on this line. Any ideas why that would be?

      org.crosswire.jsword.index.lucene.LuceneIndex.generateSearchIndexImpl(Progress, List<Key>, IndexWriter, Key, int)

          Activity

          DM Smith added a comment -

          Could it be the recursion in the routine? What is the nature of the key?
          A verse or a verse list should be straightforward with no recursion.

          int size = key.getCardinality();
          int subCount = count;
          for (Key subkey : key) {
              if (subkey.canHaveChildren()) {
                  generateSearchIndexImpl(job, errors, writer, subkey, subCount);
              } else {
                  data = new BookData(book, subkey);
                  osis = null;
                  ....
              }
          }
          }
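If the recursion does turn out to be the cause, one possible way around it is to walk the key tree with an explicit stack instead of recursive calls. The sketch below is only an illustration under assumptions: `DemoKey`, `KeyWalkDemo`, and `collectLeaves` are hypothetical stand-ins, not JSword classes, and `canHaveChildren()` here simply means "has at least one child", which may not match JSword's semantics. It also guards against a key that contains itself, which is one way unbounded recursion could arise.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a JSword key; NOT the real Key interface.
final class DemoKey {
    final String name;
    final List<DemoKey> children = new ArrayList<>();

    DemoKey(String name) {
        this.name = name;
    }

    // Assumption for this sketch: a key "can have children" when it has any.
    boolean canHaveChildren() {
        return !children.isEmpty();
    }
}

final class KeyWalkDemo {
    // Collects leaf keys depth-first using an explicit stack, so nesting depth
    // uses heap memory rather than call-stack frames. The identity-based
    // visited set stops a key that contains itself from being walked forever.
    static List<String> collectLeaves(DemoKey root) {
        List<String> leaves = new ArrayList<>();
        Set<DemoKey> visited =
                Collections.newSetFromMap(new IdentityHashMap<DemoKey, Boolean>());
        Deque<DemoKey> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            DemoKey key = stack.pop();
            if (!visited.add(key)) {
                continue; // already handled; skip to avoid a cycle
            }
            if (key.canHaveChildren()) {
                // Push in reverse so children are visited in their original order.
                for (int i = key.children.size() - 1; i >= 0; i--) {
                    stack.push(key.children.get(i));
                }
            } else {
                leaves.add(key.name); // a leaf is where the verse would be indexed
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        DemoKey book = new DemoKey("Gen");
        DemoKey chapter = new DemoKey("Gen 1");
        chapter.children.add(new DemoKey("Gen 1:1"));
        chapter.children.add(new DemoKey("Gen 1:2"));
        book.children.add(chapter);
        chapter.children.add(book); // deliberate cycle back to the parent
        System.out.println(collectLeaves(book)); // prints [Gen 1:1, Gen 1:2]
    }
}
```

With the deliberate cycle in `main`, a plain recursive walk would never terminate, while the explicit-stack version visits each key once.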

          DM Smith added a comment -

          I've been testing the indexing of all the Bible modules I can get my hands on. I'm seeing deadlocks, too many file handles open, indexing finishes but there's no index, ....

          I haven't seen this one, but I'm going to use this issue to track successfully building an index for all of these modules.

          Martin Denham added a comment -

          generateSearchIndex does not run at all on many Android phones because it uses too much RAM. I spent a lot of time getting it to work on Android. From memory, the largest change was to stop using RAMDirectory, but there were a lot of other smaller changes. I created a PdaLuceneIndexCreator but would be very happy if the code somehow migrated back into JSword, because I keep having to resync the code with JSword. I think I should create a separate issue for this.

          DM Smith added a comment -

          Please do open an issue for it.

          The reason for the RAM directory was that it took 40+ minutes to create the index for the KJV on Windows. This was due to fast indexing and virus scanning. Turning those off made it 10 times faster. I didn't think it was right to explain to folks how to turn these off. The problem was that Lucene was creating one file per verse and merging those gradually into the final index, deleting them when it was done. Windows and Norton were trying to scan the files upon creation, and then Windows had to undo its work when the file was deleted.

          Putting it into RAM made it about 20 times faster. Got it down to 2 minutes. That was with Win 98SE on a laptop with less than 400 MB of RAM.

          I'd be glad to have two methods in the code that you can pick from, or just yours if there is no longer a problem on Windows.

          DM Smith added a comment -

          The problem of indexing finishing but producing no index has been solved. It was using a global key list that, when storing Matt 1.1, actually stored Gen 1.1. It needed to take the testament into account when interpreting an ordinal value from the NT.
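To illustrate the ordinal mix-up described above, here is a toy sketch. It is not JSword's versification code: `OrdinalDemo`, `toGlobalOrdinal`, and the value of `OT_SIZE` are all made up for the example. The point is only that an ordinal counted from the start of the NT has to be offset by the size of the OT before it is used against a whole-Bible key list.

```java
// Toy model only: NOT JSword's real versification classes or verse counts.
final class OrdinalDemo {
    enum Testament { OLD, NEW }

    // Hypothetical number of ordinals in the Old Testament for this toy scheme.
    static final int OT_SIZE = 5;

    // Converts a 1-based, testament-relative ordinal into a whole-Bible ordinal.
    static int toGlobalOrdinal(Testament testament, int ordinalInTestament) {
        if (testament == Testament.NEW) {
            return OT_SIZE + ordinalInTestament; // shift past the whole OT
        }
        return ordinalInTestament;
    }

    public static void main(String[] args) {
        // Without the offset, NT ordinal 1 would collide with OT ordinal 1,
        // which is the "Matt 1.1 stored as Gen 1.1" symptom.
        System.out.println(toGlobalOrdinal(Testament.OLD, 1)); // prints 1
        System.out.println(toGlobalOrdinal(Testament.NEW, 1)); // prints 6
    }
}
```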

          I think the deadlock problem is finally solved. It was a race condition between 2 (or more) Progress Meters trying to look at the job state.
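The comment does not say how the race was fixed, so the following is only a generic sketch of one way to let several observers read a job's state safely. `JobStateDemo` and its members are hypothetical names, not JSword's Progress API. The state is held as an immutable snapshot in an `AtomicReference`, so readers never take a lock and never see a half-updated value.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical job-state holder; NOT JSword's Progress or job classes.
final class JobStateDemo {
    // Immutable snapshot: both fields always belong to the same update.
    static final class State {
        final int workDone;
        final boolean finished;

        State(int workDone, boolean finished) {
            this.workDone = workDone;
            this.finished = finished;
        }
    }

    private final AtomicReference<State> state =
            new AtomicReference<>(new State(0, false));

    // Atomically replaces the snapshot with one that has more work done.
    void addWork(int units) {
        state.updateAndGet(s -> new State(s.workDone + units, s.finished));
    }

    // Atomically marks the job as finished.
    void finish() {
        state.updateAndGet(s -> new State(s.workDone, true));
    }

    // Lock-free read, safe to call from any number of observer threads.
    State snapshot() {
        return state.get();
    }

    public static void main(String[] args) throws InterruptedException {
        JobStateDemo job = new JobStateDemo();
        // Two observers poll the state, standing in for two progress meters.
        Runnable meter = () -> {
            while (!job.snapshot().finished) {
                Thread.yield();
            }
        };
        Thread meterA = new Thread(meter);
        Thread meterB = new Thread(meter);
        Thread worker = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                job.addWork(1);
            }
            job.finish();
        });
        meterA.start();
        meterB.start();
        worker.start();
        worker.join();
        meterA.join();
        meterB.join();
        System.out.println(job.snapshot().workDone); // prints 1000
    }
}
```

Because no thread ever holds a lock while reading or updating, two meters looking at the job at the same time cannot block each other.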

          I've tried to build several hundred module indexes and haven't seen a problem (other than module data problems). So I'm going to say this is done. If there is a specific problem, please open a new issue.

          Martin, please do open an issue for building w/o a RAM dir.


            People

            • Assignee:
              DM Smith
              Reporter:
              Chris Burrell
            • Votes:
              0
              Watchers:
              3
