[sword-devel] VerseKey.java

Troy A. Griffitts sword-devel@crosswire.org
Wed, 26 Dec 2001 02:08:14 -0700


Bobby,

> long         *VerseKey::offsets[2][2]  = {{VerseKey::otbks,
> VerseKey::otcps}, {VerseKey::ntbks, VerseKey::ntcps}};

long *[2][2] is effectively 3 dimensional (notice the *): each element
is a pointer to an array of longs.  In Java, just use long[][][]

The first 2 dimensions are constant; the last is different for each
entry.
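
To see why three subscripts compile against a declaration with only two,
here is a minimal sketch in C.  The table names and values are made up
for illustration; they are not the real data from canon.h.

```c
#include <stddef.h>

/* Hypothetical per-testament tables; each one has its own length. */
static long demo_otbks[] = {0, 1, 52, 93};
static long demo_otcps[] = {0, 2, 34, 60, 85, 110};
static long demo_ntbks[] = {0, 1, 30};
static long demo_ntcps[] = {0, 2, 28, 52};

/* A 2x2 array whose elements are pointers to long.
   Subscripting one of those pointers supplies the third dimension. */
static long *demo_offsets[2][2] = {
    {demo_otbks, demo_otcps},
    {demo_ntbks, demo_ntcps}
};

/* Equivalent to demo_offsets[testament][table][index]. */
long demo_lookup(int testament, int table, size_t index)
{
    long *row = demo_offsets[testament][table]; /* first two dimensions */
    return row[index];                          /* third dimension */
}
```

The rows have different lengths, which is the same shape as a jagged
long[][][] in Java.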

Here's the history, and logic, behind this array...

A verse lookup goes thru 4 phases:

book
chapter
verse
data

Originally, we used 8 files for this 4 phase lookup (ot and nt were
separated):

ot.bks
ot.cps
ot.vss
ot
nt.bks
nt.cps
nt.vss
nt

[ START copied from end of example ]
It seems complicated, but is actually pretty simple:

look up in bks where book starts.
jump there + chapter in cps to find where chapter starts
jump there + verse in vss to find where actual data starts and size
jump there in ot or nt to get data.
[ END copied from end of example ]


It's easier to explain by tracing backward.  We'll just use nt (ot works
the same way for the Old Testament):
____________

nt is the actual data file.  It contains the raw text of the module.
____________

nt.vss is an index file that contains a series of 6 byte records that
store u32 offset; u16 size;  so it might look something like:

offset	size
  0	 50
 50	120
170	 30
200	 45
...

the offset is the location in nt that the verse starts at, and the size
is the length of the data for that verse.

So if we want to read the data for the second verse in nt, we could do
something like:

// we want second verse from example data above
__u32 offset = 50;
__u16 size = 120;

lseek(ntFd, offset, SEEK_SET);
read(ntFd, buf, size);
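
For anyone porting this, a rough sketch of decoding one 6 byte record
from a buffer, field by field.  The struct and function names are
hypothetical, and little-endian byte order is an assumption here; the
description above doesn't state it.

```c
#include <stdint.h>

/* One nt.vss record: where the verse starts in nt and how long it is. */
struct vss_entry {
    uint32_t offset;
    uint16_t size;
};

/* Decode one 6 byte record.  Little-endian is assumed, not confirmed. */
struct vss_entry vss_decode(const unsigned char rec[6])
{
    struct vss_entry e;
    e.offset = (uint32_t)rec[0]
             | ((uint32_t)rec[1] << 8)
             | ((uint32_t)rec[2] << 16)
             | ((uint32_t)rec[3] << 24);
    e.size = (uint16_t)(rec[4] | (rec[5] << 8));
    return e;
}
```

Decoding the fields separately avoids depending on struct layout: a
compiler will typically pad a struct like this to 8 bytes, so reading 6
bytes straight into it is not portable.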

_______________

OK, nt.cps is an index into nt.vss.  It contains 4 byte records of a
single element u32 offset; and might look something like

offset
  0
180
372

These offset values are offsets in nt.vss pointing to where each CHAPTER
starts.  (We also have some special records besides 'verses', which we
call 'headers', that reside at the start of sections like chapter, book,
testament, module.  We're ignoring these right now to keep the example
simple.)  So, if we want to find verse 3 of CHAPTER 2, we would simply:

static const char VSSRECORDSIZE = 6;

// we want verse 3
int verse = 3;

// we want chapter 2 from example data above
__u32 chapterOffset = 180;

// filled in from the nt.vss record
__u32 offset;
__u16 size;

// this says go to where chapter starts and jump
// down a few verses to get the one we want
// - 1 on verse because we don't need to jump
// down if we're the first one (i.e. 0-based)
__u32 chapVerseOffset = chapterOffset + (VSSRECORDSIZE * (verse - 1));

lseek(ntVssFd, chapVerseOffset, SEEK_SET);
read(ntVssFd, &offset, 4);
read(ntVssFd, &size, 2);

// the rest is from above **********************

lseek(ntFd, offset, SEEK_SET);
read(ntFd, buf, size);

___________________

The last file, nt.bks, is an index into nt.cps pointing to where each
book starts.  4 byte records, one element u32 offset; just like nt.cps,
example:

offset
0
120
244

these offset values are offsets in nt.cps pointing to where each BOOK
starts.  So, if we want to find verse 3 of chapter 2 of BOOK 1 (again,
we're ignoring special header entries for book header, chapter header,
etc.), we would simply:

static const char CPSRECORDSIZE = 4;

// we want chapter 2
int chapter = 2;

// we want BOOK 1 from example data above
__u32 bookOffset = 0;
__u32 bookChapOffset = bookOffset + (CPSRECORDSIZE * (chapter - 1));
__u32 chapterOffset;

lseek(ntCpsFd, bookChapOffset, SEEK_SET);
read(ntCpsFd, &chapterOffset, 4);


// the rest is from above **********************

static const char VSSRECORDSIZE = 6;
int verse = 3;
__u32 offset;
__u16 size;

__u32 chapVerseOffset = chapterOffset + (VSSRECORDSIZE * (verse - 1));

lseek(ntVssFd, chapVerseOffset, SEEK_SET);
read(ntVssFd, &offset, 4);
read(ntVssFd, &size, 2);

lseek(ntFd, offset, SEEK_SET);
read(ntFd, buf, size);

___________________________________


It seems complicated, but is actually pretty simple:

look up in bks where book starts.
jump there + chapter in cps to find where chapter starts
jump there + verse in vss to find where actual data starts and size
jump there in ot or nt to get data.
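
Those four steps can be sketched end to end against in-memory buffers
instead of file descriptors.  Everything named demo_* is hypothetical,
little-endian byte order is an assumption, header records are ignored
as in the examples above, and the index positions are not bounds
checked.

```c
#include <stdint.h>
#include <string.h>

enum { BKS_RECORD_SIZE = 4, CPS_RECORD_SIZE = 4, VSS_RECORD_SIZE = 6 };

/* In-memory stand-ins for the four files of one testament. */
struct demo_module {
    const unsigned char *bks; /* u32 byte offsets into cps */
    const unsigned char *cps; /* u32 byte offsets into vss */
    const unsigned char *vss; /* u32 offset + u16 size into data */
    const char *data;         /* raw text */
};

static uint32_t rd_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t rd_u16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* book, chapter and verse are 1-based.  Copies the verse text into out
   (NUL-terminated) and returns its length, or -1 if out is too small. */
long demo_fetch(const struct demo_module *m, int book, int chapter,
                int verse, char *out, size_t outlen)
{
    /* 1. look up in bks where the book starts in cps */
    uint32_t book_off = rd_u32(m->bks + BKS_RECORD_SIZE * (book - 1));
    /* 2. jump there + chapter in cps to find where the chapter starts */
    uint32_t chap_off = rd_u32(m->cps + book_off
                               + CPS_RECORD_SIZE * (chapter - 1));
    /* 3. jump there + verse in vss to find data offset and size */
    const unsigned char *rec = m->vss + chap_off
                               + VSS_RECORD_SIZE * (verse - 1);
    uint32_t off = rd_u32(rec);
    uint16_t size = rd_u16(rec + 4);
    /* 4. jump there in the data to get the text */
    if ((size_t)size + 1 > outlen)
        return -1;
    memcpy(out, m->data + off, size);
    out[size] = '\0';
    return size;
}
```

Each level only stores byte offsets into the next one, so the same
function works for ot or nt depending on which four buffers are passed
in.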

________________________

Now that you understand all that: offsets replaced 2 of the files, bks
and cps.  We did this because most modules used KJV verse numbering and
the data in these files was identical.  When we go back to dynamic
versification schemes, we may just add these old index files back, but
I would rather create an Nth level index that doesn't know about
'chapters', 'books', etc.  E.g., the ot / nt concept would become
level1.idx; .bks would become level2.idx; .cps would become level3.idx;
.vss would become level4.idx; and the ot / nt data files would become
something like book.dat.

But this is all future stuff.  Those lookup tables would possibly still
exist, for speed, but they would be read in from files instead of being
hardcoded in canon.h

____________________________


Back to the point at hand.

long *offsets[2][2] are the u32 offsets from the 4 data files:
ot.bks
ot.cps
nt.bks
nt.cps

offsets[testament][bks = 0; cps = 1][where you want to go] = your offset
value;
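
As a sketch, the two lines you quoted (1240 and 1241) amount to a two
step lookup through those tables.  The function name and the table
contents below are invented for illustration; the real values live in
canon.h.

```c
/* Invented stand-ins for one testament's tables. */
static long ex_bks[] = {0, 1, 4};
static long ex_cps[] = {0, 2, 20, 35, 50, 62};

/* Only testament 1 is filled in for this sketch. */
static long *ex_tables[2][2] = {
    {ex_bks, ex_cps},
    {ex_bks, ex_cps}
};

/* testament is 1-based, as in the quoted versekey.cpp lines. */
long demo_chapter_start(long *tables[2][2], int testament,
                        int book, int chapter)
{
    /* step 1: bks table says where this book starts in the cps table */
    long offset = tables[testament - 1][0][book];
    /* step 2: cps table, at book start + chapter, gives the next offset */
    offset = tables[testament - 1][1][(int)offset + chapter];
    return offset;
}
```

With the invented data, demo_chapter_start(ex_tables, 1, 1, 2) reads
ex_bks[1] to get 1, then ex_cps[1 + 2] to get 35.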


Hope this helps.
	-Troy.



> 
> But, here's where I'm having trouble comprehending, everywhere that it's
> used, it's called as if it has three dimensions!  For example, lines
> 1240 and 1241 read.
> 
> offset = offsets[testament-1][0][book];
> offset = offsets[testament-1][1][(int)offset + chapter];
> 
> Now the code compiles and links correctly, but I don't understand why
> that is so?  Please help me to understand how a two dimensional array is
> allowed to be accessed as if it were three dimensional so that I can
> finish converting the versekey.cpp file to Java for JSword.
> 
> Thanks,
> 
> Bobby