[sword-devel] VerseKey.java

Joe Walker sword-devel@crosswire.org
26 Dec 2001 10:07:05 +0000


Hi,

I did some work on RawVerse.[cpp|java] a while ago. I've attached it
because I don't think jsword existed when I posted it last time. It
reads Sword-format Bible data. I've added this code to ProjectB so
that it can read Sword Bible data too.
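
For reference when reading the attachment: per the comments in findOffset(), each verse has a 6-byte entry in ot.vss or nt.vss at byte position idxoff * 6. The entry holds a 4-byte little-endian start offset into the ot/nt text file, followed by a 2-byte little-endian size. Below is a minimal sketch of decoding one such entry with java.nio.ByteBuffer (JDK 1.4 and later) instead of hand-assembled shifts. The VssEntry class and its decode() method are hypothetical names, not part of RawVerse:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: decodes one 6-byte Sword .vss index entry. */
final class VssEntry {
    final long start; // byte offset into the ot/nt text file (unsigned 32-bit)
    final int size;   // length of the verse text in bytes (unsigned 16-bit)

    private VssEntry(long start, int size) {
        this.start = start;
        this.size = size;
    }

    /** Decodes bytes [pos, pos + 6) of an index buffer; in RawVerse, pos is idxoff * 6. */
    static VssEntry decode(byte[] index, int pos) {
        ByteBuffer bb = ByteBuffer.wrap(index, pos, 6).order(ByteOrder.LITTLE_ENDIAN);
        long start = bb.getInt() & 0xFFFFFFFFL; // mask keeps values >= 2^31 positive
        int size = bb.getShort() & 0xFFFF;      // mask keeps values >= 2^15 positive
        return new VssEntry(start, size);
    }
}
```

In RawVerse terms, index would be the six bytes read from idx_raf[testament] after seek(idxoff*6), with pos = 0. The masks are needed because Java has no unsigned int or short.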

Back to the VerseKey question at hand: I understand VerseKey to be the
class that parses "Gen 1 1" and works out that it refers to the first
book of the Bible. You may be interested in
com.eireneh.bible.passage.Verse[Range], which is the ProjectB code that
does the same:
  http://prdownloads.sourceforge.net/projectb/projectb-0.86.zip
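
To make the parsing job concrete for the Java port: at its simplest, parsing means splitting the reference into a book name, a chapter and a verse, and then mapping the book name to its position in the canon. The sketch below is deliberately tiny. It assumes a hypothetical RefParser class and a hypothetical three-book table. It is not how VerseKey or ProjectB's Verse class actually parse, and it ignores abbreviations, ranges and chapter/verse bounds checking:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, heavily simplified reference parser (not VerseKey's algorithm). */
final class RefParser {
    // Hypothetical book table: position 0 is the first book of the Bible.
    private static final List<String> BOOKS = Arrays.asList("Gen", "Exo", "Lev");

    /** Returns {book, chapter, verse}, all 1-based; "Gen 1 1" gives {1, 1, 1}. */
    static int[] parse(String ref) {
        // Accept "Gen 1 1" as well as "Gen 1:1" by treating ':' as a separator.
        String[] parts = ref.trim().split("[\\s:]+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 'Book chapter verse': " + ref);
        }
        int book = BOOKS.indexOf(parts[0]) + 1; // indexOf returns -1 when not found
        if (book == 0) {
            throw new IllegalArgumentException("unknown book: " + parts[0]);
        }
        return new int[] {book, Integer.parseInt(parts[1]), Integer.parseInt(parts[2])};
    }
}
```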

Joe.

On Wed, 2001-12-26 at 03:59, Bobby Nations wrote:
> Guys,
> 
> Ok, I'm having a bit of trouble understanding some of the C++ code in
> versekey.cpp.  On line 39, the variable 'offsets' is defined as a two
> dimensional array like so:
> 
> long         *VerseKey::offsets[2][2]  = {{VerseKey::otbks,
> VerseKey::otcps}, {VerseKey::ntbks, VerseKey::ntcps}};
> 
> But here's where I'm having trouble: everywhere that it's used, it's
> indexed as if it had three dimensions! For example, lines 1240 and
> 1241 read:
> 
> offset = offsets[testament-1][0][book];
> offset = offsets[testament-1][1][(int)offset + chapter];
> 
> Now the code compiles and links correctly, but I don't understand why.
> Please help me to understand how a two-dimensional array can be
> accessed as if it were three-dimensional, so that I can finish
> converting the versekey.cpp file to Java for JSword.
> 
> Thanks,
> 
> Bobby
> 
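
The declaration explains it. "long *offsets[2][2]" is a 2x2 array of pointers to long, not a 2x2 array of long. So offsets[testament-1][0] is a long* (otbks or ntbks), and the third subscript indexes through that pointer, because p[book] means *(p + book) in C++. The closest Java equivalent is a jagged long[][][] whose innermost elements are the book and chapter tables themselves. The sketch below assumes a hypothetical OffsetsSketch class, and its table contents are hypothetical placeholders, not Sword's real otbks/otcps/ntbks/ntcps values:

```java
/**
 * Sketch of versekey.cpp's "long *offsets[2][2]" as a Java jagged array.
 * All table values here are HYPOTHETICAL placeholders.
 */
final class OffsetsSketch {
    // Hypothetical stand-ins for VerseKey::otbks/otcps and ntbks/ntcps.
    static final long[] OTBKS = {0, 1, 3};
    static final long[] OTCPS = {0, 2, 5, 9, 14};
    static final long[] NTBKS = {0, 1};
    static final long[] NTCPS = {0, 2, 6};

    // Each element of the 2x2 grid is a whole table, just as each C++
    // element is a long* pointing at one; the tables may differ in length.
    static final long[][][] OFFSETS = {
        {OTBKS, OTCPS},
        {NTBKS, NTCPS},
    };

    /** The same two lookups as lines 1240-1241 of versekey.cpp. */
    static long lookup(int testament, int book, int chapter) {
        long offset = OFFSETS[testament - 1][0][book];
        offset = OFFSETS[testament - 1][1][(int) offset + chapter];
        return offset;
    }
}
```

Java has no pointer arithmetic, so the third subscript only works here because each element really is an array. A rectangular long[2][2] would not compile with a third subscript.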


Attachment: RawVerse.java


package com.eireneh.bible.book.sword;

import java.io.*;

import com.eireneh.util.*;

/**
 * Code for class 'RawVerse', a module that reads the raw text files
 * ot and nt using the indexes ??.bks, ??.cps and ??.vss, and provides
 * lookup and parsing functions based on class VerseKey
 */
public class RawVerse
{
    /** constant for the introduction */
    public static final int TESTAMENT_INTRO = 0;

    /** constant for the old testament */
    public static final int TESTAMENT_OLD = 1;

    /** constant for the new testament */
    public static final int TESTAMENT_NEW = 2;

    /**
     * RawVerse Constructor - Initializes data for instance of RawVerse
     * @param path - path of the directory where data and index files are located.
     *		be sure to include the trailing separator (e.g. '/' or '\')
     *		(e.g. 'modules/texts/rawtext/webster/')
     */
    public RawVerse(String path) throws FileNotFoundException
    {
        idx_raf[TESTAMENT_OLD] = new RandomAccessFile(path + "ot.vss", "r");
        idx_raf[TESTAMENT_NEW] = new RandomAccessFile(path + "nt.vss", "r");
        txt_raf[TESTAMENT_OLD] = new RandomAccessFile(path + "ot", "r");
        txt_raf[TESTAMENT_NEW] = new RandomAccessFile(path + "nt", "r");

        // The original had a destructor that did the equivalent of close()ing
        // the above. I'm not sure that there is a delete-type ability in
        // Book.java, and the finalizer for RandomAccessFile will do it anyway,
        // so for the moment I'm going to ignore this.

        // The original also stored the path, but I don't think it ever used it.

        // The original also kept an instance count, which went unused. (I
        // noticed this in a few other places, so it is either copy-and-paste
        // or a pattern.) Either way, the assumption that there is only one
        // copy of a static is not safe in many Java environments (servlets
        // and EJBs at least), so I've deleted it.
    }

    /**
     * Finds the offset of the key verse from the indexes
     * @param testament testament to find (TESTAMENT_OLD or TESTAMENT_NEW;
     *        the TESTAMENT_INTRO case is not handled)
     * @param idxoff index of the entry in the .vss file
     * @return the start offset and size of the verse text
     */
    public Location findOffset(int testament, long idxoff) throws IOException
    {
        Location loc =3D new Location();

        // There was a bodge here to move testament around if someone wanted
        // to read the intro. We just have the set of static finals above.
        //   if (testament == 0)
        //       testament = idx_raf[1] == null ? 1 : 2;

        // There was a test here to check whether idx_raf[testament-1] was
        // null, in which case we returned a default Location (of 0,0).
        // However, this seems like papering over errors, so I have left it
        // out for the time being.

        // I've now totally rewritten this because we did have byte-sex
        // problems: the file is little-endian, but RandomAccessFile's
        // readInt()/readShort() read big-endian.

        // Read the 6 bytes of this entry.
        idx_raf[testament].seek(idxoff*6);
        byte[] read = new byte[6];
        idx_raf[testament].readFully(read);
        int[] temp = new int[6];

        for (int i=0; i<temp.length; i++)
        {
            temp[i] = read[i] >= 0 ? read[i] : 256 + read[i];
            log.fine("temp["+i+"]="+temp[i]);
        }

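        // Shift as a long: an int shift goes negative when the top byte is
        // 0x80 or more, and the start offset is an unsigned 32-bit value.
        loc.start = ((long) temp[3] << 24) | (temp[2] << 16) | (temp[1] << 8) | temp[0];
        loc.size = (temp[5] << 8) | temp[4];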

        // The original lseek used SEEK_SET, which is the only option in Java.
        // The *6 is because we use 4 bytes for the offset and 2 for the length.
        // There used to be some code at the start of the method like:
        //   idxoff *= 6;
        // But it isn't good to alter parameters, and the seek above is the
        // only place that it is used.

        // There was some BIGENDIAN swapping stuff here. Since the bytes are
        // now assembled explicitly in little-endian order above, it should
        // no longer be needed:
        //   *start = lelong(*start);
        //   *size  = leshort(*size);

        // There was also some code here to patch over any errors if you
        // could only read one of the 2 bytes from above. I'm not sure that
        // that is a good idea, so I've left it out.

        return loc;
    }

    /**
     * Gets text at a given offset.
     * @param testament testament file to read from (TESTAMENT_OLD or TESTAMENT_NEW)
     * @param loc where to read from, as returned by findOffset()
     * @return the raw text of the entry
     */
    public String getText(int testament, Location loc) throws IOException
    {
        // The original had the size param as an unsigned short.
        // It also used SEEK_SET as above (default in Java)

        byte[] buffer = new byte[loc.size];

        txt_raf[testament].seek(loc.start);
        // readFully(), unlike read(), never returns with a partly filled buffer
        txt_raf[testament].readFully(buffer);

        // We should probably think about encodings here?
        return new String(buffer);
    }

    /**
     * Prepares the text before returning it to external objects
     * @param text the raw text as read from the data file
     * @return the text with line endings collapsed and surrounding whitespace trimmed
     */
    protected String prepText(String text)
    {
        StringBuffer buf = new StringBuffer(text);

        boolean space = false;
        boolean cr = false;
        boolean realdata = false;
        char nlcnt = 0;

        int to = 0;
        for (int from=0; from<buf.length(); from++)
        {
            switch (buf.charAt(from))
            {
            case 10:
                if (!realdata)
                    continue;

                space = !cr;
                cr =3D false;
                nlcnt++;
                if (nlcnt > 1)
                {
                    // buf.setCharAt(to++, nl);
                    buf.setCharAt(to++, '\n');
                    // nlcnt = 0;
                }
                continue;

            case 13:
                if (!realdata)
                    continue;

                buf.setCharAt(to++, '\n');
                space = false;
                cr = true;
                continue;
            }

            realdata = true;
            nlcnt = 0;

            if (space)
            {
                space = false;
                if (buf.charAt(from) != ' ')
                {
                    buf.setCharAt(to++, ' ');
                    from--;
                    continue;
                }
            }
            buf.setCharAt(to++, buf.charAt(from));
        }

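        // The original null-terminated the buffer here:
        //   buf.setCharAt(to, '\0');
        // In Java we truncate instead; otherwise characters left over past
        // 'to' (e.g. after skipped leading newlines) would be returned too.
        buf.setLength(to);

        // There followed a lot of code that stomped \0 onto the end of the
        // string if there was whitespace there. trim() is easier.

        return buf.toString().trim();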
    }

    /**
     * Sets text for current offset
     * @param testament testament to find (0 - Bible/module introduction)
     * @param idxoff offset into .vss
     * @param buf buffer to store
     */
    protected void setText(int testament, long idxoff, String buf) throws IOException
    {
        // As in findOffset() we don't alter the formal parameter
        //   idxoff *= 6;

        // As in findOffset() there was some messing around with testament
        //   if (testament == 0)
        //       testament = idx_raf[1] == null ? 1 : 2;

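        // There is some encoding stuff to be thought about here
        byte[] data = buf.getBytes();

        // outsize started off being unsigned, and it looks like
        // "unsigned short size;" was not used. It must count bytes, not
        // chars, and it has to fit the 2-byte size field.
        int outsize = data.length;
        if (outsize > 0xFFFF)
            throw new IOException("text too long for a 2-byte size field: " + outsize);

        // There was some more BIGENDIAN nonsense here. Again ignoring the
        // MACOSX bits it looked like:
        //   start = lelong(start);
        //   outsize  = leshort(size);
        // The index is little-endian (see findOffset()), so we read the
        // 4-byte start and write the 2-byte size low byte first.
        // NOTE: the constructor opens the files with mode "r", so these
        // writes throw an IOException unless they are opened "rw" instead.

        idx_raf[testament].seek(idxoff*6);
        byte[] entry = new byte[4];
        idx_raf[testament].readFully(entry);
        long start = (entry[0] & 0xFFL) | ((entry[1] & 0xFFL) << 8)
                   | ((entry[2] & 0xFFL) << 16) | ((entry[3] & 0xFFL) << 24);

        // The file pointer is now at idxoff*6 + 4, i.e. on the size field.
        idx_raf[testament].write(outsize & 0xFF);
        idx_raf[testament].write((outsize >> 8) & 0xFF);

        // Writing in place overwrites the following verse if the new text
        // is longer than the old one.
        txt_raf[testament].seek(start);
        txt_raf[testament].write(data);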
    }

    /**
     * Creates new module files
     * @param path Directory to store module files
     */
    public static void createModule(String path) throws IOException
    {
        truncate(path + "ot.vss");
        truncate(path + "nt.vss");
        truncate(path + "ot");
        truncate(path + "nt");

        // I'm not at all sure what these did. I'd guess they wrote data to
        // the files we just created? But how they'd neatly (or otherwise) go
        // about this is beyond me right now.
        //   RawVerse rv(path);
        //   VerseKey mykey("Rev 22:21");
    }

    /**
     * Create an empty file, deleting what was there
     */
    private static void truncate(String filename) throws IOException
    {
        // The original code did something like this. I reckon this basically
        // deleted and recreated (empty) the named file.
        //   unlink(buf);
        //   fd = FileMgr::systemFileMgr.open(buf, O_CREAT|O_WRONLY|O_BINARY, S_IREAD|S_IWRITE);
        //   FileMgr::systemFileMgr.close(fd);

        File file = new File(filename);

        file.delete();
        file.createNewFile();
    }

    /**
     * There has to be a better method than this. findoffset() returned a start
     * and a size, and multiple return values are not possible in Java.
     * It seems to me that returning start and size from a public interface
     * shows our callers more than we should, and I expect that the solution
     * lies in a thorough sorting out of the interface, but I want to keep
     * the methods as unchanged as is reasonable right now.
     */
    public class Location
    {
        /** Where does the data start */
        public long start = 0;

        /** The data length. The original was an unsigned short, which does not fit a Java short */
        public int size = 0;

        /**
         * Debug only
         */
        public String toString()
        {
            return "start="+start+", size="+size;
        }
    }

    /**
     * A test program
     */
    public static void main(String[] args)
    {
        try
        {
            // To start with I'm going to hard code the path
            String path = "/usr/apps/sword/modules/texts/rawtext/kjv/";

            RawVerse verse = new RawVerse(path);
            Location loc = verse.findOffset(RawVerse.TESTAMENT_NEW, 6);
            String pre = verse.getText(RawVerse.TESTAMENT_NEW, loc);

            log.fine("loc="+loc);
            log.fine("pre="+pre);
            log.fine("post="+verse.prepText(pre));
        }
        catch (Exception ex)
        {
            log.log(Level.INFO, "Failure", ex);
        }
    }

    /** The array of index files */
    private RandomAccessFile[] idx_raf = new RandomAccessFile[3];

    /** The array of data files */
    private RandomAccessFile[] txt_raf = new RandomAccessFile[3];

    /** The log stream */
    protected static Logger log = Logger.getLogger("bible.book");
}
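
RawVerse.setText() has to write index entries back in the same 6-byte little-endian layout that findOffset() reads. The size field must hold the length of the encoded bytes, not the char count from String.length(). Below is a minimal sketch of building such an entry with java.nio.ByteBuffer. The VssWriter class and its method names are hypothetical, and so is the choice of UTF-8: the attachment leaves the encoding question open.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for writing Sword-style .vss index entries. */
final class VssWriter {
    /** Encodes (start, size) as 4 + 2 little-endian bytes. */
    static byte[] encodeEntry(long start, int size) {
        if (start < 0 || start > 0xFFFFFFFFL || size < 0 || size > 0xFFFF) {
            throw new IllegalArgumentException("start/size out of range");
        }
        return ByteBuffer.allocate(6)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt((int) start)    // low 32 bits, low byte first
                .putShort((short) size) // low 16 bits, low byte first
                .array();
    }

    /** The size to store for a verse: its encoded byte count, not String.length(). */
    static int sizeOf(String verse) {
        // ASSUMPTION: the module text is UTF-8.
        return verse.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

In setText() these six bytes would be written to idx_raf[testament] at position idxoff * 6.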

