[sword-devel] task

Troy A. Griffitts sword-devel@crosswire.org
Sun, 09 Sep 2001 13:56:45 -0700


Bobby,
	To give part of the long story...

	Compression for lexicon / dictionary modules will use this class.

The index scheme in mind, if I can remember correctly, is:

1st index file:

[6 byte record]
4 - offset to key
2 - size of key

alphebetized key index file:
[variable byte record]
[ 6 byte suffix ]
key \n
4 - block number
2 - record number in block

block index file:
[8 byte record]
4 - offset to block
4 - sizeof block

block datafile:
[variable length compressed blocks ]
uncompressed, a record contains:
meta info about each record in it's block
(where each record resides and size)
actual records (entries)


getRawData return a const char * which mandates that no other class
modify the data, so we're safe in that regard; it will typically go from
your EntriesBlock class straight into the block data file.  And, when
needed, we'll construct an EntriesBlock from a block data buffer
straight from the block data file.  Does that make sense?  The
EntriesBlock class will be used in 2 scenerios:

 o	creating the module's block datafile, and 
 o	caching a block of the datafile in memory and providing
		entries as requested.


As far as the remove method, it shouldn't matter how, internally, the
offset of the entries are ordered in the dataspace, as long as entry (1)
always gives me the same buffer.  We can go 2 ways with the
entry(number):
 o	remove pushes all entry numbers preceding up one, or
 o	remove nulls out the meta info for that entry and leaves
		the other entries' numbers untouched

the former keeps things a little cleaner in the datafile, but makes me
rearrange and entire parent index section upon delete.

the latter lends itself to easy management of the index, but will leave
metadata gaps.  maybe on addEntry, we could scan for available entry
numbers, and reuse if they exist.

Thanks again for your time!  It's awesome to have someone to run things
by and share with in the implementation.

	Looking forward to your comments,
		-Troy.




> OK, I'll take the STL out of the API and go back to the more c-like interface.  Before I spend too much
> time providing the getRawData function, please help me to understand your design for it in a little more
> detail.
> 
> - What other classes will manipulate the buffer either by writing to it or reading from it?  I know that
> currently none do, but you have something in mind for the future.
> 
> - How important is it for the buffer to be current at all times?  That is, will the consumer classes store
> the pointer returned from getRawData and then use it at their leisure, or will they need to call
> getRawData every time they intend to use the buffer.  If they are going to be using the buffer at any time
> they desire, there's going to need to be some more thought put into the design so as to ensure that
> changes to the buffer don't affect anyone who's trying to read it at the same time.  I'm thinking of the
> realloc calls especially because if another class has stored the pointer to the buffer, and then we
> realloc that buffer to a new spot in memory ... well, won't things get a little hairy when the other class
> tries to do anything with the old pointer?
> 
> - How do you intend to publish the internal structure of the buffer to the consumer classes?  It's a bit
> more complicated than a simple structure variable, but there's got to be a way to let other class writers
> (and maintainers) know how to read the buffer when the time comes for them to do so.  Extensive
> documentation included in the header file would certainly be a good place to start, but long term wouldn't
> it be good to have something more concrete.
> 
> Finally, how do you want to harvest the deleted entries?  Move everything to the right back one and drop
> the last entry and shrink the buffer, or move the last entry to the deleted entry and resize the buffer,
> or either of the first two without shrinking the buffer.
> 
> Cheers,
> 
> Bobby
> 
> "Troy A. Griffitts" wrote:
> 
> > Bobby,
> >         I've been trying to keep STL out of the API interface.  It really makes
> > things tough for language bindings and other things.  I don't mind it
> > being used behind the scenes, and there are some places that do expose
> > STL components as part of their interface, but I'm hoping to remove
> > those also, or at least provide a alternative non-STL mechanism for
> > accessing the same functionality.  Passing a const char * is much more
> > efficient and specificly descriptive than pass a string.
> >
> > The EntriesBlock class is a utility for manipulating a cached
> > cross-section of an index/datafile.  It's purpose IS to manipulate the
> > buffer, not to simply hold records.  This class is probably not the best
> > candidate for re-writing.  I realize the current implementation may not
> > be the most obvious, but trust me, there is a purpose.
> >
> >                 -Troy.
> >
> > Bobby Nations wrote:
> > >
> > > Troy,
> > >
> > > I had a nagging suspicion that leaving that function out would come back to haunt me.  It's easy
> > > enough to put it back in, and I'll do that tomorrow (or Monday).  One function that returns a
> > > formatted block of memory coming right up.
> > >
> > > I'm curious, though, about why you would want to make your internals available to other classes that
> > > way.  After all, if you're giving the EntriesBlock class the responsibility of manipulating the raw
> > > buffer, shouldn't it be the only class that manipulates it?  Once you return the pointer into the
> > > buffer, all bets are off.
> > >
> > > Maybe it would help if I had a better understanding of how you intend to use this class?  I've been
> > > unable to find any other cpp file that makes use of the EntriesBlock class, which leads me to
> > > believe that it's still in the design stage to some extent.  If I knew more about it's consumers,
> > > then I would be able to do a better job with the rewrite.
> > >
> > > What are your opinions about using the STL constructs?  Do you need the functions that use char
> > > pointers because I can easily add them back in again.  I just did a one for one replacement of the
> > > char* with string.
> > >
> > > Also, please give some guidance about how to rearrange the entries in the block when one is deleted
> > > from the middle.  I took the long route of moving all entries to the right of the deletion back one
> > > and removing the last entry, but an even quicker approach would be just to move the last entry into
> > > the empty spot.  That, however, would wreck any sorting of the entries that might have been done,
> > > which is why I didn't do it that way.
> > >
> > > Thanks,
> > >
> > > Bobby
> > >
> > > "Troy A. Griffitts" wrote:
> > >
> > > > Bobby,
> > > >         Thanks for the attempt.  Unfortunately, the purpose of the class is to
> > > > modify a raw buffer that can be retrieved with the:
> > > >
> > > >         const char *getRawData(long *size);
> > > >
> > > > method.  You have changed the interface leaving this public method out.
> > > >
> > > > Your rewrite meets the purposes of the test program, but my purpose was
> > > > really to obtain the raw buffer that it is EntriesBlock's job to
> > > > manipulate.
> > > >
> > > >         I do appreciate the attempt.
> > > >
> > > >                 -Troy.
> > > >
> > > > Bobby Nations wrote:
> > > > >
> > > > > Troy,
> > > > >
> > > > > I took a shot at it, but it may not be what you're expecting.  See, I just
> > > > > rewrote it more or less completely.  Gone are the calloc's and memcopy's.
> > > > > Instead, I've written it to use the STL vector container.  I can do it the
> > > > > other way as well, but this is much cleaner to look at.
> > > > >
> > > > > Is there a reason why you didn't use the STL in the first place?
> > > > >
> > > > > Also, I modified the test program slightly since the original never called
> > > > > the removeEntry function.
> > > > >
> > > > > I'm now able to add and entry, print it out, and then delete it via the
> > > > > testblocks program.  Just curious, but it sure would be helpful for us
> > > > > newbies if there were a set of unit tests (something like CppUnit would
> > > > > work nicely).
> > > > >
> > > > > Anyway, I've attached the changed files for your perusal.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Bobby Nations
> > > > >
> > > > > "Troy A. Griffitts" wrote:
> > > > >
> > > > > > Ok, here's some things that I've been meaning to get around to that I
> > > > > > think might be a good, isolated place for people to begin getting
> > > > > > involved in.
> > > > > >
> > > > > > There is a new class I started and is almost complete, but still has a
> > > > > > few bugs.  It will be used in compression of lexicons and dictionaries,
> > > > > > but that isn't necessary knowledge to complete the class.
> > > > > >
> > > > > > The class can be found at sword/src/modules/common/entriesblk.cpp
> > > > > >
> > > > > > The test program for this class is at sword/tests/testblocks.cpp
> > > > > >
> > > > > > It's been a while since I looked at the code.  I just ran the test
> > > > > > program.  The remove option doesn't seem to work.
> > > > > >
> > > > > > Anyone interested in completing this class, review the code and is you
> > > > > > STILL what to complete this class :), please speak up!
> > > > > >
> > > > > >         Thanks,
> > > > > >                 -Troy.
> > > > >
> > > > >   ------------------------------------------------------------------------
> > > > > #include <entriesblk.h>
> > > > > #include <iostream.h>
> > > > > #include <string>
> > > > > #include <stdio.h>
> > > > >
> > > > > void addEntry(EntriesBlock *eb) {
> > > > >         string input;
> > > > >         string body;
> > > > >         char line[1024];
> > > > >         cout << "\nEnter new Entry's text. '.' on an empty line to finish:\n";
> > > > >         do {
> > > > >                 cout << "> ";
> > > > >                 gets(line);
> > > > >                 input = line;
> > > > >                 if (input.compare("."))
> > > > >                         body.append(input);
> > > > >         }
> > > > >         while (input.compare("."));
> > > > >         cout << "Adding new entry.  Index is: " << eb->addEntry(body.c_str()) << "\n\n";
> > > > > }
> > > > >
> > > > > void printEntry(EntriesBlock *eb, int index) {
> > > > >         if (index < eb->getCount()) {
> > > > >                 cout << "Contents of entry [" << index << "]:\n";
> > > > >                 cout << eb->getEntry(index) << "\n";
> > > > >         }
> > > > >         else cout << "Invalid entry number\n\n";
> > > > > }
> > > > >
> > > > > void removeEntry(EntriesBlock *eb, int index) {
> > > > >         if (index < eb->getCount()) {
> > > > >                 cout << "Removing entry [" << index << "]\n";
> > > > >                 eb->removeEntry( index );
> > > > >         }
> > > > >         else cout << "Invalid entry number\n\n";
> > > > > }
> > > > >
> > > > > int main(int argc, char **argv) {
> > > > >
> > > > >         EntriesBlock *eb = new EntriesBlock();
> > > > >         string input;
> > > > >         char line[1024];
> > > > >
> > > > >         cout << "Initial entry count should be 0: " << eb->getCount() << "\n";
> > > > >
> > > > >         do {
> > > > >                 cout << "[" << eb->getCount() << "] > ";
> > > > >                 gets(line);
> > > > >                 input = line;
> > > > >                 if (input.length() > 0) {
> > > > >                         switch (input[0]) {
> > > > >                                 case 'a': addEntry(eb); break;
> > > > >                                 case 'p':       printEntry(eb, atoi(input.c_str()+1)); break;
> > > > >                                 case 'r':       removeEntry(eb, atoi(input.c_str()+1)); break;
> > > > >                                 case 'q': break;
> > > > >                                 case '?':
> > > > >                                 default:
> > > > >                                         cout << "\n a - add a new entry\n";
> > > > >                                         cout << " p <entry_index> - print entry\n";
> > > > >                                         cout << " r <entry_index> - remove entry\n";
> > > > >                                         cout << " q - quit\n\n";
> > > > >                                         break;
> > > > >                         }
> > > > >                 }
> > > > >         }
> > > > >         while (input.compare("q"));
> > > > >
> > > > >         delete eb;
> > > > >
> > > > >         return 0;
> > > > > }
> > > > >
> > > > >   ------------------------------------------------------------------------
> > > > > #include <entriesblk.h>
> > > > > #include <stdlib.h>
> > > > > #include <string.h>
> > > > >
> > > > > //
> > > > > // Constructors
> > > > > //
> > > > >
> > > > > //
> > > > > // Destructor
> > > > > //
> > > > >
> > > > > //
> > > > > // Accessors
> > > > > //
> > > > > int EntriesBlock::getCount() { return m_entries.size(); }
> > > > > string EntriesBlock::getEntry( int index ) { return m_entries[ index ]; }
> > > > >
> > > > > //
> > > > > // Mutators
> > > > > //
> > > > > int EntriesBlock::addEntry( string str )
> > > > > {
> > > > >     m_entries.push_back( str );
> > > > >     return m_entries.size();
> > > > > }
> > > > >
> > > > > void EntriesBlock::removeEntry( int index )
> > > > > {
> > > > >     // Shift everything back over the entrie to be deleted
> > > > >     int last = m_entries.size() - 1;
> > > > >     for ( int i=index; i<last; i++ )
> > > > >     {
> > > > >         m_entries[ i ] = m_entries[ i+1 ];
> > > > >     }
> > > > >     // Remove the empty entry at the end
> > > > >     m_entries.pop_back();
> > > > > }
> > > > >
> > > > >   ------------------------------------------------------------------------
> > > > > #ifndef ENTRIESBLK_H
> > > > > #define ENTRIESBLK_H
> > > > >
> > > > > #include <vector>
> > > > > #include <string>
> > > > >
> > > > > class EntriesBlock
> > > > > {
> > > > >     public:
> > > > >         //
> > > > >         // Constructors & Destructor
> > > > >         //
> > > > >
> > > > >         //
> > > > >         // Accessors
> > > > >         //
> > > > >         int getCount();
> > > > >         string getEntry( int index );
> > > > >
> > > > >         //
> > > > >         // Mutators
> > > > >         //
> > > > >         int addEntry( string entry );
> > > > >         void removeEntry( int index );
> > > > >
> > > > >     private:
> > > > >         //
> > > > >         // Member variables
> > > > >         //
> > > > >         vector< string > m_entries;
> > > > > };
> > > > >
> > > > > #endif