[sword-devel] Some Comments on Sword's Code

12 Feb 2002 14:00:16 -0700

David,
	Thank you truely for taking the time to review our code.  I appreciate
your heart to help.  My comments below.

On Mon, 2002-02-11 at 04:07, David White wrote:
> While I was looking through the searching code for Sword, I noticed a
> few things which seemed little strange, and I thought I'd make comments
> on ways in which they could be improved:
> 
> there is lots of code in destructors that looks like this:
> 
> 	if (entrybuf)
> 		delete [] entrybuf;
> 
> this is unnecessary, since C++ guarantees that delete [] NULL; will have
> no effect.

Yes, this is left over from my C days.  I understand that most C++
compilers will catch a delete [] 0, for me, but I don't necessarily
think it's bad practice to check myself-- especially when I know C does
not have to handle this for me.

> Further, there seems to be a large amount of unnecessary memory
> allocation and deallocation. For example, entrybuf is allocated as new
> char[1]; - and as far as I can see it exists for the life of the object,
> and then is deleted in the destructor; why not just use char
> entrybuf[1]; ? - it is more efficient, more maintainable and more
> reliable.

I'm assuming that you mean entrybuf in our data drivers.  This variable
is used to hold the current entry buffer and will be reallocated to the
appropriate size for each entry.

> 
> Instead of using constructor initialiser lists, variables are
> initialised inside the body of the constructor; it is generally
> considered better to initialise variables properly.

I don't believe there is any 'properly' dubbed way to initialize
variables.  To defend my preference to use the body, when we overload
constructors, we usually move the initialization to a void init() method
so all constructors use the same initialization.  It's much easier to
move the initialization the way the code stands currently.  I don't like
redundantly initializing variables in each constructor's signature, and
don't feel this was the purpose of this mechanism.  I believe it is
primarily intended to give a place to specify parameters to complex
object types that get created upon construction of an object.  We mostly
have POINTERS to complex objects.  This means we don't need to
initialize a bunch of complex objects until they are needed.

> The search function, takes a parameter "int searchType", which is
> documented to be a regex search if search is >= 0, a phrase if search is
> -1, and a multiword search if phrase is -2. It would be alot cleaner,
> and typesafe just to use an enumeration:
> 
> enum SearchType { REGEX_SEARCH, MULTIWORD_SEARCH, PHRASE_SEARCH };

Good suggestion, thank you.  The history of the signature mandated this
construct.  The first parameter used to be REGEX_ params.  These were
all >= 0  When we added a new parameter to the method, we kept backward
compatability by making the 2 new search types <0.  You may still pass
REGEX_ params in this param, but it has been some time since this change
and we should probably move to typed parameter like your enum.

> Search also takes a pointer to a function, "void (*percent)(char, void
> *)". It would be more flexible to make the function templated, and make
> percent a function or function-like object that is passed in. That way,
> the percentUserData parameter would also be unnecessary, as the user
> could pass in a functoid which stores data if that is the desired
> behaviour.

IMO, most people would not be able to use this complex mechanism, and is
overkill.

> There is alot of code that could be written better using the STL, for
> example:
> 
> for (unsigned int i = 0; i < strlen(wordBuf); i++)
> 	wordBuf[i] = toupper(wordBuf[i]);
> 

Yes, strlen should not be in the condition of the for loop.  I did find
this construct in a few places in our library.

The rest of your message seems to suggest using STL throughout the
engine.  Although I am not against using STL in places where it is
obviously beneficial, I don't agree that it is mostly more efficient. 
Here is a simple example.  Compile these 2 programs:

yoyo.cpp:
_____________
#include <string.h>

int main(int argc, char **argv) {
     const char *sample[] = {"1", "12", "123", "1234", "12345",
"123456", "1234567", "12345678", "123456789", "1234567890"};
     for (long i = 0; i < 9999999L; i++) {
          char *stuff = 0;
          stuff = new char [ strlen(sample[i%10]) ];
          strcpy(stuff, sample[i%10]);
          stuff;    // access buffer
          delete [] stuff;
     }
     return 0;
}

______________________

yoyo2.cpp:
______________
#include <string>

int main(int argc, char **argv) {
     const char *sample[] = {"1", "12", "123", "1234", "12345",
"123456", "1234567", "12345678", "123456789", "1234567890"};

     for (long i = 0; i < 9999999L; i++) {
          string stuff;
          stuff = sample[i%10];
          stuff.c_str(); // access buffer
     }
     return 0;
}

On my system, without optimization, yoyo runs 2x faster.  With
optimization yoyo2 runs 1.2x faster.  I was suprised that yoyo2 ran
faster with optimization.  This gives me a much better impression of
STL.  Either case, stripped binaries of yoyo2 are 2x larger.  This does
not take into account passing complex objects on the stack.  In an
engine used by many frontends, I feel speed and size are more important.
Not using STL in many cases gives us the opportunity to optimize code
rather than being at the mercy of the compiler's optimization.

Having said this, we still use STL in many cases, and in some cases I
even break my own rules about not exposing STL in the interface (e.g.
SWConfig, SWMgr).  I believe there are many more cases that it is not
appropriate.

The code you referenced in rawtext.cpp is my example plugin for the fast
searching framework.  It was a hack and was only intended to show how to
plug in a faster search algorythm.  It was not intended to be used in
'production', but you had no idea of this, and your comment of using stl
is received.

The rest of your suggestions below are valuable, also.  Thank you.  I
would love to have someone with your knowledge of C++ to be contributing
code.  A 9 year evolution of the API has produced inconsistent and
confusing code in some areas.  After 1.5.4, we plan to begin 1.99.x,
which will include a rewrite of much of the API looking towards a 2.0.0
release.  Please consider being involved in that process.

	-Troy.

> There are numerous calls to C-style allocation functions (calloc and
> realloc), I would suggest considering using the more typesafe and
> powerful std::vector. In particular, it is very difficult to make
> C-style allocated code exception-safe, and if you really have to use it,
> I would suggest using an auto_array class, in the spirit of
> std::auto_ptr. Also, a problem with C-style allocation, is it makes it
> even more difficult to deallocate memory properly, since you have to be
> careful whether to use free() or delete or delete [], corresponding to
> whether the memory was allocated using malloc/calloc/realloc, new, or
> new []. Worse, on many compilers, allocating with new [] and freeing
> with free() is fine, but only when you port to another platform will you
> find lots of weird bugs.

> 
> I also noticed lines that looked like this:
> 
> indexes.erase(indexes.begin(), indexes.end());
> 
> this should be: indexes.clear();
> 
> I also noticed, that other than dynamic casts, Sword seems to
> exclusively use C-style casts. I would recommend not using C-style
> casts, as they are more dangerous, and might do an unexpected const_cast
> or reinterpret_cast - both of which can easily cause undefined
> behaviour. Generally, try to use static_casts instead of C-style casts.
> 
> Sword often uses the statement using namespace std; It is generally
> considered bad style to import an entire namespace, particularly in a
> header file. Safer is to just import the names you actually use as in,
> using std::list; or simply explicitly qualify all uses of names. I
> prefer the latter option personally, although I understand use of the
> former.
> 
> This is getting rather voluminous, so I think I'll stop now. I apologise
> if any of my comments seem like unnecessary criticisms, they are
> intended rather to be helpful comments on how to improve the quality of
> the code base. I can assure you that the code to my program is far worse
> than Sword's code (I really had no idea what I was doing when I first
> started writing it). Also, some of my above comments may be invalid,
> since use of a subset of C++'s features may be intentional to allow
> portability to some platforms with poor C++ support.
> 
> I would be happy to comment further if any of my comments have proved
> helpful or instructive.
> 
> God Bless,
> 
> David.