[sword-devel] search idea

Paul Gear sword-devel@crosswire.org
Sat, 15 Jan 2000 10:31:08 +0000

darwin@ichristian.com wrote:

> Paul Gear wrote:
> > darwin@ichristian.com wrote:
> > > Paul Gear wrote:
> <SNIPPED Content for which I will bow to greater knowledge>

Hey, i'm no expert.  I've just written a couple of parsers at uni, and i know that the difference between
    if (token == "bt") {
    if (token == "book title") {
is trivial, and probably insignificant in the scheme of a program like a web browser or digital library.
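
A minimal sketch of the point in Python (not taken from any real parser; the tag table and the `tag_kinds` function are invented for illustration). Once the tokenizer has pulled the name out of the angle brackets, the short and the long spelling go through exactly the same lookup, so the only extra cost is comparing a few more characters per tag:

```python
import re

# Hypothetical tag table: both spellings map to the same internal kind.
TAG_KINDS = {
    "bt": "BOOK_TITLE",
    "book title": "BOOK_TITLE",
}

# Matches opening tags only; closing tags such as </bt> are skipped.
TAG_RE = re.compile(r"<([^<>/]+)>")

def tag_kinds(markup):
    """Return the internal kind of each known opening tag in markup."""
    kinds = []
    for match in TAG_RE.finditer(markup):
        name = match.group(1).strip().lower()
        if name in TAG_KINDS:
            kinds.append(TAG_KINDS[name])
    return kinds
```

Both `tag_kinds("<bt>Genesis</bt>")` and `tag_kinds("<book title>Genesis</book title>")` yield the same list, and the rest of the program never sees which spelling the author typed.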

> ...
> > > It is illogical to design a language where the process which is done once
> > > is made easy at the expense of the process that is performed millions of
> > > times.
> >
> > In principle i agree with you, but this case is not an example of that.  The overhead is minimal, and can
> > be worked around completely if necessary.  Here's how: If you are worried about "<book title>" being longer
> > than "<bt>", why aren't you worried about "<bt>" being longer than, say, 0x12?  If you're that worried
> > about it, you can write a binary representation of the markup (i.e. "compile" the document to binary form),
> > compress it, and write it to disk.  And if we were writing for embedded systems, we might worry about that,
> > but it's not really that important in the scheme of things.
> I was actually considering suggesting binary tags, but dismissed it since
> it would require a special editor for even making minor changes.

No it wouldn't.  It would just need a 'document compiler' to be written, such that once you're finished editing
a file (say, 'foo.thml'), you run it through the compiler to produce the 'document binary' (say, 'foo.bhtml').
This is how both Logos and STEP work.  They have a source format (SGML and RTF, respectively), and binary format
(Logos' is an 'undefined' proprietary format, while STEP's is well documented - except when their web site is
down ;-).
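
A rough sketch of what such a 'document compiler' could look like, in Python. This is only an illustration of the idea, not how Logos or STEP actually encode their files: the tag table, the one-byte codes, the marker bytes and both function names are assumptions made up for the example. It swaps each known tag for a two-byte sequence, compresses the result with zlib, and can turn the binary back into editable source:

```python
import re
import zlib

# Hypothetical tag table: each tag name gets a one-byte code.
TAG_CODES = {"book title": 0x12, "chapter": 0x13}
TAG_NAMES = {code: name for name, code in TAG_CODES.items()}

# Marker bytes for opening and closing tags.
# Assumption: the characters U+0001 and U+0002 never occur in the text itself.
OPEN, CLOSE = 0x01, 0x02

TAG_RE = re.compile(r"<(/?)([^<>/]+)>")

def compile_document(source):
    """Replace known tags with (marker, code) byte pairs, then compress."""
    out = bytearray()
    pos = 0
    for match in TAG_RE.finditer(source):
        name = match.group(2)
        if name not in TAG_CODES:
            continue  # unknown tag: it stays in the output as literal text
        out += source[pos:match.start()].encode("utf-8")
        out += bytes([CLOSE if match.group(1) else OPEN, TAG_CODES[name]])
        pos = match.end()
    out += source[pos:].encode("utf-8")
    return zlib.compress(bytes(out))

def decompile_document(binary):
    """Invert compile_document: decompress and expand the byte pairs."""
    data = zlib.decompress(binary)
    parts = []
    start = 0
    i = 0
    while i < len(data):
        if data[i] in (OPEN, CLOSE) and i + 1 < len(data):
            parts.append(data[start:i].decode("utf-8"))
            slash = "/" if data[i] == CLOSE else ""
            parts.append("<%s%s>" % (slash, TAG_NAMES[data[i + 1]]))
            i += 2
            start = i
        else:
            i += 1
    parts.append(data[start:].decode("utf-8"))
    return "".join(parts)
```

The author keeps editing the text source in any ordinary editor and only reruns the compiler when finished; writing the returned bytes to a file opened in "wb" mode gives the 'document binary'. Since the round trip is lossless, no special editor is needed for the binary form.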

(Incidentally, Craig Rairdin warned me that bsisg.com might not last very long, so i took a copy of the site
with GNU wget.  If anyone wants a look at it, i can provide it.  It's a 700 KB tarball.)

> I come from the mindset of a programmer who started on a Commodore Vic-20
> where every bit is important, but would be too small for even the simplest
> Bible software.
> However the mindset has served me well.  I have developed programs and file
> structures that have amazed others due to my background.
> I don't spend full price on something if I can get it on sale, nor will I
> give up bits and bytes without a serious fight.

That's quite a popular philosophy.  I must admit i don't like it myself, but it's certainly a valid one.  I
prefer the 'suck it and see' approach - only optimize it if you find that it is necessary to do so.  That is not
to say i think that we shouldn't consider performance - by all means we should make a design that is capable of
being optimized, but when writing code (or text markup), it is much more important to build something that is
maintainable by others.  (I've heard that Donald Knuth talks about the "error of optimizing too soon" - he
believes that a lot of time and effort is wasted on optimizing things that really don't need it.)

> Another issue that just came to mind is that assuming that <book title> is
> better to read/write than <bt> assumes knowledge of English.  I have started
> to become sensitive to the network comments that an "English only"
> philosophy is arrogant.  Perhaps we would be better served using short
> acronyms where some language neutrality is achieved.

That's a nice thought, but it doesn't scale.  What if the word for book in another language doesn't start with
'b'?  What if there is no equivalent of 'b'?  What if it doesn't use a Latin character set?  We should be
language neutral when we can, but it's become a fact of life that programming and text markup are done in
English.  The texts themselves obviously don't have to be, but i think it would actually be detrimental to those
languages to try to make the markup in native language, because we would waste time on defining the markup that
could be better spent on providing content in those languages.

> I love discussing ideas with others, thanks for sharing your thoughts, and
> putting up with mine.

We all put up with each other.  That's what the body of Christ is about!  :-)  (BTW, i'm re-reading a great book
on that subject at the moment: "The Body", by Charles Colson and Ellen Santilli Vaughan - life-changing stuff!)

> Creation is more scientifically valid than evolution!

And eminently more believable...  ;-)
"He must become greater; i must become less." - John 3:30