[sword-devel] search idea

Uwe Koloska sword-devel@crosswire.org
Sat, 15 Jan 2000 19:21:49 +0100


On Sat, 15 Jan 2000, the famous Paul Gear wrote:
>darwin@ichristian.com wrote:
>
>> I was actually considering suggesting binary tags, but dismissed it since
>> it would require a special editor for even making minor changes.
>
>No it wouldn't.  It would just need a 'document compiler' to be written, such
>that once you're finished editing a file (say, 'foo.thml'), you run it
>through the compiler to produce the 'document binary' (say, 'foo.bhtml'). This
>is how both Logos and STEP work.  They have a source format (SGML and RTF,
>respectively), and binary format (Logos' is an 'undefined' proprietary format,
>while STEP's is well documented - except when their web site is down ;-). 
>(Incidentally, Craig Rairdin warned me that bsisg.com might not last very
>long, so i took a copy of the site  with GNU wget.  If anyone wants a look at
>it, i can provide it.  It's a 700 Kb tarball.)

I'm interested in this tarball, if it contains a description of their format.

>> Another issue that just came to mind is that assuming that <book title> is
>> better to read/write than <bt> assumes knowledge of English.  I have started
>> to become sensitive to the network comments that an "English only"
>> philosophy is arrogant.  Perhaps we would be better served using short
>> acronyms where some language neutrality is achieved.
>
>That's a nice thought, but it doesn't scale.  What if the word for book in
>another language doesn't start with 'b'?  What if there is no equivalent of
>'b'?  What if it doesn't use a Latin character set?  We should be language
>neutral when we can, but it's become a fact of life that programming and text
>markup are done in English.  The texts themselves obviously don't have to be,
>but i think it would actually be detrimental to those languages to try to make
>the markup in native language, because we would waste time on defining the
>markup that could be better spent on providing content in those languages.

What about changing the names of the keywords (tags) but not their meaning?
There is a macro package for TeX called ConTeXt (very similar to LaTeX) whose
commands are available in at least three languages.  So you can start a new
chapter with "\chapter" or (for Germans) "\kapitel".  And the most beautiful
thing about it is that there are three reference manuals (English, German and
Dutch) that are alphabetically sorted _and_ hyperlinked, so you can look up
the word you know and switch to the language you want to learn about.

For an SGML document you have to supply a DTD (the same goes for XML, and for
ThML, since it is an SGML DTD), so you can trivially state which tag language
you used for this document.  And since SGML uses Unicode (UTF-8 AFAIK), you
can use any language you want, as long as there is a DTD for it: for example,
SwordML-cs.dtd for a Czech language binding of the Sword Markup Language.

And since there is a compiler that translates the human-readable source into a
machine-(only-)readable binary form, you only have to provide a DTD and a DSSSL
specification (or the like, for translating the tags into binary markers) for
every tag language that you want to support.
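
To make the idea concrete, here is a rough sketch in Python of what such a
compiler step might look like.  Everything in it is my own assumption for
illustration: the tag names, the TAG_BINDINGS and TAG_CODES tables, and the
two-byte marker layout are hypothetical, and it uses a simple regular
expression instead of a real SGML parser with a DTD and DSSSL.

```python
import re

# Hypothetical tag-language bindings: localized tag name -> canonical name.
TAG_BINDINGS = {
    "en": {"book": "book", "chapter": "chapter", "verse": "verse"},
    "de": {"buch": "book", "kapitel": "chapter", "vers": "verse"},
}

# Hypothetical one-byte codes for the canonical tags.
TAG_CODES = {"book": 0x10, "chapter": 0x11, "verse": 0x12}

# Assumed control bytes that introduce an opening or closing marker.
OPEN, CLOSE = 0x01, 0x02

TAG_RE = re.compile(r"<(/?)(\w+)>")


def compile_markup(source: str, lang: str) -> bytes:
    """Replace localized tags with binary markers; keep text as UTF-8."""
    binding = TAG_BINDINGS[lang]
    out = bytearray()
    pos = 0
    for match in TAG_RE.finditer(source):
        # Copy the plain text between the previous tag and this one.
        out += source[pos:match.start()].encode("utf-8")
        closing, name = match.groups()
        canonical = binding[name.lower()]  # raises KeyError for unknown tags
        out += bytes([CLOSE if closing else OPEN, TAG_CODES[canonical]])
        pos = match.end()
    out += source[pos:].encode("utf-8")
    return bytes(out)


english = compile_markup("<chapter>In the beginning</chapter>", "en")
german = compile_markup("<kapitel>In the beginning</kapitel>", "de")
print(english == german)  # True: both sources give the same binary
```

The point is only that the binary form does not care which tag language the
source was written in; supporting another language means adding one more
binding table, not touching the binary format.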

Just my 0.02 euro ;-)

Uwe Koloska

-- 
mailto:koloska@rcs.urz.tu-dresden.de
http://rcswww.urz.tu-dresden.de/~koloska/
--                                    --
right now the web page is in german only
but this will change as time goes by ;-)