[sword-devel] search idea

Matthias Ansorg sword-devel@crosswire.org
Thu, 6 Jan 2000 11:51:02 +0100

On Tue, 4 Jan 2000, Trevor Jenkins wrote:

>>>(By the way, where is a specification of ThML?)
>> Try http://www.ccel.org/ThML/ThML1.0.htm
>Thanks. At first glance this is a real heavyweight markup scheme in the
>sense that all the element names are very long. I'd like to see some
>minimisation there. But comments upon ThML will have to wait until I've got
>more time.

Right. It is the same problem as with HTML, on which ThML is based: unnecessarily long tag
names blow up the file size without adding information. The ThML specification was 90K in
version 0.93 (PDF) and 240K in version 1.0 (HTML); this is not due to the new features ...
Another issue is how easy the tag names are to memorize when learning a markup language.
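
To put a rough number on the tag-name overhead, here is a minimal sketch in Python. The two
sample strings are made up for illustration and are not real ThML; markup_overhead is a
hypothetical helper that simply counts the characters sitting inside <...> tags.

```python
import re


def markup_overhead(text):
    """Return the fraction of characters in `text` that sit inside <...> tags."""
    tag_chars = sum(len(tag) for tag in re.findall(r"<[^>]*>", text))
    return tag_chars / len(text)


# Two hypothetical encodings of the same sentence (assumed names, not real ThML).
long_form = "<paragraph><emphasis>Grace</emphasis> and peace.</paragraph>"
short_form = "<p><em>Grace</em> and peace.</p>"

if __name__ == "__main__":
    for label, sample in (("long", long_form), ("short", short_form)):
        print(label, len(sample), round(markup_overhead(sample), 2))
```

The content is identical in both strings; only the tag names differ, so the whole size
difference is markup.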

>>>> 5. Meta information:
>> "meta information" is an inappropriate expression I used. I tried to describe
>> that kind of data that does not belong to any specific part of the text
>> itself, such as copyright info or author or title or translator or ISBN of
>> printed edition or ... .
>Ah, I understand. I took a database-centric view of meta data. Perhaps
>publication data would be a better description of what you describe? Or are
>you thinking of other supplementary information being added?

As you said before, which fields are to be introduced depends on the markup of the text. We will
never have more detailed supplementary information than the markup provides (right, Jerry,
translation-independent verse information is of course an exception; anyway, that is not the issue
here, since we are talking about publication data).
The ThML header information is IMHO the most detailed about-the-text information at the moment, and it
is indeed "publication data".

>... I can write an LL(1) parser to deal with this language; even
>write it standing on my head but writing that grammar down in a concise
>format is taking some time.

What is an LL(1) parser? You know, my limited understanding .-)
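
For anyone else on the list with the same question: as far as I know, "LL(1)" means the parser
reads its input Left to right, builds a Leftmost derivation, and decides what to do by looking
at only 1 token ahead. The following is just a minimal sketch in Python for a toy grammar of
sums with parentheses; it has nothing to do with the search language discussed in this thread,
and all names in it (tokenize, Parser, parse) are made up for illustration.

```python
import re

# Toy grammar (an assumption for illustration only):
#   expr -> term ('+' term)*
#   term -> NUMBER | '(' expr ')'


def tokenize(text):
    """Split text into number, '+', '(' and ')' tokens; other characters are skipped."""
    return re.findall(r"\d+|[+()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the single lookahead token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        """Consume the lookahead token, optionally checking that it is `expected`."""
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        # One token of lookahead is enough to decide whether the sum continues.
        while self.peek() == "+":
            self.take("+")
            value += self.term()
        return value

    def term(self):
        tok = self.peek()
        # One token of lookahead is enough to choose between the two alternatives.
        if tok == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        if tok is not None and tok.isdigit():
            return int(self.take())
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    """Parse `text` as an expr and return the value of the sum."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return value
```

Each grammar rule becomes one method, and every decision is made from peek() alone, without
ever backing up; that is the property the "(1)" refers to.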

In Christ,