[sword-devel] MdF (Monads dot Features) database model

Tue, 4 Jan 2000 13:04:40 -0600

What database model(s) is (are) being employed in SWORD?

One database model which we might want to look into is the MdF database
model.  It was developed in Crist-Jan Doedens' 1994 PhD thesis.  It is a
database model catering to "Expounded text", or "Text and information about
that text".  This seems to me to be what we are trying to store and retrieve
in SWORD, and so I could not keep silent about this Great and Superior
Database Model (GSDM (TM)) :-).

The MdF model is simple, elegant, intuitive, and mathematically clean.  Just
take a look below...

Concepts
-------------
Basically, there are four concepts you need to understand in order to grasp
the MdF model:

1) Monads
2) Objects
3) Object types
4) Features

1) Monads

A monad is simply an integer, and is meant to represent a single,
indivisible unit of text which we wish to store information about.  Thus a
monad could correspond to a single word, lemma (lexeme), morpheme, or
whatever else we define to be the smallest unit we store information about.

At the backbone of the database, there is a /string/ of monads, starting at
1 (or 0), and going 2,3,4,5,...,n where n is the largest monad in the
database.  The sequence of the integers dictates the sequence of the text.

2) Objects

An object is a set of monads.  This can be a singleton set (a set with only
one element).

3) Object types

Objects are grouped into object types.  An object can only belong to one
object type.  Which object type an object has determines which /features/
the object has.

4) Features

A feature is a /function/ which takes one argument -- an object of a
specific object type -- and which produces a value of some domain, like
"string" (in case the object type is "word" and the feature is "surface"),
or like "integer" (in case the object type is "verse" and the feature is
"verse number").  The value can belong to any domain, however.

A feature can be a partial function, i.e., it need not be defined for all
objects of a given object type.
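
To make the four concepts concrete, here is a minimal Python sketch of them.
It is only an illustration under my own assumptions, not how Doedens' thesis
or any existing implementation represents things: the names MdfObject and
Feature are hypothetical, and for simplicity two objects with the same type
and the same monads are treated as the same object.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class MdfObject:
    """An object: a set of monads (integers) with exactly one object type."""
    object_type: str
    monads: FrozenSet[int]


class Feature:
    """A partial function from objects of one object type to values."""

    def __init__(self, name: str, object_type: str) -> None:
        self.name = name
        self.object_type = object_type
        self._values: Dict[MdfObject, object] = {}

    def _check(self, obj: MdfObject) -> None:
        # A feature only accepts objects of its own object type.
        if obj.object_type != self.object_type:
            raise TypeError(
                f"feature {self.name!r} applies to {self.object_type!r} "
                f"objects, not {obj.object_type!r}"
            )

    def define(self, obj: MdfObject, value: object) -> None:
        self._check(obj)
        self._values[obj] = value

    def __call__(self, obj: MdfObject) -> Optional[object]:
        self._check(obj)
        # None stands for "undefined here": the function is partial.
        return self._values.get(obj)


word = MdfObject("word", frozenset({1}))          # a singleton set
verse = MdfObject("verse", frozenset({1, 2, 3}))  # spans three monads

surface = Feature("surface", "word")
surface.define(word, "In")
```

Calling surface(word) yields the stored string, calling it on a word that was
never given a surface yields None, and calling it on the verse raises a
TypeError, since the feature belongs to a different object type.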

Application
----------------
Say we have defined the granularity of the monads to be "word".  And say we
want to store words, Strong's numbers, and book-chapter-verse information.
Then we could have the object types:

- Word
- Verse
- Chapter
- Book

The object type Word would have these features:

- surface   (yielding a string which is the text of the word plus any
punctuation)
- Strong's number (yielding an integer which is the Strong's number)

The object type Verse would have this feature:

- verse (yielding an integer which is the verse)

Similarly for Chapter and Book.

Each Word object would consist of one monad (an object is a set of monads,
remember?).  The sequence of the monads would determine the sequence of the
text.

Each Verse object would consist of all the monads making up the words in
that verse.

Each Chapter would consist of all the monads making up the words in that
chapter.

Similarly for Book.
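
The layout above can be sketched in a few lines of Python.  This is a toy
under stated assumptions: the two-verse text is made up for illustration,
plain dictionaries stand in for real storage, and the helper names text_of
and verse_containing are hypothetical.  Strong's numbers are left out so
that no values are invented; they would be one more dictionary keyed on
Word objects.

```python
# Toy text: verse number -> words.  Illustrative data only.
verses = {1: ["In", "the", "beginning"], 2: ["And", "the", "earth"]}

surface = {}       # feature of Word:  Word object  -> string
verse_number = {}  # feature of Verse: Verse object -> integer
next_monad = 1     # monads are numbered 1, 2, 3, ... in text order

for number, words in verses.items():
    verse_monads = set()
    for text in words:
        # Each Word object is a singleton set holding one monad.
        surface[frozenset({next_monad})] = text
        verse_monads.add(next_monad)
        next_monad += 1
    # Each Verse object is the set of all monads of its words.
    verse_number[frozenset(verse_monads)] = number


def text_of(monads):
    """Reassemble the text covered by a set of monads, in monad order."""
    return " ".join(surface[frozenset({m})] for m in sorted(monads))


def verse_containing(monad):
    """Return the verse number of the Verse object containing this monad."""
    for monads, number in verse_number.items():
        if monad in monads:
            return number
    return None
```

Because every object is just a set of monads, "which verse is this word in?"
becomes a set-membership test, and the text of any object is recovered by
sorting its monads.  Chapter and Book objects would be built the same way,
as larger sets over the same monads.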

QL and MQL query languages
-----------------------------------------
In his PhD thesis, Doedens also gave a general-purpose, very powerful query
language to query MdF databases.  The advantage of this language was that it
dealt with absolutely anything you could ever want to ask of an MdF
database.  The disadvantage of the language was that its semantics was given
in a /denotational/ manner, indicating /what/ was to be retrieved, but not
/how/.  It would take several PhD-type persons with lots and lots and lots
of grey matter to figure out how to actually implement QL.

Since we in SIL wanted to use MdF for our Greek syntactic database, I
decided to do something about it.  I took QL and stripped it down to a core
language, retaining all the most powerful, most necessary elements of QL but
leaving out the rest.  Then I gave it an /operational/ semantics, meaning
that the semantics was specified in terms of /how/ to compute the results.
Thus my language, MQL (or Mini QL), is very easily implementable.

MdF, how to implement an MdF database in storage, and MQL are described in
my 100+ page Bachelor's thesis, which is available in PostScript form upon
request.

Blessings,

Ulrik Petersen
