[osis-core] Linguistic Annotation Module Design Document

Kirk Lowery osis-core@bibletechnologieswg.org
Mon, 03 Nov 2003 09:17:55 -0500


Friends,

For your amusement -- but more especially for your expert comment -- I
attach a first draft of a schema design document for OSIS linguistic
annotation; more precisely, for morphologic annotation. We'll get to
syntactic annotation after this. This is the concrete outcome of the
three intensive days of face-to-face work Steve and I did last week.

--
Kirk E. Lowery, Ph.D.
Director, Westminster Hebrew Institute
Adjunct Professor of Old Testament
Westminster Theological Seminary, Philadelphia

Theorie ist, wenn man alles weiss und nichts klappt.
Praxis ist, wenn alles klappt und keiner weiss warum.
Bei uns sind Theorie und Praxis vereint:
nichts klappt und keiner weiss warum!

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
	<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=windows-1252">
	<TITLE>Schema Design for OSIS Linguistic Annotation</TITLE>
	<META NAME="GENERATOR" CONTENT="OpenOffice.org 1.1.0  (Win32)">
	<META NAME="AUTHOR" CONTENT="Kirk Lowery">
	<META NAME="CREATED" CONTENT="20031102;9350813">
	<META NAME="CHANGEDBY" CONTENT="Kirk Lowery">
	<META NAME="CHANGED" CONTENT="20031103;9015336">
	<STYLE>
	<!--
		@page { size: 8.5in 11in }
		TD P.western { font-family: "Verdana", sans-serif; font-size: 10pt }
		H1.western { font-family: "Verdana", sans-serif; font-size: 20pt }
		P.western { font-family: "Verdana", sans-serif; font-size: 10pt }
		H3.western { font-family: "Verdana", sans-serif; font-size: 12pt }
		H2.western { font-family: "Verdana", sans-serif; font-size: 16pt }
		TH P.western { font-family: "Verdana", sans-serif; font-size: 10pt }
		TT.western { font-size: 10pt }
		CODE.western { font-family: "Courier New", monospace; font-size: 10pt; font-weight: bold }
	-->
	</STYLE>
</HEAD>
<BODY LANG="en-US" BGCOLOR="#ffffcc" DIR="LTR">
<H1 CLASS="western" ALIGN=CENTER>Schema Design for OSIS Linguistic
Annotation</H1>
<H3 CLASS="western">by Kirk Lowery and Steve DeRose,<BR>OSIS
Technical Committee</H3>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3 STYLE="page-break-inside: avoid">
	<COL WIDTH=39*>
	<COL WIDTH=46*>
	<COL WIDTH=171*>
	<THEAD>
		<TR VALIGN=TOP>
			<TH WIDTH=15% BGCOLOR="#ffff99">
				<P CLASS="western">Revision</P>
			</TH>
			<TH WIDTH=18% BGCOLOR="#ffff99">
				<P CLASS="western">Date</P>
			</TH>
			<TH WIDTH=67% BGCOLOR="#ffff99">
				<P CLASS="western">Comments</P>
			</TH>
		</TR>
	</THEAD>
	<TBODY>
		<TR>
			<TD WIDTH=15% VALIGN=BOTTOM SDVAL="0.1" SDNUM="1033;">
				<P CLASS="western" ALIGN=CENTER>0.1</P>
			</TD>
			<TD WIDTH=18% VALIGN=TOP>
				<P CLASS="western" ALIGN=CENTER>11/02/2003 10:16:23</P>
			</TD>
			<TD WIDTH=67% VALIGN=TOP>
				<P CLASS="western">Original draft.</P>
			</TD>
		</TR>
	</TBODY>
</TABLE>
<H2 CLASS="western">Introduction</H2>
<P CLASS="western">The OSIS Linguistic Annotation schema
(<TT CLASS="western"><B>osisLA.x.x.xsd</B></TT>) defines the
elements, attributes and their relationships for linguistic
annotation of an OSIS-compliant document. The schema is an extension
&ndash; not a replacement &ndash; of the OSIS Core schema. The
instance document should be a valid OSIS document. The present
proposal assumes inline markup, since we do not expect anyone to be
doing stand-off markup anytime in the near future, given the current
state of software. The goal for version 1.0 will be to have a system
adequate for the markup of the Bible in its original languages at the
morphologic level of analysis.</P>
<H2 CLASS="western">Basic Concepts</H2>
<P CLASS="western">Philosophically, we view an arbitrary span or
segment of the text stream (i.e., the biblical text or the text to
be annotated) as the element, and the annotation (including
parsing) as attributes of that element. The first issue is that of
the granularity of segmentation of the text. What unit do we wish to
annotate? Since this first phase is focused upon morphology, we
choose the &ldquo;morpheme&rdquo; as the unit of text that
we wish to annotate. The <CODE CLASS="western"><B>&lt;w&gt;</B></CODE>
element is redefined in <TT CLASS="western"><B>osisLA.x.x.xsd</B></TT>
as containing one or more <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE>
elements. <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE> is the
only new element to be added. It will have a very long list of
attributes, which can be modified by a language declaration.</P>
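<P CLASS="western">As a rough illustration of this containment, the
fragment below sketches how one word made of two morphemes might be
marked up. The transliterated Hebrew (Genesis 1:1, &ldquo;in the
beginning&rdquo;), the lemma spellings and the unprefixed element
names are assumptions made for the example only; the draft does not
yet fix any of them.</P>
<PRE CLASS="western">&lt;!-- hypothetical instance fragment, not normative --&gt;
&lt;w&gt;
  &lt;!-- first morpheme of the word: the prefixed preposition --&gt;
  &lt;morpheme word_part="1" pos="preposition" lemma="b"&gt;be&lt;/morpheme&gt;
  &lt;!-- second morpheme of the word: the noun itself --&gt;
  &lt;morpheme word_part="2" pos="common_noun" lemma="reshit"&gt;reshit&lt;/morpheme&gt;
&lt;/w&gt;</PRE>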
<P CLASS="western">The schema will attempt to include everything that
annotation of any language will need. Of course, each individual
language will have its own unique characteristics. These
characteristics will be captured by the language declaration
document. In the beginning, the schema will contain all that is
needed for Hebrew, Aramaic and Greek annotation. From there, later
revisions will begin the process of abstraction for language
universals.</P>
<H2 CLASS="western">Global Issues</H2>
<H3 CLASS="western">Namespace</H3>
<P CLASS="western">Should this module have its own namespace: <CODE CLASS="western"><B>osisLA</B></CODE>
or perhaps just <CODE CLASS="western"><B>ola</B></CODE>?</P>
<H3 CLASS="western">Constraints</H3>
<P CLASS="western">Is there a way that some attributes can be made
contingent upon others? For example, nouns do not have <CODE CLASS="western"><B>person</B></CODE>,
but verbs and pronouns do. Nouns have <CODE CLASS="western"><B>cases</B></CODE>,
but verbs have <CODE CLASS="western"><B>tense</B></CODE>.</P>
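<P CLASS="western">One possible direction, offered only as a sketch:
XML Schema 1.0 has no direct way to make the presence of one
attribute depend on the value of another, but a rule-based language
such as Schematron can state such a constraint. The rule below
assumes an unprefixed <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE>
element in no namespace and the noun values listed for <CODE CLASS="western"><B>pos</B></CODE>
in this document; the message text is invented for the example.</P>
<PRE CLASS="western">&lt;!-- hypothetical Schematron rule, not part of the draft schema --&gt;
&lt;sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"&gt;
  &lt;sch:pattern&gt;
    &lt;!-- fires on every morpheme tagged as some kind of noun --&gt;
    &lt;sch:rule context="morpheme[@pos = 'noun' or @pos = 'common_noun' or @pos = 'proper_noun']"&gt;
      &lt;!-- the assertion fails if a person attribute is present --&gt;
      &lt;sch:assert test="not(@person)"&gt;A noun morpheme should not carry a person attribute.&lt;/sch:assert&gt;
    &lt;/sch:rule&gt;
  &lt;/sch:pattern&gt;
&lt;/sch:schema&gt;</PRE>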
<H3 CLASS="western">Inheritance</H3>
<P CLASS="western">It seems reasonable that <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE>
should inherit all of the default attributes of an element from the
<CODE CLASS="western"><B>osis</B></CODE> namespace. Is there any
reason why <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE>
should have the <CODE CLASS="western"><B>osisID</B></CODE> attribute
explicitly set?</P>
<H3 CLASS="western">Data Types</H3>
<P CLASS="western">First impressions suggest that no new data types
need to be derived from those already in place. Would there be a
reason to create new derived types just for linguistic annotation?</P>
<H3 CLASS="western">Discontinuous Morphemes</H3>
<P CLASS="western">Many languages have morphemes which leap across
spans of morphemes. For example, in Hebrew, the verbal stems are sets
of vowels that are inserted in between root consonants. How can these
be handled?</P>
<H2 CLASS="western">Element Summary</H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3 STYLE="page-break-inside: avoid">
	<COL WIDTH=52*>
	<COL WIDTH=204*>
	<THEAD>
		<TR VALIGN=TOP>
			<TH WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western">Element 
				</P>
			</TH>
			<TH WIDTH=80% BGCOLOR="#ffff99">
				<P CLASS="western">Description</P>
			</TH>
		</TR>
	</THEAD>
	<TBODY>
		<TR VALIGN=TOP>
			<TD WIDTH=20%>
				<P CLASS="western"><CODE CLASS="western"><B>&lt;w&gt;</B></CODE></P>
			</TD>
			<TD WIDTH=80%>
				<P CLASS="western"><CODE CLASS="western"><B>&lt;redefine&gt;</B></CODE>
				the OSIS <I>word</I> element to include <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE></P>
			</TD>
			<TD WIDTH=80% BGCOLOR="#ffff99">
				<P CLASS="western">This is the primary container for
				morphological parsing. It is <I>self-referential</I>, i.e.,
				<CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE> may contain
				a <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE> element,
				so that alternate analyses may be given, e.g., context-bound vs.
				context-free, or according to a &ldquo;level&rdquo; approach:
				&ldquo;formal&rdquo;, &ldquo;phrase-level&rdquo;, &ldquo;clause-level&rdquo;
				or &ldquo;x-foobar&rdquo;. An &ldquo;alternative-formal&rdquo;,
				&ldquo;alternative-phrase-level&rdquo;, etc., would also be
				needed. <CODE CLASS="western"><B>&lt;note&gt;</B></CODE> should be
				allowed inside <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE>.</P>
			</TD>
		</TR>
	</TBODY>
</TABLE>
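<P CLASS="western">The self-referential use of <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE>
might look something like the following sketch, in which an outer
formal analysis wraps an alternate phrase-level analysis of the same
text. Where the text itself sits (inside the inner or the outer
element) is not settled by this draft, so the placement shown is an
assumption, as are the transliteration and lemma spellings.</P>
<PRE CLASS="western">&lt;!-- hypothetical instance fragment, not normative --&gt;
&lt;w&gt;
  &lt;!-- outer element: formal analysis as a participle --&gt;
  &lt;morpheme use="formal" pos="participle" lemma="shafat"&gt;
    &lt;!-- nested element: the same form read as a noun ("judge") at phrase level --&gt;
    &lt;morpheme use="phrase-level" pos="noun" lemma="shofet"&gt;shofet&lt;/morpheme&gt;
  &lt;/morpheme&gt;
&lt;/w&gt;</PRE>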
<H2 CLASS="western"><CODE CLASS="western"><FONT SIZE=5>&lt;morpheme&gt;</FONT></CODE>
Attribute Summary</H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3>
	<COL WIDTH=52*>
	<COL WIDTH=29*>
	<COL WIDTH=175*>
	<THEAD>
		<TR VALIGN=TOP>
			<TH WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western">Attribute</P>
			</TH>
			<TH WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western">Type</P>
			</TH>
			<TH WIDTH=68% BGCOLOR="#ffff99">
				<P CLASS="western">Description</P>
			</TH>
		</TR>
	</THEAD>
	<TBODY>
		<TR VALIGN=TOP>
			<TD WIDTH=20%>
				<P CLASS="western"><CODE CLASS="western"><B>lang</B></CODE></P>
			</TD>
			<TD WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>language</P>
			</TD>
			<TD WIDTH=68%>
				<P CLASS="western">Defaults to the <CODE CLASS="western"><B>lang</B></CODE>
				of the instance document. Intended for multi-lingual documents,
				such as the Hebrew Bible (Hebrew and Aramaic). Is this a global
				OSIS element attribute? Or from the <CODE CLASS="western"><B>xml</B></CODE>
				namespace?</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><SPAN STYLE="background: transparent"><B>use</B></SPAN></CODE></P>
			</TD>
			<TD WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">string</SPAN></P>
			</TD>
			<TD WIDTH=68% BGCOLOR="#ffff99">
				<P CLASS="western"><SPAN STYLE="background: transparent">Indicates
				the type of analysis the annotator is giving: <I>alternate</I>,
				<I>context-bound</I>, <I>context-free</I>, <I>phrase-level</I>,
				<I>clause-level</I>. Defaults to <I>formal</I>, i.e., the basic,
				context-free analysis.</SPAN></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20%>
				<P CLASS="western"><CODE CLASS="western"><B>word_part</B></CODE></P>
			</TD>
			<TD WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>integer</P>
			</TD>
			<TD WIDTH=68%>
				<P CLASS="western">The position of the morpheme within the word.
				If the morpheme and word are co-extensive, then the value is &ldquo;1&rdquo;.
				Unbounded.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B>pos</B></CODE></P>
			</TD>
			<TD WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>enumerated</P>
			</TD>
			<TD WIDTH=68% BGCOLOR="#ffff99">
				<P CLASS="western">Part of speech. See list below for the
				enumerated values.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20%>
				<P CLASS="western"><CODE CLASS="western"><B>lemma</B></CODE></P>
			</TD>
			<TD WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>string</P>
			</TD>
			<TD WIDTH=68%>
				<P CLASS="western">The &ldquo;dictionary&rdquo; or &ldquo;base&rdquo;
				form of the morpheme. Older philological terminology: &ldquo;root&rdquo;
				or &ldquo;stem&rdquo;.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B>homograph_number</B></CODE></P>
			</TD>
			<TD WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>integer</P>
			</TD>
			<TD WIDTH=68% BGCOLOR="#ffff99">
				<P CLASS="western">Distinguishes between lemmas when the same
				string represents more than one lemma. Such forms are sometimes
				called &ldquo;homonyms&rdquo;.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20%>
				<P CLASS="western"><CODE CLASS="western"><B>*stem</B></CODE></P>
			</TD>
			<TD WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>string</P>
			</TD>
			<TD WIDTH=68%>
				<P CLASS="western">In Hebrew and Aramaic, the verbal pattern that
				modifies the lemma with various modes: passive, causative,
				intensive, reflexive. In Greek, this is handled differently,
				using &ldquo;voice&rdquo; for many modes. We will need either an
				abstract term here, or expect to handle such things in the
				language declaration.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B>*conjugation</B></CODE></P>
			</TD>
			<TD WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>enumerated</P>
			</TD>
			<TD WIDTH=68% BGCOLOR="#ffff99">
				<P CLASS="western">In Hebrew, these are the <I>inflectional</I>
				sets for verbs; each language is going to have its own set of
				values. Conjugations sometimes mark verbal aspect, other times
				tense or a combination of the two.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20%>
				<P CLASS="western"><CODE CLASS="western"><B>tense</B></CODE></P>
			</TD>
			<TD WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>enumerated</P>
			</TD>
			<TD WIDTH=68%>
				<P CLASS="western">In some languages this category is marked by
				inflection; in other languages by modal or auxiliary verbs or
				words; in still others, time is contextually marked, i. e., is a
				discourse-level phenomenon. This latter is true for Hebrew.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B>person</B></CODE></P>
			</TD>
			<TD WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>enumerated</P>
			</TD>
			<TD WIDTH=68% BGCOLOR="#ffff99">
				<P CLASS="western">Typically <I>1<SUP>st</SUP></I>,
				<I>2<SUP>nd</SUP></I>, and <I>3<SUP>rd</SUP></I>. Others are
				possible.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20%>
				<P CLASS="western"><CODE CLASS="western"><B>gender</B></CODE></P>
			</TD>
			<TD WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>enumerated</P>
			</TD>
			<TD WIDTH=68%>
				<P CLASS="western" STYLE="font-weight: medium"><I>Masculine</I>,
				<I>feminine</I>, <I>neuter</I>. Hebrew and Aramaic do not have a
				<I>neuter</I>, but Greek does.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B>number</B></CODE></P>
			</TD>
			<TD WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>enumerated</P>
			</TD>
			<TD WIDTH=68% BGCOLOR="#ffff99">
				<P CLASS="western" STYLE="font-weight: medium"><I>Singular</I>,
				<I>dual</I>, <I>plural</I> are defaults. Of the biblical
				languages, Greek does not have a dual.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20%>
				<P CLASS="western"><CODE CLASS="western"><B>*state</B></CODE></P>
			</TD>
			<TD WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>enumerated</P>
			</TD>
			<TD WIDTH=68%>
				<P CLASS="western" STYLE="font-weight: medium">Unique to Hebrew
				and Aramaic. The values are <I>absolute</I> and <I>construct</I>.
				This value will eventually belong to the Hebrew and Aramaic
				language declarations.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B>*kqtype</B></CODE></P>
			</TD>
			<TD WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>enumerated</P>
			</TD>
			<TD WIDTH=68% BGCOLOR="#ffff99">
				<P CLASS="western" STYLE="font-weight: medium">The <I>ketiv-qere</I>
				&ldquo;what is written; what is read&rdquo; is a scribal
				&ldquo;marginal&rdquo; note to correct the reading of the text.
				As such, it is unique to Hebrew Bible manuscripts.</P>
			</TD>
		</TR>
	</TBODY>
</TABLE>
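<P CLASS="western">To show how these attributes might combine on a
single morpheme, here is a hedged sketch of a Hebrew finite verb
(Genesis 1:1, &ldquo;created&rdquo;). The attribute values are taken
from the value lists in this document; writing the starred attributes
without the asterisk (<CODE CLASS="western"><B>stem</B></CODE>, <CODE CLASS="western"><B>conjugation</B></CODE>)
and the transliterated text and lemma are assumptions for the
example.</P>
<PRE CLASS="western">&lt;!-- hypothetical instance fragment, not normative --&gt;
&lt;w&gt;
  &lt;!-- word and morpheme are co-extensive, so word_part is 1 --&gt;
  &lt;morpheme lang="he" word_part="1" pos="finite_verb" lemma="bara"
            stem="qal" conjugation="perfect" person="third"
            gender="masculine" number="singular"&gt;bara&lt;/morpheme&gt;
&lt;/w&gt;</PRE>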
<H2 CLASS="western">Attribute Value Summary</H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3>
	<COL WIDTH=41*>
	<COL WIDTH=30*>
	<COL WIDTH=25*>
	<COL WIDTH=38*>
	<COL WIDTH=122*>
	<THEAD>
		<TR VALIGN=TOP>
			<TH WIDTH=16% BGCOLOR="#ffff99">
				<P CLASS="western">Attribute</P>
			</TH>
			<TH COLSPAN=3 WIDTH=36% BGCOLOR="#ffff99">
				<P CLASS="western">Values</P>
			</TH>
			<TH WIDTH=48% BGCOLOR="#ffff99">
				<P CLASS="western">Comments</P>
			</TH>
		</TR>
	</THEAD>
	<TBODY>
		<TR VALIGN=TOP>
			<TD WIDTH=16%>
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">lang</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12%>
				<P CLASS="western"><EM>he</EM></P>
			</TD>
			<TD WIDTH=10%>
				<P CLASS="western"><EM>arc</EM></P>
			</TD>
			<TD WIDTH=15%>
				<P CLASS="western"><EM>grc</EM></P>
			</TD>
			<TD WIDTH=48%>
				<P CLASS="western">The default is the value of the <CODE CLASS="western"><B>xml:lang</B></CODE>
				attribute of the instance document. 
				</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">use</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12% BGCOLOR="#ffff99">
				<P CLASS="western"><EM><I>formal<BR>base<BR>alternate</I></EM></P>
			</TD>
			<TD WIDTH=10% BGCOLOR="#ffff99">
				<P CLASS="western"><EM><I>word-level<BR>phrase-level<BR>clause-level</I></EM></P>
			</TD>
			<TD WIDTH=15% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>context-free<BR>context-bound</EM></P>
			</TD>
			<TD WIDTH=48% BGCOLOR="#ffff99">
				<P CLASS="western">One may take many different perspectives in
				analyzing a morpheme. One can take a purely formalist approach;
				one can view how the morpheme is used relative to another
				morpheme or set of morphemes; how the morpheme relates to the
				verb, or across clause boundaries (e. g., pronoun antecedents).
				This is not always the choice of the analyst: languages often
				require a particular perspective by the very inflectional
				category distribution itself. The default value is <I>formal</I>.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16%>
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">word_part</FONT></B></CODE></P>
			</TD>
			<TD COLSPAN=3 WIDTH=36%>
				<P CLASS="western"><EM>1...&infin;</EM></P>
			</TD>
			<TD WIDTH=48%>
				<P CLASS="western">The position inside a word. The order begins
				with the integer &ldquo;1&rdquo;, which is also the default
				value. When a word has only one morpheme, the value of this
				attribute is the default &ldquo;1&rdquo;.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">pos</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>noun<BR><BR>common_noun<BR>proper_noun<BR>adjective<BR>pronoun</EM></P>
			</TD>
			<TD WIDTH=10% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>verb<BR><BR>finite_verb<BR>participle<BR>infinitive</EM></P>
			</TD>
			<TD WIDTH=15% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>particle<BR><BR>adverb<BR>preposition<BR>definite_article<BR>interrogative<BR>negative</EM></P>
			</TD>
			<TD WIDTH=48% BGCOLOR="#ffff99">
				<P CLASS="western">&ldquo;Part of speech&rdquo; is a slippery
				concept, apt to change substantially in meaning from language to
				language, and from within various linguistic theoretical camps.
				For example, there is no inflectional category for adverbs in
				biblical Hebrew, but there are lexical adverbs.</P>
				<P CLASS="western">Terms can also be more general and abstract,
				or more specific. One could understand &ldquo;noun&rdquo; as a
				part of speech, with attributes such as &ldquo;common&rdquo;,
				&ldquo;personal&rdquo;, &ldquo;gentilic&rdquo;, &ldquo;animate&rdquo;,
				&ldquo;geographic&rdquo;, &ldquo;political&rdquo; starting to
				blur the distinction between grammar and lexicon.</P>
				<P CLASS="western">These decisions must be left up to the user,
				but the user <B>must</B> address them in the language
				declaration.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16%>
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">lemma</FONT></B></CODE></P>
			</TD>
			<TD COLSPAN=3 WIDTH=36%>
				<P CLASS="western"><BR>
				</P>
			</TD>
			<TD WIDTH=48%>
				<P CLASS="western">This is the base or dictionary form (cf.
				German <I>Stichwort</I>). There is no default, and the value can
				be <I>empty.</I></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">homograph_number</FONT></B></CODE></P>
			</TD>
			<TD COLSPAN=3 WIDTH=36% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>0...&infin;</EM></P>
			</TD>
			<TD WIDTH=48% BGCOLOR="#ffff99">
				<P CLASS="western">Distinguishes morphemes that are spelled
				exactly the same, but have more than one (unrelated) meaning, or
				have differing etymology. The default is &ldquo;0&rdquo;, i.e.,
				there is no homograph and the form is unique.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16%>
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">*stem</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12%>
				<P CLASS="western" ALIGN=CENTER><EM>active</EM></P>
				<P CLASS="western" ALIGN=LEFT><EM>qal<BR>qal passive<BR>piel<BR>pual</EM></P>
			</TD>
			<TD WIDTH=10%>
				<P CLASS="western" ALIGN=CENTER><EM>causative</EM></P>
				<P CLASS="western" ALIGN=LEFT><EM>hifil<BR>hofal</EM></P>
			</TD>
			<TD WIDTH=15%>
				<P CLASS="western" ALIGN=CENTER><EM><I>reflexive</I></EM></P>
				<P CLASS="western"><EM><I>nifal<BR>hitpael</I></EM></P>
			</TD>
			<TD WIDTH=48%>
				<P CLASS="western">More precisely, these are verbal patterns:
				vocalic insertions into the tri-radical verbal root consonants,
				modifying the basic lexical meaning in some consistent way.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">*conjugation</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>perfect<BR>imperfect</EM></P>
			</TD>
			<TD WIDTH=10% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>imperative<BR>jussive</EM></P>
			</TD>
			<TD WIDTH=15% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>participle<BR>infinitive_absolute<BR>infinitive_construct</EM></P>
			</TD>
			<TD WIDTH=48% BGCOLOR="#ffff99">
				<P CLASS="western">For Hebrew and Aramaic, the verbal
				inflectional sets mark different verbal aspects. For Greek,
				tenses and aspects are combined for the various paradigms; so
				this list would not be adequate for Greek NT markup.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16%>
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">tense</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12%>
				<P CLASS="western"><EM>past</EM></P>
			</TD>
			<TD WIDTH=10%>
				<P CLASS="western"><EM>present</EM></P>
			</TD>
			<TD WIDTH=15%>
				<P CLASS="western"><EM>future</EM></P>
			</TD>
			<TD WIDTH=48%>
				<P CLASS="western">Time is often combined with kind of action in
				verbs. What is listed here is &ldquo;pure&rdquo; time, and
				nothing else. This simple list is hardly exhaustive: one can
				enumerate many different kinds of time, depending upon where one
				stands on the timeline.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">person</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>first</EM></P>
			</TD>
			<TD WIDTH=10% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>second</EM></P>
			</TD>
			<TD WIDTH=15% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>third</EM></P>
			</TD>
			<TD WIDTH=48% BGCOLOR="#ffff99">
				<P CLASS="western"><BR>
				</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16%>
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">gender</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12%>
				<P CLASS="western"><EM>masculine<BR>none</EM></P>
			</TD>
			<TD WIDTH=10%>
				<P CLASS="western"><EM>feminine<BR>any</EM></P>
			</TD>
			<TD WIDTH=15%>
				<P CLASS="western"><EM>neuter<BR>both (m &amp; f)</EM></P>
			</TD>
			<TD WIDTH=48%>
				<P CLASS="western">Gender is very language-specific. In Hebrew,
				there is no neuter, and many nouns are treated ambiguously. Some
				languages, such as Hungarian, do not inflect for gender at all.
				Do we distinguish between <I>lexical</I><SPAN STYLE="font-style: normal">
				and <I>formal</I> (inflected) gender?</SPAN></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">number</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>singular</EM></P>
			</TD>
			<TD WIDTH=10% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>dual</EM></P>
			</TD>
			<TD WIDTH=15% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>plural</EM></P>
			</TD>
			<TD WIDTH=48% BGCOLOR="#ffff99">
				<P CLASS="western">This covers most language use. &ldquo;One&rdquo;
				and &ldquo;many&rdquo; seems to be the primary distinction, but
				some cultures will have special forms to meet special needs. One
				example here: the Semitic languages have a special <I>dual</I>
				form for objects which are natural pairs &ndash; hands, eyes,
				etc.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16%>
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">*state</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12%>
				<P CLASS="western"><EM>absolute</EM></P>
			</TD>
			<TD WIDTH=10%>
				<P CLASS="western"><EM>construct</EM></P>
			</TD>
			<TD WIDTH=15%>
				<P CLASS="western"><BR>
				</P>
			</TD>
			<TD WIDTH=48%>
				<P CLASS="western">State has to do with the intonation of the
				noun. In the <I>absolute</I> state, the accent usually occurs on
				the last syllable. In the <I>construct</I> state, the accent
				shifts forward, and long vowels usually shorten as much as
				possible. Semantically, the <I>construct</I> form marks the
				&ldquo;genitive&rdquo; or &ldquo;possessive&rdquo;, and can also
				have an adjectival function, e. g., &ldquo;king of righteousness&rdquo;
				== &ldquo;righteous king&rdquo;.</P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=16% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">*kqtype</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=12% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=LEFT><EM>neither</EM></P>
			</TD>
			<TD WIDTH=10% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>ketiv</EM></P>
			</TD>
			<TD WIDTH=15% BGCOLOR="#ffff99">
				<P CLASS="western"><EM>qere</EM></P>
			</TD>
			<TD WIDTH=48% BGCOLOR="#ffff99">
				<P CLASS="western">When Jewish medieval scribes recognized what
				was to them an obvious &ldquo;error&rdquo; in the main biblical
				text, they had a problem: the text is sacred and may not be
				changed. So they wrote the corrected consonants in the
				margin, and the vowels in the main line of the text are those
				that match the consonants in the margin. The consonants in the
				main column of the text are called the <I>ketiv</I>
				or &ldquo;what is written&rdquo;; the consonants in the margin,
				combined with the vowels written with the <I>ketiv</I>, are called
				the <I>qere</I> or &ldquo;what is read&rdquo;.</P>
			</TD>
		</TR>
	</TBODY>
</TABLE>
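<P CLASS="western">The enumerated and integer value lists above could
be declared in XML Schema roughly as follows. This is a minimal
sketch only: it uses anonymous inline types, covers just three of the
attributes, and the attribute group name <CODE CLASS="western"><B>morphemeAttributes</B></CODE>
is hypothetical. The real declarations would depend on how the module
and the language declarations are finally organized.</P>
<PRE CLASS="western">&lt;!-- hypothetical schema fragment, not the draft osisLA schema --&gt;
&lt;xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"&gt;
  &lt;xs:attributeGroup name="morphemeAttributes"&gt;
    &lt;!-- enumerated values as listed for "number" --&gt;
    &lt;xs:attribute name="number"&gt;
      &lt;xs:simpleType&gt;
        &lt;xs:restriction base="xs:string"&gt;
          &lt;xs:enumeration value="singular"/&gt;
          &lt;xs:enumeration value="dual"/&gt;
          &lt;xs:enumeration value="plural"/&gt;
        &lt;/xs:restriction&gt;
      &lt;/xs:simpleType&gt;
    &lt;/xs:attribute&gt;
    &lt;!-- position in the word: 1 and up, defaulting to 1 --&gt;
    &lt;xs:attribute name="word_part" type="xs:positiveInteger" default="1"/&gt;
    &lt;!-- 0 means the form has no homograph --&gt;
    &lt;xs:attribute name="homograph_number" type="xs:nonNegativeInteger" default="0"/&gt;
  &lt;/xs:attributeGroup&gt;
&lt;/xs:schema&gt;</PRE>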
<H2 CLASS="western">To Do</H2>
<UL>
	<LI><P CLASS="western">Add the grammatical categories for Aramaic
	and Greek</P>
	<LI><P CLASS="western">Abstract a &ldquo;universal&rdquo; language
	declaration: those declarations that all languages will need.</P>
	<LI><P CLASS="western">Create language declarations for Hebrew,
	Greek, Aramaic, English and the other major European languages.</P>
	<LI><P CLASS="western">Resolve issues of how to modularize and
	invoke the OSIS Linguistic Annotation module along with the
	concomitant language declarations.</P>
	<LI><P CLASS="western">Create simple markup examples, using
	real-world text.</P>
</UL>
</BODY>
</HTML>