[sword-devel] HELP! Need your feedback on XML Markup Language

Troy A. Griffitts sword-devel@crosswire.org
Fri, 17 Aug 2001 17:32:40 -0700


You're such a smart alec! :)

Liked the paper, though it stretched my knowledge of XPointer and XSLT.

Might I suggest, rather than forcing the granularity down to the
smallest PCDATA in all associative meta hierarchies, that you designate
one document the 'master'; let it raise its hierarchy from a flat 'word'
list to something that the other ancillary hierarchies _might_ have in
common (e.g. module/testament/chapter/verse/word, or anything it
wishes). Force key attributes to be unique at every level (like you have
done for 'word') in the master. The auxiliary hierarchies can then refer
to master elements by key, which greatly reduces their size and
complexity and removes the redundant PCDATA from all documents.


To use your example:

Dub your mostly unchanged Pages document the 'master' (just for example
purposes; any file could be dubbed the 'master', but it looks like we
get the most benefit from this first choice).  I've added unique
attributes throughout the document, per our 'master' document
requirements above (l1[2]=l3 and l2[2]=l4, i.e. the second 'l1' and
'l2' lines are renumbered):

<pages>
     <page id="p1">
           <line id="l1">
              <w id="w1">This</w>
              <w id="w2">is</w>
           </line>
           <line id="l2">
              <w id="w3">text</w>
           </line>
     </page>
     <page id="p2">
           <line id="l3">
              <w id="w4">in</w>
              <w id="w5">a</w>
              <w id="w6">base</w>
           </line>
           <line id="l4">
              <w id="w7">file</w>
           </line>
     </page>
</pages>
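
The uniqueness requirement on the master can be checked mechanically.
Here is a rough sketch in Python (my choice for illustration only;
nothing in the proposal depends on it). The helper name duplicate_ids
is made up, and it assumes keys only need to be unique per element
type, so a page "p1" and a para "p1" would not clash:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The master document from above, compacted onto fewer lines.
MASTER = """
<pages>
  <page id="p1">
    <line id="l1"><w id="w1">This</w><w id="w2">is</w></line>
    <line id="l2"><w id="w3">text</w></line>
  </page>
  <page id="p2">
    <line id="l3"><w id="w4">in</w><w id="w5">a</w><w id="w6">base</w></line>
    <line id="l4"><w id="w7">file</w></line>
  </page>
</pages>
"""

def duplicate_ids(xml_text):
    """Return the sorted (tag, id) pairs that occur more than once.

    Assumption: uniqueness is required per element type, not globally.
    """
    root = ET.fromstring(xml_text.strip())
    counts = Counter(
        (el.tag, el.get("id")) for el in root.iter() if el.get("id") is not None
    )
    return sorted(key for key, n in counts.items() if n > 1)

if __name__ == "__main__":
    # An empty list means every key in the master is unique at its level.
    print(duplicate_ids(MASTER))
```

An empty result means the document qualifies as a master under that
assumption; with the page-relative line numbering (two 'l1' lines), the
check would report the clash.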
            

This allows your Text document to be reduced from:

<text>
     <para id="p1">
              <w id="w1">This</w>
              <w id="w2">is</w>
              <w id="w3">text</w>
              <w id="w4">in</w>
              <w id="w5">a</w>
              <w id="w6">base</w>
              <w id="w7">file</w>
     </para>
</text>

to:

<text>
     <para id="p1">
           <page id="p1" />
           <page id="p2" />
     </para>
</text>
            

Clauses from:

<clauses>
        <clause id="c1">
             <s>
                 <w id="w1">This</w>
             </s>
             <p>
                 <w id="w2">is</w>
             </p>
             <c>
                 <w id="w3">text</w>
             </c>
             <a>
                 <w id="w4">in</w>
                 <w id="w5">a</w>
                 <w id="w6">base</w>
                 <w id="w7">file</w>
             </a>
        </clause>
</clauses>

to:

<clauses>
        <clause id="c1">
             <s>
                 <w id="w1" />
             </s>
             <p>
                 <w id="w2" />
             </p>
             <c>
                 <w id="w3" />
             </c>
             <a>
                 <page id="p2" />
             </a>
        </clause>
</clauses>
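
To show that nothing is lost by the reduction, here is a small Python
sketch that expands the reduced Clauses document back into words by
looking each reference up in the master. The function names
index_master and words are hypothetical, and it assumes that an empty
element whose tag and id match a master element is a reference to it:

```python
import xml.etree.ElementTree as ET

MASTER = """
<pages>
  <page id="p1">
    <line id="l1"><w id="w1">This</w><w id="w2">is</w></line>
    <line id="l2"><w id="w3">text</w></line>
  </page>
  <page id="p2">
    <line id="l3"><w id="w4">in</w><w id="w5">a</w><w id="w6">base</w></line>
    <line id="l4"><w id="w7">file</w></line>
  </page>
</pages>
"""

CLAUSES = """
<clauses>
  <clause id="c1">
    <s><w id="w1" /></s>
    <p><w id="w2" /></p>
    <c><w id="w3" /></c>
    <a><page id="p2" /></a>
  </clause>
</clauses>
"""

def index_master(master_root):
    """Map (tag, id) to the master element carrying that key."""
    return {
        (el.tag, el.get("id")): el
        for el in master_root.iter()
        if el.get("id") is not None
    }

def words(element, index):
    """Return the words under an auxiliary element, in document order.

    Assumption: an element with no children and no text whose (tag, id)
    is found in the master is a reference and is replaced by the words
    of the master element it names.
    """
    target = index.get((element.tag, element.get("id")))
    is_empty = len(element) == 0 and not (element.text or "").strip()
    if target is not None and is_empty:
        return [w.text for w in target.iter("w")]
    result = []
    for child in element:
        result.extend(words(child, index))
    return result

if __name__ == "__main__":
    index = index_master(ET.fromstring(MASTER.strip()))
    clause = ET.fromstring(CLAUSES.strip()).find("clause")
    print(" ".join(words(clause, index)))  # prints: This is text in a base file
```

The same lookup works for the reduced Text document, since its para
element just holds two page references.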


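The saving in space we see here is minimal, but I believe it reduces
error-prone redundancy, and the savings should grow with the size of the
documents, since a single reference (like the page reference above) can
stand in for an arbitrarily large subtree of the master.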

Please ignore me if I'm way off base.  Just my 1/2 cent's worth.

	-Troy.