[osis-core] On using OpenOffice as an OSIS editor

Harry Plantinga osis-core@bibletechnologieswg.org
Thu, 13 Jun 2002 11:08:54 -0400


Preface: I've thought for years about how to make ThML easy
for a non-XML-user to edit and I haven't yet come up 
with a solution that gets the documents all the way
to the valid XML stage. I've tried using Word as an
editor with a custom stylesheet and macros, and that's
about the best solution I've had, but it leaves quite
a bit of work for an expert to correct markup, validate
the document, convert to XML, etc. Often several hours
per document. 

I'd more or less given up on Word because I want the
resulting documents to be valid XML, not requiring 
additional work. (Requiring an XML expert to finish up
documents is a major bottleneck in the pipeline, to mix
metaphors slightly.) The obvious approach is an XML
editor, and this summer I'm experimenting with XMetaL.  

In theory it is a very nice approach. You can edit in
a view that looks as wysiwyg as CSS can make it. 
You can write Word import macros and save in HTML
or PDF as well as XML. You can preview in a browser
with XSLT and CSS styling.  You can add macros, buttons,
and the like to the user interface.

In practice, it's working out reasonably well. The main
gotchas are that the software is poorly documented in some
cases, slow, buggy, and possibly in flux (Corel recently
bought SoftQuad). Oh, and it costs hundreds of dollars
and runs only on Windows. 

Reading up on the archive for this list, I came across
the discussion about using OpenOffice, and I thought I'd
give it another look. (Last time I checked, it couldn't 
print, etc.)  I expected to report that it wouldn't be
appropriate without extensive source code hacking, for
the same reason that Word isn't great: the content model
is pretty flat and basic, making it hard to use to validate
more complex content models. 

============= summary ========================

I came away from my exploration thinking that one could
do a pretty decent job of an OSIS editor with fairly
extensive macro programming but no source-code hacking. 
Maybe a few months' effort. There are sufficient UI
elements to do a decent to good job, but not
great: I doubt it will be possible to prevent illegal 
structure entry. It'll require a "validate" button and a 
validation process to correct errors before the document 
can be saved in OSIS format.
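
One check such a validation pass might make, assuming the
heading-based div scheme proposed further down: flag any
heading that jumps more than one level deeper than the one
before it, since that would leave a div with no parent. This
is only a sketch; the function name and the plain list of
heading levels are my assumptions, not anything OpenOffice
provides.

```python
def find_level_skips(levels):
    """Return (index, previous_level, level) for every heading that
    jumps more than one level deeper than the heading before it.

    `levels` is a hypothetical flat list of heading levels in
    document order, e.g. [1, 2, 2, 3].
    """
    problems = []
    previous = 0  # level 0 stands for the enclosing front/body/back section
    for index, level in enumerate(levels):
        if level > previous + 1:
            problems.append((index, previous, level))
        previous = level
    return problems
```

A "validate" button could run checks like this one and list
the offending headings for the user to fix before saving.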

=========== about OpenOffice ==========================

OpenOffice has several modules: word processor, spreadsheet,
drawing program, presentation program, etc. All use XML
as their native file format. The suite has recently been
released in Version 1.0.  It's quite a full-featured
near-clone of Microsoft Office, and it works quite well.
There are still lots of little gotchas in reading or saving
Microsoft Office documents though. OpenOffice is free, open,
and available for Windows, Mac, Linux, Unix, etc.  Download
from www.openoffice.org


=========== about OpenOffice's text DTD module =========

The OpenOffice DTD has many modules. One, called text, is
the primary one for the word processor, though it doesn't
contain the table elements.  It has 181 elements, 
including 84 with content model PCDATA and 38 EMPTY. The
main structure is that sections contain paragraph-level
elements (p, h, lists, tables, indexes, etc.).  Paragraph-
level elements contain inline elements (PCDATA, span, tabstop,
bookmark, drawing, a, set-page-variable, reference-mark-start,
footnote-ref, etc. etc.).

It doesn't have a nice mapping to OSIS, but it may be possible
to "fake it" as described below.

========= Proposal for editing OSIS with OpenOffice =========

osisText/header:
  - store information in predefined OpenOffice elements or
    in an OpenOffice field element of type user-defined.
  - make an OpenOffice form to enter the data in the document.

OSIS front, body, back
  - use OpenOffice section elements

OSIS divs
  - use outlining facility of OpenOffice. Specifically, use the 
    <h> element, which has a numeric level attribute.  
  - each heading is the start of a new div, with the heading
    level giving the nesting depth of the div
  - each div ends at the next heading paragraph
  - text of the heading could be used as the divTitle
  - maybe display the heading in reverse video to show that 
    the text of the heading is not part of the document flow
  - bonus: OpenOffice "outline view" would show the div structure
    of the document.
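
The heading-to-div rule above can be sketched as a small
stack algorithm. This is a minimal illustration, not a real
export macro: the flat list of ("h", level, text) and
("p", text) tuples is a hypothetical stand-in for the
OpenOffice paragraph stream, and the output element names
other than divTitle are assumptions.

```python
import xml.etree.ElementTree as ET

def headings_to_divs(blocks):
    """Nest a flat run of headings and paragraphs into div elements.

    `blocks` is a hypothetical flat list of ("h", level, text) and
    ("p", text) tuples in document order.
    """
    body = ET.Element("body")
    stack = [(0, body)]  # (heading level, open element); level 0 is the section
    for block in blocks:
        if block[0] == "h":
            _, level, text = block
            # A new heading ends every open div at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            ET.SubElement(div, "divTitle").text = text
            stack.append((level, div))
        else:
            # Ordinary paragraphs go into the innermost open div.
            ET.SubElement(stack[-1][1], "p").text = block[1]
    return body
```

Calling ET.tostring(headings_to_divs(blocks), encoding="unicode")
on such a list yields the nested markup as a string.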

OSIS linegroups
  - the list facility appears to have sufficient structure
    to handle lines and linegroups
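
As a rough illustration of that mapping, a nested list can be
walked recursively, with each list becoming a linegroup and
each item a line. The Python list input is a hypothetical
stand-in for an OpenOffice list, and the element names
"lineGroup" and "line" are placeholders, not names taken
from the OSIS schema.

```python
import xml.etree.ElementTree as ET

def list_to_linegroup(items):
    """Convert a (possibly nested) list of strings into nested groups.

    Element names here are placeholders chosen for illustration.
    """
    group = ET.Element("lineGroup")
    for item in items:
        if isinstance(item, list):
            # A nested list becomes a nested linegroup.
            group.append(list_to_linegroup(item))
        else:
            ET.SubElement(group, "line").text = item
    return group
```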

Verses split across paragraph boundaries
  - select the text of the verse and click a "verse" button. A 
    macro could prompt for verse identifier and add prev and 
    next attributes to span any paragraph boundaries.
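
A sketch of what that macro would compute, assuming the
selected verse has already been cut into one text fragment
per paragraph. The ".segN" identifier suffix and the "id"
attribute name are made up for illustration; only the prev
and next linking comes from the idea above.

```python
def split_verse(verse_id, fragments):
    """Return one (attributes, text) pair per paragraph fragment.

    Fragments of the same verse are chained with prev/next
    attributes. The segment id scheme is hypothetical.
    """
    if len(fragments) == 1:
        # The verse sits inside a single paragraph: no chaining needed.
        return [({"id": verse_id}, fragments[0])]
    seg_ids = [f"{verse_id}.seg{i}" for i in range(1, len(fragments) + 1)]
    segments = []
    for i, text in enumerate(fragments):
        attrs = {"id": seg_ids[i]}
        if i > 0:
            attrs["prev"] = seg_ids[i - 1]
        if i < len(fragments) - 1:
            attrs["next"] = seg_ids[i + 1]
        segments.append((attrs, text))
    return segments
```

Each pair would then become one span in the paragraph that
holds that fragment.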

Loading, saving
  - a macro or plug-in could read and save OSIS documents. 
  - bonus: importing Word documents, saving HTML and PDF.

Word-level markup. 
  - I suppose you could do it with a combination of macros,
    spans, and user-defined field elements.

-Harry