[osis-core] On using OpenOffice as an OSIS editor

Patrick Durusau osis-core@bibletechnologieswg.org
Fri, 14 Jun 2002 07:23:25 -0400


Harry,

Just a brief and inadequate reply to your post on OpenOffice. ;-)

I have been using it for several weeks and, while it is sometimes slow, it 
seems fairly stable.

Not certain that we would have to use macros. The issue has arisen in 
TEI land (again!) of how to get users better tools for entering markup. 
One suggestion has been to use OpenOffice (with styles) and an XSLT 
stylesheet to convert the underlying XML from OpenOffice XML into TEI. 
(This originated in a discussion between Sebastian Rahtz and myself over 
writing an export filter for OpenOffice. Since OpenOffice has a native 
XML format, XSLT would be a simple way to test the difficulty of going 
from OpenOffice XML to TEI, without the overhead of writing the filter.)
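
For illustration only, the style-driven conversion can be sketched in a few lines of Python standing in for the XSLT stylesheet. This is a rough sketch under stated assumptions, not a working filter: the namespace URI is the one believed to be used by OpenOffice 1.0 files, and the style names and the STYLE_TO_TEI mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI for text elements in OpenOffice 1.0 files.
TEXT_NS = "http://openoffice.org/2000/text"

# Hypothetical mapping from user-applied style names to TEI element names.
STYLE_TO_TEI = {"Heading": "head", "Verse Line": "l", "Quotation": "quote"}


def paragraphs_to_tei(content_xml):
    """Turn each text:p into a TEI-like element chosen by its style name."""
    root = ET.fromstring(content_xml)
    body = ET.Element("body")
    for para in root.iter("{%s}p" % TEXT_NS):
        style = para.get("{%s}style-name" % TEXT_NS, "")
        # Unknown styles fall back to a plain paragraph.
        elem = ET.SubElement(body, STYLE_TO_TEI.get(style, "p"))
        # itertext() flattens inline spans; a real filter would map them too.
        elem.text = "".join(para.itertext())
    return body


# Tiny hand-written sample, not actual OpenOffice output.
SAMPLE = (
    '<office:body xmlns:office="http://openoffice.org/2000/office" '
    'xmlns:text="http://openoffice.org/2000/text">'
    '<text:p text:style-name="Heading">The Sermon</text:p>'
    '<text:p text:style-name="Standard">Plain <text:span>text</text:span>.</text:p>'
    '</office:body>'
)

tei_body = paragraphs_to_tei(SAMPLE)
print(ET.tostring(tei_body, encoding="unicode"))
```

The point of the sketch is that the user only ever picks a style; all knowledge of the target markup lives in the mapping table, which is what the XSLT stylesheet would encode.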

If I get some time this weekend, I may try to input a chapter or so, 
probably the Matthew chapter that has been the subject of so much 
discussion, to see what sort of XML we would get from OpenOffice with no 
tweaking. Might be a good measure of how much trouble we would encounter 
with such an approach.

Thanks!

Patrick

Harry Plantinga wrote:

>Preface: I've thought for years how to make ThML easy
>for a non-XML-user to edit and I haven't yet come up 
>with a solution that gets the documents all the way
>to the valid XML stage. I've tried using Word as an
>editor with a custom stylesheet and macros, and that's
>about the best solution I've had, but it leaves quite
>a bit of work for an expert to correct markup, validate
>the document, convert to XML, etc. Often several hours
>per document. 
>
>I'd more or less given up on Word because I want the
>resulting documents to be valid XML, not requiring 
>additional work. (Requiring an XML expert to finish up
>documents is a major bottleneck in the pipeline, to mix
>metaphors slightly.) The obvious approach is an XML 
>editor, and this summer I'm experimenting with XMetaL.  
>
>In theory it is a very nice approach. You can edit in
>a view that looks as wysiwyg as CSS can make it. 
>You can write Word import macros and save in HTML
>or PDF as well as XML. You can preview in a browser
>with XSLT and CSS styling.  You can add macros, buttons,
>and the like to the user interface.
>
>In practice, it's working out reasonably well. The main
>gotchas are that the software is poorly documented in some
>cases, slow, buggy, and possibly in flux (Corel recently
>bought SoftQuad). Oh, and it costs hundreds of dollars
>and runs only on Windows. 
>
>Reading up on the archive for this list, I came across
>the discussion about using OpenOffice, and I thought I'd
>give it another look. (Last time I checked, it couldn't 
>print, etc.)  I expected to report that it wouldn't be
>appropriate without extensive source code hacking, for
>the same reason that Word isn't great: the content model
>is pretty flat and basic, making it hard to use to validate
>more complex content models. 
>
>============= summary ========================
>
>I came away from my exploration thinking that one could
>do a pretty decent job of an OSIS editor with fairly
>extensive macro programming but no source-code hacking. 
>Maybe a few months' effort. There are sufficient UI 
>elements to do a decent to good job, but not 
>great: I doubt it will be possible to prevent illegal 
>structure entry. It'll require a "validate" button and a 
>validation process to correct errors before the document 
>can be saved in OSIS format.
>
>=========== about OpenOffice ==========================
>
>OpenOffice has several modules: word processor, spreadsheet,
>drawing program, presentation program, etc. All use XML
>as their native file format. The suite has recently been
>released in Version 1.0.  It's quite a full-featured 
>near-clone of Microsoft Office, and it works quite well.
>There are still lots of little gotchas in reading or saving
>Microsoft Office documents though. OpenOffice is free, open,
>and available for Windows, Mac, Linux, Unix, etc.  Download
>from www.openoffice.org
>
>
>=========== about OpenOffice's text DTD module =========
>
>The OpenOffice DTD has many modules. One, called text, is
>the primary one for the word processor, though it doesn't
>contain the table elements.  It has 181 elements, 
>including 84 with content model PCDATA and 38 EMPTY. The
>main structure is that sections contain paragraph-level
>elements (p, h, lists, tables, indexes, etc.).  Paragraph-
>level elements contain inline elements (PCDATA, span, tabstop,
>bookmark, drawing, a, set-page-variable, reference-mark-start,
>footnote-ref, etc. etc.).
>
>It doesn't have a nice mapping to OSIS, but it may be possible
>to "fake it" as described below.
>
>========= Proposal for editing OSIS with OpenOffice =========
>
>osisText/header:
>  - store information in predefined OpenOffice elements or 
>    an OpenOffice field element of type user-defined.
>  - make an OpenOffice form to enter the data in the document.
>
>OSIS front, body, back
>  - use OpenOffice section elements
>
>OSIS divs
>  - use outlining facility of OpenOffice. Specifically, use the 
>    <h> element, which has a numeric level attribute.  
>  - each heading is the start of a new div, with the heading
>    level giving the nesting depth of the div
>  - each div ends at the next heading paragraph
>  - text of the heading could be used as the divTitle
>  - maybe display the heading in reverse video to show that 
>    the text of the heading is not part of the document flow
>  - bonus: OpenOffice "outline view" would show the div structure
>    of the document.
>
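
The heading-to-div rule above (each heading opens a div at its level and closes any open div at the same or a deeper level) can be sketched as follows. This is a minimal illustration in Python, assuming the document has already been reduced to a flat list of headings and paragraphs; the tuple layout and the build_divs name are hypothetical, and heading levels are assumed to start at 1.

```python
import xml.etree.ElementTree as ET


def build_divs(items):
    """Nest a flat list of ("h", level, text) / ("p", None, text) into divs."""
    body = ET.Element("body")
    # Stack of (level, element) for the currently open containers.
    stack = [(0, body)]
    for kind, level, text in items:
        if kind == "h":
            # A heading ends every open div at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            # The heading text becomes the title, not part of the text flow.
            ET.SubElement(div, "divTitle").text = text
            stack.append((level, div))
        else:
            # Ordinary paragraphs land in the innermost open div.
            ET.SubElement(stack[-1][1], "p").text = text
    return body


# Made-up sample input for illustration.
ITEMS = [
    ("h", 1, "Matthew"),
    ("h", 2, "Chapter 5"),
    ("p", None, "First paragraph."),
    ("h", 2, "Chapter 6"),
    ("p", None, "Second paragraph."),
    ("h", 1, "Mark"),
]

nested = build_divs(ITEMS)
print(ET.tostring(nested, encoding="unicode"))
```

Keeping the level on the stack means a skipped level (say a level-3 heading directly under level 1) still nests under the nearest open div rather than failing.
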
>OSIS linegroups
>  - the list facility appears to have sufficient structure
>    to handle lines and linegroups
>
>Verses split across paragraph boundaries
>  - select the text of the verse and click a "verse" button. A 
>    macro could prompt for verse identifier and add prev and 
>    next attributes to span any paragraph boundaries.
>
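
What the "verse" button would need to record for a verse that crosses paragraph boundaries can be sketched like this. It is only an illustration in Python, not OpenOffice macro code: the segment naming scheme (".segN") and the dictionary layout are hypothetical, and only the prev and next links come from the proposal itself.

```python
def link_verse_segments(verse_id, pieces):
    """Return one record per paragraph-bound piece, chained with prev/next."""
    # Hypothetical naming scheme: <verse id>.seg<N>
    seg_ids = ["%s.seg%d" % (verse_id, n) for n in range(1, len(pieces) + 1)]
    segments = []
    for i, text in enumerate(pieces):
        seg = {"id": seg_ids[i], "verse": verse_id, "text": text}
        if i > 0:
            # Points back to the piece in the preceding paragraph.
            seg["prev"] = seg_ids[i - 1]
        if i < len(pieces) - 1:
            # Points forward to the piece in the following paragraph.
            seg["next"] = seg_ids[i + 1]
        segments.append(seg)
    return segments


# A verse whose selected text spans two paragraphs (made-up text).
segments = link_verse_segments("Matt.5.3", ["first part,", "second part."])
for seg in segments:
    print(seg)
```

A verse that sits inside a single paragraph yields one record with neither prev nor next, so the same routine covers the ordinary case.
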
>Loading, saving
>  - a macro or plug-in could read and save OSIS documents. 
>  - bonus: importing Word documents, saving HTML and PDF.
>
>Word-level markup. 
>  - I suppose you could do it with a combination of macros, 
>    spans, and user-defined field elements.
>
>-Harry
>

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu