[sword-devel] Insidious mismatched tag errors: recommendations
thulester at gmail.com
Sun Sep 23 08:35:36 MST 2012
This is the clearest description of the milestone vs containers, BCV vs BSP
issue I've seen so far. Thanks for the summary. It's very helpful.
On Friday, September 21, 2012, DM Smith wrote:
> So far the discussion is around whether the xml is well-formed.
> Once you get that working, then you need to make sure it is valid wrt the
> OSIS schema.
> There's an old tool that will convert sgml to well-formed xml. I think it
> was James Clark's "sx". I've used it successfully on initial conversions
> and getting something that will work within xml tools.
> Finally, OSIS has the notion of milestones for start and end elements.
> There are semantic rules regarding this that cannot be checked by standard
> xml tools. Osis2mod tries to handle this. When you get to that point, I can
> help unravel the logging options.
> The purpose of milestoned elements is to allow for two competing document
> models to be in the same xml document: BSP and BCV (names we've given it
> here and in the wiki).
> We recommend BSP: book, chapter, section, paragraph, poetry, and list
> elements are all containers (not milestoned), and verse elements are
> milestoned.
> Note, the OSIS manual says that if you have one element milestoned, then
> all other elements with the same tag name have to be milestoned.
> Practically speaking, this does not matter. SWORD and JSword don't care.
> Having verses milestoned only if necessary is probably a better way to
> create a good XML document. Start out with all of them as containers and
> each place where that causes a problem, either fix the xml or if otherwise
> correct, convert to milestoned verses.
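As a rough illustration of the "convert to milestoned verses" step, here is a minimal Perl sketch. It assumes each container verse start tag carries only an osisID attribute, and it reuses the osisID as the sID/eID value; milestone_verses is a made-up name for this example, not part of osis2mod or any SWORD tool.

```perl
use strict;
use warnings;

# Sketch only: rewrite <verse osisID="X"> ... </verse> containers as a
# start/end milestone pair. Assumes the start tag has just an osisID
# attribute and that verses do not nest.
sub milestone_verses {
    my ($xml) = @_;
    my @open;    # osisIDs of container verses still waiting for </verse>
    my $swap = sub {
        my ($id, $end) = @_;
        if (defined $id) {
            # Container start tag: emit the start milestone.
            push @open, $id;
            return qq{<verse osisID="$id" sID="$id"/>};
        }
        # End tag: emit the end milestone, or leave a stray </verse> alone.
        return @open ? '<verse eID="' . pop(@open) . '"/>' : $end;
    };
    $xml =~ s{<verse\s+osisID="([^"]+)"\s*>|(</verse\s*>)}{ $swap->($1, $2) }ge;
    return $xml;
}
```

Given `<p><verse osisID="Gen.1.1">In the beginning</verse></p>`, this returns `<p><verse osisID="Gen.1.1" sID="Gen.1.1"/>In the beginning<verse eID="Gen.1.1"/></p>`. Verses that are already milestoned are left untouched.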
> Generally speaking these BSP elements should not start just inside or at
> the end of a verse. Rather they should be between verse elements or within
> the text. When they are placed just after the verse start, they often will
> cause the verse number to be orphaned. When they are placed just before the
> verse end, then it is generally not noticeable (just bad form).
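One way to hunt for the "just after the verse start" case is a small scan like the Perl sketch below. The element list in @BSP and the name find_orphaning_starts are assumptions made for this example; adjust the list to whatever container elements your text actually uses. It also assumes every verse start carries an osisID.

```perl
use strict;
use warnings;

# Assumed list of BSP container elements to watch for.
my @BSP = qw(div p lg l list item title);

# Sketch only: report container start tags that open immediately after
# a verse start (container or sID milestone), which tends to orphan the
# verse number. End milestones (eID) are skipped.
sub find_orphaning_starts {
    my ($xml) = @_;
    my $names = join '|', @BSP;
    my @hits;
    while ($xml =~ m{<verse\b(?![^<>]*\beID=)[^<>]*\bosisID="([^"]+)"[^<>]*>\s*<($names)\b}g) {
        push @hits, "$1: <$2> opens right after the verse start";
    }
    return @hits;
}
```

Each hit names the verse and the element, so the start tag can be moved to just before the verse start.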
> Quotes will create the biggest grief in the above. They often cross
> boundaries. Certainly, the Beatitudes do, starting in one chapter and
> ending a couple of chapters later. For this reason, using the milestoned
> version is necessary.
> If your document follows some simple rules (some required by xml, others
> simplifications), then checking nesting is a simple matter of having a
> push/pop stack of elements. The simple rules:
> 1) All attributes when present have quoted values.
> 2) All entities are properly formed and used when needed. Also, < and >
> are not in attribute values.
> 3) Tags are written as < ... >, </ ... >, or < ... />, with no newlines
> between < and >.
> If this is true then a simple perl script can be written to find the
> problems in the file:
> Look for < ... /> and skip them. They cause no problems.
> Look for < xxx ... > and push the tag name along with its location in the
> file on to the stack.
> Look for </ xxx > and compare xxx to the top element on the stack. If it
> matches, pop it; if it doesn't, report an error.
> When you get to the end of the document and the stack is not empty, then
> the elements on the stack are not closed properly.
> Printing out the stack (elements and locations) would help find what the
> problem is.
> For example:
> if xxx is deeper in the stack, then there is a problem with nesting. Look
> at all the elements above the xxx on the stack for problems.
> if it is not in the stack, then the element was not started prior to that
> point or it may have been ended twice.
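The push/pop check described above could be sketched in Perl roughly as follows. This is only an illustration under the three simple rules listed earlier; check_nesting and its message wording are invented for this example, and comments or CDATA sections that contain markup are not handled.

```perl
use strict;
use warnings;

# Sketch only: returns a list of problem descriptions for $text.
sub check_nesting {
    my ($text) = @_;
    my (@stack, @problems);
    my $line_no = 0;
    for my $line (split /\n/, $text) {
        $line_no++;
        while ($line =~ /<([^<>]*)>/g) {
            my $tag = $1;
            next if $tag =~ /^\s*[?!]/;    # <?xml ...?>, <!-- ... -->, <!DOCTYPE ...>
            next if $tag =~ m{/\s*$};      # < ... /> causes no problems
            if ($tag =~ m{^\s*/\s*(\S+)}) {
                # End tag: compare it with the top of the stack.
                my $name = $1;
                if (@stack && $stack[-1]{name} eq $name) {
                    pop @stack;
                }
                elsif (grep { $_->{name} eq $name } @stack) {
                    # Deeper in the stack: everything above it is suspect.
                    my @above;
                    while ($stack[-1]{name} ne $name) {
                        my $e = pop @stack;
                        push @above, "<$e->{name}> (line $e->{line})";
                    }
                    pop @stack;
                    push @problems, "line $line_no: </$name> crosses open " . join(', ', @above);
                }
                else {
                    push @problems, "line $line_no: </$name> was never opened, or was closed twice";
                }
            }
            elsif ($tag =~ /^\s*(\S+)/) {
                # Start tag: remember its name and where it was seen.
                push @stack, { name => $1, line => $line_no };
            }
        }
    }
    # Anything still on the stack was never closed.
    push @problems, "line $_->{line}: <$_->{name}> is never closed" for @stack;
    return @problems;
}

# When a file name is given on the command line, check that file.
if (@ARGV) {
    local $/;            # slurp the whole file
    my $xml = <>;
    print "$_\n" for check_nesting($xml);
}
```

After a nesting error the sketch pops down to the matching element so that one mistake does not cascade into a report for every later end tag. That recovery choice is mine, not something the description above requires.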
> Here is a simple perl script (that I wrote), which doesn't do that, but
> could be adapted to do it. This creates a histogram/dictionary of tag and
> attribute names.
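> use strict;
> my %tags = ();
> my %attrs = ();
> while (<>) {
>     # While there is a tag on the line
>     while (/<[^\/\s>]+[\/\s>]/o) {
>         # While there is an attribute in the tag
>         while (/<[^\/\s>]+\s+[^\=\/\>]+=\"[^\"]+\"/o) {
>             # remove the attribute
>             s/<([^\/\s>]+)\s+([^\=\/\>]+)(\="[^\"]+\")(.*)/<$1 $4/o;
>             my ($t, $a, $v, $r) = ($1, $2, $3, $4);
>             # count it under "tag attribute" (counting step reconstructed)
>             $attrs{"$t $a"}++;
>         }
>         # remove the tag and count it (counting step reconstructed)
>         s/<([^\/\s>]+)[\/\s>]//o;
>         $tags{$1}++;
>         #print("do next tag on line\n");
>     }
>     #print("do next line\n");
> }
> # Print both histograms (the output format here is an assumption)
> foreach my $tag (sort keys %tags) {
>     print "$tag: $tags{$tag}\n";
> }
> foreach my $attr (sort keys %attrs) {
>     print "$attr: $attrs{$attr}\n";
> }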
> Hope this helps,
> Thanks everyone for suggestions. I'll give them all a try.
> That said, the emacs recommendation is nearly a religious conversion
> recommendation. (I'm on the vi side of the vi versus emacs debate. I
> suppose as long as it doesn't kill me I should give it a try, though I'm
> not certain what impact it will have on the health of my soul ... :D )
> On Thursday, September 20, 2012, Daniel Owens wrote:
>> I use jEdit with the XML plugin installed. I find it helps me find
>> problems fairly easily.
>> On 09/20/2012 05:26 PM, Greg Hellings wrote:
>>> There are a number of pieces of software out there that will
>>> pretty-print the XML for you, with indenting and whatnot. Overly
>>> indented for what you would want in production but decent for
>>> debugging mismatching nesting and the like.
>>> For example, 'xmllint --format' will properly indent the file, etc. I
>>> don't know how it will handle poorly formed XML.
>>> GUI editors can do wonders as well. On Windows I use Notepad++ and
>>> manually set it to display XML. gEdit and Geany - I believe - both
>>> support similar display modes. And there are some plugins for Eclipse
>>> that might handle what you need as well.
>>> On Thu, Sep 20, 2012 at 4:19 PM, Karl Kleinpaste <karl at kleinpaste.org>
>>>> Andrew Thule <thulester at gmail.com> writes:
>>>>> One of my least favourite things is finding mismatched tags in OSIS.xml
>>>>> Has anyone successfully climbed this summit?
>>>> XEmacs and xml-mode (and font-lock-mode). M-C-f and M-C-b execute
>>>> sgml-forward-element and -backward-. That is, sitting at the beginning
>>>> of <tag>, M-C-f (meta-control-f) moves forward to the matching </tag>,
>>>> properly handling nested tags.
>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>> Instructions to unsubscribe/change your settings at above page