[sword-devel] NET markup problems

Ben Morgan benpmorgan at gmail.com
Sun Jul 20 23:54:56 MST 2008


I was looking at using the ElementTree parser in Python to pull out a more
or less plain-text version of a module quickly for search indexing.
Incidentally, it is quite a bit faster than calling striptext: on the ESV
and KJV, it took about 80% of the time striptext takes.
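
A minimal sketch of that kind of extraction, assuming each raw verse entry
is available as a string. The helper name plain_text and the <verse>
wrapper element are made up for illustration and are not part of SWORD.
Any text inside <note> elements would be kept as well, so a real indexer
might want to skip those.

```python
import xml.etree.ElementTree as ET

def plain_text(raw_entry):
    """Return the text of a raw OSIS verse entry with all tags dropped."""
    # A verse entry can hold several sibling nodes, so wrap it in a
    # dummy root element (hypothetical name) before parsing.
    root = ET.fromstring("<verse>%s</verse>" % raw_entry)
    # itertext() yields every text and tail string in document order.
    text = "".join(root.itertext())
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join(text.split())

sample = ('So Joseph died.<note osisRef="Gen.50.26" n="33"></note> '
          'After <milestone type="line" />that.')
print(plain_text(sample))
```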

I ran into problems trying it on the NETfree, however: there seem to be
trailing OSIS tags at the end of books.
For example, from Genesis 50:26:
'So Joseph died at the age of 110.<note osisRef="Gen.50.26" n="33"></note>
After they embalmed him, his body<note osisRef="Gen.50.26" n="34"></note>
was placed in a coffin in Egypt.<milestone type="line" /><milestone
type="line" /> </div> *<chapter eID="Gen.50"/></div>*'

The last two tags, in bold (marked with asterisks here), shouldn't be
there - they are unmatched anywhere, and removing them allows parsing to work.
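
A small sketch that reproduces the failure, assuming the entry is wrapped
in a dummy root before parsing. The helper name parses and the <verse>
wrapper are hypothetical, and the fragment is shortened from the Genesis
50:26 entry quoted above.

```python
import xml.etree.ElementTree as ET

def parses(fragment):
    """Return True if the fragment is well-formed inside a dummy root."""
    try:
        ET.fromstring("<verse>%s</verse>" % fragment)
        return True
    except ET.ParseError:
        # Raised for mismatched or unclosed tags, among other errors.
        return False

clean = 'was placed in a coffin in Egypt.<milestone type="line" />'
# Same text with the trailing tags seen at the end of the entry.
trailing = clean + ' </div> <chapter eID="Gen.50"/></div>'

print(parses(clean))
print(parses(trailing))
```

The first call should succeed and the second should fail, because the
closing div tags have no opening tag inside the entry.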

The third-to-last tag, a closing div, matches a tag in the heading of the
chapter. Is the raw entry of a verse meant to be valid XML by itself? If
so, this is also invalid.

God Bless,
The Lord is not slow to fulfill his promise as some count slowness,
but is patient toward you, not wishing that any should perish,
but that all should reach repentance.
2 Peter 3:9 (ESV)