The current KJV module has some errors in its text. These can be found by comparing the module to other KJV e-texts and examining the differences. For this comparison it is important to use an e-text that is not related to the current module: a related text will likely share the same mistakes, while an unrelated e-text will have its own mistakes, which are unlikely to coincide with the module's.
With the differences in hand, validation will be done against a hard copy to determine correctness.
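As a rough illustration of the comparison step, the sketch below uses Python's standard difflib to list word-level differences between two renderings of the same verse. The function name word_differences and the sample strings are made up for this example (the "heavens" variant is introduced deliberately); it is not the tooling actually used for the module.

```python
import difflib

def word_differences(reference, candidate):
    """Return (operation, reference_words, candidate_words) for each span that differs."""
    ref_words = reference.split()
    cand_words = candidate.split()
    # autojunk=False keeps frequent words such as "the" from being ignored on long texts.
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    diffs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            diffs.append((op, ref_words[i1:i2], cand_words[j1:j2]))
    return diffs

# Hypothetical sample data: the second string contains a deliberate variant.
module_verse = "In the beginning God created the heaven and the earth."
other_verse = "In the beginning God created the heavens and the earth."
for diff in word_differences(module_verse, other_verse):
    print(diff)
```

Each reported span is only a candidate; whether the module or the other e-text is right still has to be settled against the hard copy.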
Note: OSIS has special markup for quotes and added text. These will continue to be used.
The KJV has some ill-formed XML. This prevents the use of tools that adhere to the XML standard, since conforming XML tools are required to fail on input that is not well-formed.
The most notable problem is that notes are often rendered as <note/>body of the note, that is, as an empty note element followed by the note text. The other major problem is in the Strong's markup, where attributes are present with no value.
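A minimal sketch of how such problems might be detected, using only the Python standard library. It assumes that "attributes with no value" means an attribute name with no ="..." after it; the helper names and the sample fragments are hypothetical, and the regular expressions are a heuristic for locating the problem, not a full XML tokenizer.

```python
import re
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if a conforming parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Start tags only: '<' must be followed directly by a name, so '</w>' is skipped.
TAG = re.compile(r"<([A-Za-z_][\w:.-]*)([^<>]*)>")
ATTR_WITH_VALUE = re.compile(r"""[\w:.-]+\s*=\s*("[^"]*"|'[^']*')""")

def valueless_attributes(xml_text):
    """Heuristically list (element, attribute) pairs where the attribute has no value."""
    found = []
    for match in TAG.finditer(xml_text):
        # Drop every name="value" pair; whatever names remain have no value.
        rest = ATTR_WITH_VALUE.sub(" ", match.group(2)).replace("/", " ")
        for name in rest.split():
            found.append((match.group(1), name))
    return found

# Hypothetical fragment, not taken from the module.
bad = '<w lemma morph="x">word</w>'
print(is_well_formed(bad))
print(valueless_attributes(bad))
```

The parser check only says that something is wrong; the heuristic scan is what points at the offending attribute so it can be repaired.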
There have been numerous threads on sword-devel regarding well-formedness at the verse level. That is, there is an expectation that a single verse in isolation be a well-formed fragment. This will never be the case. Sometimes verses start in the middle of a paragraph.
It may be possible that chapters can be well-formed. But this is not a requirement of this effort. If they are, that's great, but there should be no reliance on it. On the other hand, books will be well-formed.
The KJV should be valid against the OSIS schema. Although it has not been validated against the OSIS schema, it is pretty close. The most recent version of the schema is 2.1.1, which is backward compatible for the most part. It appears that loopholes have been closed.
The most notable problem is the <resp> element. This is present in the KJV module but has never been a part of the OSIS spec, which defines only a resp attribute. It has not yet been decided whether the content of this element will be retained via transformation.
OSIS best practices have been enumerated in the sword-devel mailing list and in the OSIS user's manual. These will be followed if at all possible.
For example, the milestone version of the <verse> and <chapter> elements will be used. In the case of verses this is not only a best practice but also required by the OSIS spec, as verses in the module can and do cross the boundaries of document structure elements such as paragraphs.
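The following sketch shows why the milestone form is needed when a verse crosses a paragraph boundary. The two fragments are invented for illustration (placeholder text, no OSIS namespace); the sID and eID attributes are the OSIS milestone start and end markers.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """True if the fragment is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Container form: the verse opens in one paragraph and closes in the next,
# so the elements overlap and the fragment cannot be well-formed.
container_form = (
    '<div><p><verse osisID="Gen.1.1">first half of the verse</p>'
    '<p>second half of the verse</verse></p></div>'
)

# Milestone form: empty start and end markers replace the container,
# so paragraphs nest cleanly and the verse extent is still recorded.
milestone_form = (
    '<div><p><verse osisID="Gen.1.1" sID="Gen.1.1"/>first half of the verse</p>'
    '<p>second half of the verse<verse eID="Gen.1.1"/></p></div>'
)

print(parses(container_form))
print(parses(milestone_form))
```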
Currently, where one English phrase translates more than one source word, the markup is:
<w src="1"></w> <w src="2">the house</w>
It should be (the TR is given to show the source of the markup):
TR: <w src="1">hO</w><w src="2">OIKOS</w>
KJV: <w src="1 2">the house</w>
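One possible way to perform this rewrite is sketched below with Python's ElementTree. It assumes a namespace-free fragment and a hypothetical helper named merge_empty_w; real OSIS documents carry a namespace, so the tag comparison would need adjusting. This is an illustration of the transformation, not the procedure used to produce the module.

```python
import xml.etree.ElementTree as ET

def merge_empty_w(parent):
    """Fold each empty <w> into the <w> that immediately follows it."""
    children = list(parent)  # snapshot, so removing elements is safe while looping
    for current, following in zip(children, children[1:]):
        is_empty = (
            current.tag == "w"
            and len(current) == 0
            and not (current.text or "").strip()
        )
        only_space_between = not (current.tail or "").strip()
        if is_empty and following.tag == "w" and only_space_between:
            # Keep source order: the empty element's numbers come first.
            merged = (current.get("src", "") + " " + following.get("src", "")).strip()
            following.set("src", merged)
            parent.remove(current)  # its whitespace-only tail goes with it

verse = ET.fromstring('<verse><w src="1"></w> <w src="2">the house</w></verse>')
merge_empty_w(verse)
print(ET.tostring(verse, encoding="unicode"))
# prints: <verse><w src="1 2">the house</w></verse>
```

The sketch only merges forward into the next <w>; an empty element whose words belong with the preceding phrase would need a human decision.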