[sword-devel] Detecting and correcting poor hyphenation in source texts?

David Haslam d.haslam at ukonline.co.uk
Fri Jan 2 09:24:38 MST 2009


While reading Hebreux 6 in the French BBB today (Go Bible on my K750i), I
found some spurious spaces immediately after a hyphen. A search for "- "
found 45 occurrences, of which only three were valid hyphenation. I have
done a manual search and replace on the source text, and then rebuilt the
FrenchBBB Go Bible.
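A search like that can at least be automated to list the candidates for
review. A minimal sketch in Python, assuming the source is a plain UTF-8
text file read into a string; the function name find_suspect_hyphens is
hypothetical and it only reports matches, leaving each decision to a human:

```python
import re

# Hypothetical helper, not part of SWORD or Go Bible tooling.
# Matches word characters, a hyphen, one or more whitespace characters,
# then more word characters, e.g. "pe- tit". In Python 3, \w is
# Unicode-aware, so accented French letters are matched too.
SUSPECT = re.compile(r"(\w+)-\s+(\w+)")

def find_suspect_hyphens(text):
    """Return (line_number, fragment) pairs for each hyphen followed by whitespace."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in SUSPECT.finditer(line):
            hits.append((line_no, m.group(0)))
    return hits
```

Because the text is scanned line by line, a hyphen at the very end of a
line (with the rest of the word on the next line) is not reported.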

I am reporting this because CrossWire also has a SWORD beta-module for the
FrenchBBB.

Generalising from this, detecting bad hyphenation requires knowledge of the
language; otherwise, how can one distinguish it from valid hyphenation? The
instance that caught my eye was "pe- tit", which should be "petit".
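One partial way to supply that knowledge is a word list for the language:
if the two halves joined together form a known word, the hyphen and space
were almost certainly left over from word-wrapping. A rough sketch in
Python, assuming a hypothetical lexicon (a set of lower-case French words
taken from whatever dictionary is available); fix_broken_hyphens is an
illustrative name, not an existing tool:

```python
import re

# Word characters, hyphen, whitespace, word characters, e.g. "pe- tit".
SUSPECT = re.compile(r"(\w+)-\s+(\w+)")

def fix_broken_hyphens(text, lexicon):
    """Join 'pe- tit' into 'petit' only when the joined word is in the lexicon.

    `lexicon` is assumed to be a set of lower-case words. Returns the
    corrected text plus a list of unresolved fragments for manual review.
    """
    review = []

    def repl(m):
        joined = m.group(1) + m.group(2)
        if joined.lower() in lexicon:
            return joined          # keeps the original capitalisation
        review.append(m.group(0))  # unknown: leave it for a human
        return m.group(0)

    fixed = SUSPECT.sub(repl, text)
    return fixed, review
```

Anything the lexicon cannot resolve, such as a suspended hyphen in "avant-
et après-guerre", is returned for manual review rather than guessed at.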

The usual rejoinder from CrossWire when even minor source-text issues are
reported is, "Wait until we get a better source!"  From a practical
viewpoint, we should admit that this rarely happens, especially for minor
blemishes like these, which easily arise from word-wrapping or during OCR.

I don't have a generic solution, but I do wish to start a discussion.  Any
ideas?  What can we do to help our "suppliers" when such "proof-reading
errors" are found?

-- David Haslam

