[sword-devel] Character Frequency

David Haslam dfhmch at googlemail.com
Fri Jul 8 01:17:54 MST 2011


Good stuff Peter,

I guess that for some of the projects we've worked on, running the character
frequency analysis on the OSIS files means doing it at the last stage of the
process, just before the module build.

For projects that begin at USFM (or earlier), it would be great to develop a
tool that analyses the character frequency of the text itself (for the whole
Bible), excluding all the USFM tags, etc.

One simple way to do this would be to have a script that does the following:

(a) merges all the USFM files into a single text file
(b) removes all the USFM tags (and the English material such as IDs and text
in remarks, etc.)
(c) does the character frequency counting
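
The three steps above could be sketched in Python roughly as follows. This is
only a minimal illustration, not an existing tool: the function names, the
"*.usfm" file extension, and the choice of \id, \ide and \rem as the "English"
lines to drop are all assumptions, and the tag stripping is deliberately
naive (footnote callers and references, for example, are left in the text).

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Assumption: lines starting with these markers hold English metadata, not Bible text.
SKIP_LINE = re.compile(r"^\\(id|ide|rem)\b")
# Chapter and verse markers together with their numbers, e.g. "\c 3" or "\v 16 ".
NUMBERED = re.compile(r"\\[cv]\s+\S+\s?")
# Any remaining marker, including closing forms such as "\add*" and nested "\+nd".
MARKER = re.compile(r"\\\+?[a-z]+\d*\*?")


def strip_usfm(text):
    """Step (b): drop metadata lines, then remove USFM markers from the rest."""
    kept = []
    for line in text.splitlines():
        if SKIP_LINE.match(line):
            continue
        line = NUMBERED.sub("", line)
        line = MARKER.sub("", line)
        kept.append(line)
    return "\n".join(kept)


def char_frequency(text):
    """Step (c): count every non-whitespace character."""
    return Counter(ch for ch in text if not ch.isspace())


def main(folder):
    # Step (a): merge all USFM files in the folder into one string.
    # "utf-8-sig" also accepts files that start with a byte order mark.
    files = sorted(Path(folder).glob("*.usfm"))  # hypothetical extension
    merged = "\n".join(p.read_text(encoding="utf-8-sig") for p in files)
    for ch, n in char_frequency(strip_usfm(merged)).most_common():
        print(f"U+{ord(ch):04X}\t{ch}\t{n}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run with the folder of USFM files as the only argument; it prints one line per
character (code point, character, count), most frequent first.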

For my part, (a) & (b) could easily be done by means of a TextPipe filter.

David
