Posted by ua4ever at Nov 25, 2008 10:14:55 AM
OSIS questions
Is there a way in OSIS (and in the Sword implementation) to tag word stress within the canonical text of the Bible?

I cannot stress enough (pun intended) how important word stress is. This is true at least for Russian and Ukrainian Bibles, where the translations are dated and the stress can fall on almost any vowel in any position. Sometimes the stress distinguishes word meaning; other times unfamiliar (archaic) words can be butchered because nothing gives a clue to the proper pronunciation.

I know that Unicode provides combining diacritics that can mark word stress; however, using them breaks the search function, because each diacritic counts as an extra character for search purposes.

Thanks.

PS

1. Is there a better forum to discuss OSIS issues/questions?

2. I understand that most Crosswire discussions are on the mailing list. However, I have found that I cannot use it (why is a separate topic). What was the reasoning for not having a developer's forum that would accumulate knowledge in a searchable and structured format?

Posted by ua4ever at Dec 1, 2008 12:14:45 PM
Re: OSIS questions
OK. Apparently, there is no answer to the word stress problem in OSIS. This is unfortunate, because putting in a note for each word that carries a stress mark is unreasonable.

If OSIS can't help, then I would suggest that Sword implement a list of characters to be excluded/ignored during search. In addition to the Unicode word stress mark, I could add the apostrophe (it can take several forms depending on the language: '’), punctuation/quotation marks, etc.
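
The suggestion above can be sketched in a few lines of Python. This is only an illustration of the idea, not Sword code: the names IGNORED_CHARS, strip_ignored and search are hypothetical, and the sample entries are made up rather than taken from any Bible module. The same characters are removed from both the query and the text before comparing.

```python
# Hypothetical ignore list; these names are illustrative and not part of Sword.
IGNORED_CHARS = frozenset({
    "\u0301",  # combining acute accent (used here as the stress mark)
    "'",       # ASCII apostrophe
    "\u2019",  # right single quotation mark, often typed as an apostrophe
    "\u02bc",  # modifier letter apostrophe
})


def strip_ignored(text, ignored=IGNORED_CHARS):
    """Return text with every character from the ignore list removed."""
    return "".join(ch for ch in text if ch not in ignored)


def search(verses, query, ignored=IGNORED_CHARS):
    """Return the keys of verses whose text contains query.

    Ignored characters are stripped from both sides and case is folded,
    so stress marks and apostrophe variants do not affect matching.
    """
    needle = strip_ignored(query, ignored).casefold()
    return [
        ref
        for ref, text in verses.items()
        if needle in strip_ignored(text, ignored).casefold()
    ]


# Made-up sample entries, not actual Bible text.
sample = {
    "ref1": "стари\u0301й за\u0301мок",
    "ref2": "п\u2019ять хлібів",
}
print(search(sample, "замок"))  # ['ref1']
print(search(sample, "п'ять"))  # ['ref2']
```

Stripping the same set from the query means a user can type the word with or without stress marks, and with any apostrophe form, and still get the same hits.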

Posted by scribe at Dec 1, 2008 3:47:05 PM
Re: OSIS questions
Dear ua4ever, SWORD supports the concept of 'strip filters' which get executed before a search. We have a few of them for stripping diacritics and other accents. I am not familiar with the general Unicode concept of 'stress' symbols, but if you can point me to the code region for these, I can add a 'Stressmarks' strip filter to the next release of the engine for you.

Posted by ua4ever at Dec 1, 2008 5:01:35 PM
Re: OSIS questions
This is great!

To indicate word stress, I am using a combining acute accent (U+0301), which is indicated for such use by Unicode:

"Diacritic marks are not encoded by function, and are not specific to language or usage. For example, look at the acute accent. In some languages, it is a diacritic to indicate a distinct letter (with a distinct pronunciation); in other languages it marks a stress, or a quantity; in others it marks a tone. The implications for linguistic processing (including sorting) may be different in each case."

Also see:
http://en.wikipedia.org/wiki/Acute_accent#Stress

Currently, the acute accent is treated as a separate character by the Sword search function.
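
A small Python sketch shows why the mark gets in the way of searching, and why removing only U+0301 may be safer than removing every combining mark. It uses only the standard unicodedata module; the helper names strip_stress and strip_all_marks are hypothetical, and nothing here reflects how Sword's own strip filters are written.

```python
import unicodedata

STRESS = "\u0301"  # COMBINING ACUTE ACCENT

plain = "замок"
stressed = "за\u0301мок"  # same word with a stress mark on the first vowel

# The stress mark is a code point of its own, so a plain substring
# search for the unstressed word fails on the stressed text.
print(len(plain), len(stressed))       # 5 6
print(plain in stressed)               # False
print(unicodedata.combining(STRESS))   # 230 (non-zero: a combining mark)


def strip_stress(text):
    """Remove only the combining acute accent."""
    return text.replace(STRESS, "")


def strip_all_marks(text):
    """Decompose to NFD and remove every combining mark (too aggressive here)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_stress(stressed) == plain)  # True

# Cyrillic й (U+0439) decomposes to и + combining breve, so stripping
# all marks changes the letter, while the targeted strip leaves it alone.
word = "мо\u0439"
print(strip_stress(word) == word)              # True
print(strip_all_marks(word) == "мо\u0438")     # True
```

The last two lines are the reason a filter aimed at stress marks would probably want to target U+0301 specifically for Russian and Ukrainian text.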


As I indicated above, it would also be possible to implement an OSIS tag for stress that is used for display purposes but ignored for searches. For example, the dictionary program Lingvo has a stress tag in its DSL format for precisely this purpose.