Thread Status: Active Total posts in this thread: 4 |
|
| Author |
|
|
Regular Joined: Nov 25, 2008 Post Count: 11 Status: Offline |
Is there a way in OSIS (and in the Sword implementation) to tag word stress within the canonical text of the Bible? I cannot stress enough (pun intended) how important word stress is. This is true at least for Russian and Ukrainian Bibles, where the translations are dated and the stress can fall on almost any vowel in any position. Sometimes the stress distinguishes word meaning; other times unfamiliar (archaic) words get butchered because nothing gives a clue to the proper pronunciation. I know that Unicode has combining diacritics that can mark word stress; however, using them breaks the search function (each diacritic counts as an extra character for search purposes). Thanks. PS 1. Is there a better forum to discuss OSIS issues/questions? 2. I understand that most Crosswire discussions are on the mailing list. However, I have found that I cannot use them (why is a separate topic). What was the reasoning for not having a developers' forum that would accumulate knowledge in a searchable and structured format? |
||
|
|
Regular Joined: Nov 25, 2008 Post Count: 11 Status: Offline |
Ok. Apparently, there is no answer to the word stress problem in OSIS. This is unfortunate, because putting in a note for each stressed word is unreasonable. If OSIS can't help, then I would suggest that Sword implement a list of characters to be excluded/ignored during search. In addition to the Unicode stress mark, I could add the apostrophe (it can take several forms depending on the language, such as ' and ’), punctuation/quotation marks, etc. |
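The ignore-list idea suggested above can be sketched roughly as follows. This is only an illustration in Python, not anything that exists in Sword or OSIS: the names `IGNORED`, `strip_ignored` and `search` are hypothetical, and the choice of characters in the list is an assumption based on the stress mark and apostrophe forms mentioned in the post.

```python
# Hypothetical ignore list; not part of SWORD or OSIS.
IGNORED = {
    "\u0301",  # combining acute accent, used here as the stress mark
    "'",       # ASCII apostrophe
    "\u2019",  # right single quotation mark, often typed as an apostrophe
    "\u02bc",  # modifier letter apostrophe
}


def strip_ignored(text, ignored=IGNORED):
    """Return text with every character from the ignore list removed."""
    return "".join(ch for ch in text if ch not in ignored)


def search(query, verses, ignored=IGNORED):
    """Return the verses that contain query once both sides are stripped."""
    needle = strip_ignored(query, ignored).casefold()
    return [v for v in verses
            if needle in strip_ignored(v, ignored).casefold()]


verses = ["во\u0301ды мно\u0301гие", "м\u2019ясо"]
print(search("воды", verses))   # query typed without stress marks
print(search("м'ясо", verses))  # query typed with a different apostrophe
```

Stripping the same characters from both the query and the text means the stored text can keep its stress marks for display while a plain, unstressed query still matches.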
||
|
|
Strange Joined: Sep 17, 2003 Post Count: 9 Status: Offline |
Dear ua4ever, SWORD supports the concept of 'strip filters' which get executed before a search. We have a few of them for stripping diacritics and other accents. I am not familiar with the general Unicode concept of 'stress' symbols, but if you can point me to the code region for these, I can add a 'Stressmarks' strip filter to the next release of the engine for you. |
||
|
|
Regular Joined: Nov 25, 2008 Post Count: 11 Status: Offline |
This is great! To indicate word stress, I am using the combining acute accent (U+0301), which Unicode describes as suitable for such use: "Diacritic marks are not encoded by function, and are not specific to language or usage. For example, look at the acute accent. In some languages, it is a diacritic to indicate a distinct letter (with a distinct pronunciation); in other languages it marks a stress, or a quantity; in others it marks a tone. The implications for linguistic processing (including sorting) may be different in each case." Also see: http://en.wikipedia.org/wiki/Acute_accent#Stress Currently, the acute accent is treated as a separate character by the Sword search function. As an alternative to the approach I suggested above, it should also be possible to implement an OSIS tag for stress that would be used for display purposes but ignored for searches. For example, the dictionary program Lingvo has a stress tag in its DSL format precisely for this. ---------------------------------------- [Edit 1 times, last edit by ua4ever at Dec 1, 2008 5:06:06 PM] |
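To make the U+0301 case concrete, here is a minimal Python sketch of what a stress-only strip step might do before a search. It is not the SWORD engine's strip filter API (which is not shown in this thread); `strip_stress` and `strip_all_marks` are hypothetical names. The point it tries to illustrate is that removing only U+0301 keeps letters such as й intact, whereas a filter that decomposes the text and removes every combining mark would turn й into и.

```python
import unicodedata

STRESS = "\u0301"  # combining acute accent, used as the stress mark


def strip_stress(text):
    """Remove only the stress mark, then recompose what is left (NFC)."""
    return unicodedata.normalize("NFC", text.replace(STRESS, ""))


def strip_all_marks(text):
    """Naive alternative: decompose (NFD) and drop every nonspacing mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")


# Two stress patterns of the same spelling collapse to one search form.
print(strip_stress("за\u0301мок") == strip_stress("замо\u0301к"))

# The stress-only step keeps й; the naive step loses its breve.
print(strip_stress("мо\u0301й"))
print(strip_all_marks("мо\u0301й"))
```

One caveat under these assumptions: in text where U+0301 is part of a letter written in decomposed form (for example Macedonian ѓ stored as г followed by U+0301), the stress-only step would remove it as well, so such a filter is best limited to texts where the acute accent marks stress only.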
||
|
|
|
|