Thread Status: Active
Total posts in this thread: 4
ua4ever
OSIS questions

Is there a way in OSIS (and in the Sword implementation) to tag word stress within the canonical text of the Bible?

I cannot stress enough (pun intended) how important word stress is. This is true at least in Russian and Ukrainian Bibles, where the translations are dated and word stress can fall on almost any vowel in any position. Sometimes the stress distinguishes word meaning; other times unfamiliar (archaic) words can be butchered because nothing gives a clue as to the proper pronunciation.

I know that Unicode provides combining diacritics that can mark word stress; however, using them would break the search function (each diacritic counts as an extra character for search purposes).
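To make the search problem concrete, here is a minimal Python sketch (not Sword code). It assumes the stress mark is stored as U+0301 COMBINING ACUTE ACCENT after the stressed vowel and uses the Russian pair "за́мок" (castle) and "замо́к" (lock) as sample words:

```python
# U+0301 COMBINING ACUTE ACCENT, assumed here to be the stress mark.
STRESS = "\u0301"

castle = "за" + STRESS + "мок"   # stress on the first vowel
lock = "замо" + STRESS + "к"     # stress on the second vowel
plain = "замок"                  # what a user would normally type

# The stress mark is a code point of its own, so the stressed word is longer.
print(len(plain), len(castle))

# A naive substring search for the unstressed spelling finds nothing.
print(plain in castle)

# The two stressed spellings are different strings, which is the point
# of marking stress in the first place.
print(castle == lock)
```

The unstressed query does not match either stressed form, which is why the search has to strip or ignore the mark.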

Thanks.

PS

1. Is there a better forum to discuss OSIS issues/questions?

2. I understand that most Crosswire discussions are on the mailing lists. However, I have found that I cannot use them (why is a separate topic). What was the reasoning for not having a developers' forum that would accumulate knowledge in a searchable and structured format?
[Nov 25, 2008 10:14:55 AM]
ua4ever
Re: OSIS questions

OK. Apparently, there is no answer to the word stress problem in OSIS. This is unfortunate; putting in a note for each stressed word is unreasonable.

If OSIS can't help, then I would suggest that Sword implement a list of characters to be excluded/ignored during search. In addition to the Unicode stress mark, I would add the apostrophe (it can take several forms depending on the language: '’), punctuation/quotation marks, etc.
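A rough Python sketch of what such an ignore list could look like. This is not how Sword does it; the names `IGNORED`, `normalize`, and `search` and the sample entries are made up for illustration. The same characters are dropped from both the text and the query before comparing:

```python
# Hypothetical ignore list: stress mark plus three apostrophe forms.
IGNORED = (
    "\u0301"   # COMBINING ACUTE ACCENT (stress)
    "'"        # ASCII apostrophe
    "\u2019"   # RIGHT SINGLE QUOTATION MARK
    "\u02bc"   # MODIFIER LETTER APOSTROPHE
)

# Translation table that deletes every ignored character.
_STRIP = str.maketrans("", "", IGNORED)


def normalize(text):
    """Drop ignored characters and fold case for comparison."""
    return text.translate(_STRIP).casefold()


def search(entries, query):
    """Return the keys of all entries whose normalized text contains the query."""
    needle = normalize(query)
    return [key for key, text in entries.items() if needle in normalize(text)]


# Made-up sample entries, not real module text.
entries = {
    "demo:1": "за\u0301мок на горі",
    "demo:2": "п\u2019ять хлібів",
}

print(search(entries, "замок"))   # query typed without stress
print(search(entries, "п'ять"))   # query typed with an ASCII apostrophe
```

Because the table deletes only the listed code points, precomposed letters such as й and ї are left alone, provided the text is stored in composed (NFC) form.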
[Dec 1, 2008 12:14:45 PM]
scribe
Re: OSIS questions

Dear ua4ever, SWORD supports the concept of 'strip filters' which get executed before a search. We have a few of them for stripping diacritics and other accents. I am not familiar with the general Unicode concept of 'stress' symbols, but if you can point me to the code region for these, I can add a 'Stressmarks' strip filter to the next release of the engine for you.
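One way to pin down which code points a strip filter would need to cover is to inventory the combining marks that actually occur in a text. The following is only a sketch using Python's standard `unicodedata` module; the function name `combining_marks` is made up and nothing here is part of the SWORD engine:

```python
import unicodedata
from collections import Counter


def combining_marks(text):
    """Count the mark characters (general category M*) found in text.

    Returns a dict keyed by 'U+XXXX NAME' so the result can be read
    directly as a list of code points to strip.
    """
    counts = Counter(
        ch for ch in text if unicodedata.category(ch).startswith("M")
    )
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch)}": n
        for ch, n in counts.items()
    }


# Sample text with two stress marks; й is precomposed and is not counted.
sample = "за\u0301мок, замо\u0301к, мой"
print(combining_marks(sample))
```

Running this over a whole module would show whether the stress marks are limited to a single code point or spread over several.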




[Dec 1, 2008 3:47:05 PM]
ua4ever
Re: OSIS questions

This is great!

To indicate word stress, I am using the combining acute accent (U+0301), which Unicode describes as suitable for such use:

"Diacritic marks are not encoded by function, and are not specific to language or usage. For example, look at the acute accent. In some languages, it is a diacritic to indicate a distinct letter (with a distinct pronunciation); in other languages it marks a stress, or a quantity; in others it marks a tone. The implications for linguistic processing (including sorting) may be different in each case."

Also see:
http://en.wikipedia.org/wiki/Acute_accent#Stress

Currently, the Sword search function treats the acute accent as a separate character.


As I indicated above, it would also be possible to implement an OSIS tag for stress that is used for display purposes but ignored in searches. For example, the dictionary program Lingvo has a stress tag in its DSL format for precisely this purpose.
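To illustrate the idea, here is a small Python sketch. The `stress` element is purely hypothetical (it is not an existing OSIS tag), and the functions `display_text` and `search_text` are made-up names; the sketch only shows how one marked-up source could yield both a stressed display form and an unstressed search form:

```python
import xml.etree.ElementTree as ET

STRESS = "\u0301"  # COMBINING ACUTE ACCENT, added only for display


def display_text(elem):
    """Render text, appending a stress mark after each <stress> vowel."""
    parts = [elem.text or ""]
    for child in elem:
        inner = display_text(child)
        if child.tag == "stress":   # hypothetical tag, not real OSIS
            inner += STRESS
        parts.append(inner)
        parts.append(child.tail or "")
    return "".join(parts)


def search_text(elem):
    """Render text with all markup ignored, for indexing and search."""
    return "".join(elem.itertext())


word = ET.fromstring("<w>з<stress>а</stress>мок</w>")

print(display_text(word))   # stressed form for the reader
print(search_text(word))    # plain form for the search index
```

With this approach the stored text never contains the diacritic, so the search needs no special handling; the cost is extra markup on every stressed word.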
----------------------------------------
[Edit 1 times, last edit by ua4ever at Dec 1, 2008 5:06:06 PM]
[Dec 1, 2008 5:01:35 PM]