[sword-devel] Turkish to English glossary problem

DM Smith dmsmith at crosswire.org
Tue Jan 5 14:16:36 MST 2016


Caleb,
I was just typing up instructions on how to get it via the Module Manager.

I agree that the module is bad. I’ll take your word that it’s really bad, less than useless. I’m not sure how useful glossaries and dictionaries are in general to our users. Looking at the download statistics, they aren’t a hot ticket item.

The lookup takes your input, changes it to upper case (probably badly, since Turkish has its own upper-casing rules: dotted i becomes İ and dotless ı becomes I) and looks for the nearest match by binary search. When the word isn’t in the list, it returns the nearest entry. At least, that is the intention.
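Here is a minimal Python sketch of that kind of lookup. It assumes the keys are stored upper-cased in a sorted list and that a miss falls through to the next key in sort order. The function names are hypothetical, and SWORD’s actual tie-breaking may differ. It also shows why Turkish needs special handling for i and ı.

```python
import bisect


def turkish_upper(s: str) -> str:
    """Upper-case with Turkish rules: dotted i -> İ, dotless ı -> I.

    str.upper() is locale-independent and maps 'i' to 'I', which is wrong
    for Turkish, so the two special letters are translated first.
    """
    return s.translate(str.maketrans({"i": "İ", "ı": "I"})).upper()


def nearest_key(sorted_keys: list[str], word: str) -> str:
    """Binary-search for the upper-cased word.

    Returns the exact key if present; otherwise the key that would come
    next in code-point order (or the last key if the word sorts after all).
    """
    if not sorted_keys:
        raise ValueError("empty key list")
    target = turkish_upper(word)
    pos = bisect.bisect_left(sorted_keys, target)
    return sorted_keys[min(pos, len(sorted_keys) - 1)]
```

With the keys ILIK, IRMAK and İSTANBUL, looking up "istanbul" finds İSTANBUL. A suffixed form such as "ırmaklar" ("rivers") also lands on İSTANBUL rather than IRMAK, which is the kind of unhelpful closest match Caleb describes.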


DM

> On Jan 5, 2016, at 1:56 PM, Caleb Maclennan <caleb at alerque.com> wrote:
> 
> Disregard what I said about the module; I found it in a different section of the module manager. So I have the ERtr_en module now, but as far as I can tell, it’s useless in Xiphos. Turkish is an agglutinative language, and almost no words in an actual text like the Bible appear in their root or stem form as found in a dictionary. Ergo, almost none of the words you click on to look up in the dictionary even have a chance of coming up with an actual meaning (except a handful of conjunctions, numbers, etc. that sometimes have no suffixes). Even if you know how to parse words and type the stems into the dictionary lookup bar, it rarely has them and returns the closest match (alphabetical? Levenshtein distance?), which is less than useless.
> 
> Unless I'm missing something, it might be better to disable the module than to insult anybody who tries to use it with data this useless. Am I missing something? Is there a use case that makes it worth trying to clean up the character set issue? I'll still look into it if you say it's worth the time.
> 
> Caleb
> 
> On Tue, Jan 5, 2016 at 8:39 PM, Caleb Maclennan <caleb at alerque.com> wrote:
> DM,
> 
> Honestly, I'm willing to put some effort into this if it will benefit anybody using Turkish scriptures, but the Wayback Machine link you provided is not encouraging. Not only is the encoding garbage, but the data itself is rife with mistakes. In less than a minute of skimming, I found several misspelled Turkish words (not just wrong encoding, actual misspellings) and outright bogus definitions. It's a very low-quality data set. Is what's on the page representative of what is going to come out, even if I dive down an archaic Windows rabbit hole and manage to surface with a properly encoded list? Is such a dictionary really helping anybody? It doesn't seem to have much in the way of Biblical/theological terminology anyway. Is this just for looking up word definitions while reading a text, or does it serve some purpose for cross-referencing translations?
> 
> I have a copy of Xiphos handy, but for some reason Turkish isn't showing up in the dictionary modules available for download. Is this not in the default CrossWire repo?
> 
> Caleb
> 
> On Tue, Jan 5, 2016 at 8:11 PM, DM Smith <dmsmith at crosswire.org> wrote:
> Thanks Caleb,
> 
> I’m working on JSword, which is the Java version of the SWORD engine. As such, I run every module I can get my hands on through a process that reads all of each module and reports anything it cannot handle. That effort is what made me look closer at this module: either the problem was in JSword or it was in the module.
> 
> With input from Peter, David and you, we can safely say that it is the module’s problem.
> 
> Most front-ends don’t display the module as a list (i.e., they don’t let you browse its contents); Bible Desktop does. Most front-ends allow you to select a word and look it up in a dictionary. Glossary modules allow you to look up a word in one language and display it in another; Bible Desktop doesn’t support that.
> 
> If you let us know which front-end you use, we can explain how to download the module for it and how to use it in that program.
> 
> The SWORD utility mod2imp will dump a module’s content in IMP format. Since this is a RawLD module, the .dat file is readable; in your modules folder it is modules/lexdict/rawld/glossaries/ertr_en/ertr_en.dat. The ertr_en.idx file is not human-readable, as it is a binary index.
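Here is a minimal Python sketch of walking a RawLD module directly. It assumes the layout as I understand it from the SWORD sources: each .idx entry is 6 bytes, a little-endian 32-bit offset into the .dat file followed by a 16-bit size, and each .dat record is the key, CR LF, then the entry text. Treat that layout as an assumption to check against SWORD’s rawstr code. The function name is hypothetical.

```python
import struct

# Assumed RawLD index entry: 32-bit data offset + 16-bit size, little-endian.
IDX_ENTRY = struct.Struct("<IH")


def read_rawld(idx: bytes, dat: bytes):
    """Yield (index, offset, size, key, text) for every entry in a RawLD module.

    Assumes each .dat record is the key, CR LF, then the entry text, with
    `size` covering the whole record. Keys stay raw bytes so that garbled
    bytes such as 0x1F are preserved for inspection.
    """
    if len(idx) % IDX_ENTRY.size:
        raise ValueError("index length is not a multiple of the entry size")
    for i in range(len(idx) // IDX_ENTRY.size):
        offset, size = IDX_ENTRY.unpack_from(idx, i * IDX_ENTRY.size)
        record = dat[offset:offset + size]
        key, _, text = record.partition(b"\n")
        yield i, offset, size, key.rstrip(b"\r"), text
```

For example, pass it the bytes of ertr_en.idx and ertr_en.dat read in binary mode. The sizes in DM’s entry listing fit this layout if each definition ends with a newline: 0NCIL is 5 + 2 + 14 + 1 = 22 bytes, and the 7-byte AUSTOS key gives 7 + 2 + 15 + 1 = 25.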
> 
> While it is certainly possible to take the dump from mod2imp, edit it and rebuild the module, we prefer not to do that. It is best to get the source again and create a module from it, and if our source was not the original location, to identify the original and get it from there. In this case, we got it from:
> http://www.wordgumbo.com/al/tur/ertureng.htm
> That site is currently down, so I found it via the Internet Archive’s Wayback Machine:
> https://web.archive.org/web/20131124010613/http://www.wordgumbo.com/al/tur/ertureng.htm
> 
> I noted that WordGumbo sourced the files from Ergane. That is the originator of the data and it can be found here:
> http://download.travlang.com//
> 
> Ergane is Windows-only software. It doesn’t run under Windows 10 (64-bit); I haven’t tried Windows 7 (64-bit). To be useful, the software requires various zip files to be installed. I downloaded one of them and it contained an MDB file, which I’m pretty sure is a Microsoft Access database. From the website’s description of the program:
> 
>> Ergane is a multilingual translation dictionary for Windows that uses the artificial language Esperanto to translate words and short expressions from one natural language to another. Ergane is a product of Majstro Aplikaĵoj <http://www.majstro.com/Bedrijf/contact_eng.html>.
> 
> 
> and
>> You won't need a master's in computer science to download Ergane, but make sure you do have Windows.
>> 
>> Windows 95 or higher.
>> 
> 
> Ideally, the Turkish-to-English data needs to be exported from the program, converted to UTF-8 if it isn’t already, and turned into a module source file. Proofreading would be invaluable.
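Here is a minimal Python sketch of the conversion step. It assumes the data can be pulled out of Ergane’s MDB files as (key, definition) byte pairs encoded in Windows-1254, the usual Windows code page for Turkish. Both the export step and that encoding are assumptions to verify. The sketch writes UTF-8 text in the IMP layout ("$$$key" line, then the entry text), which, as far as I know, SWORD’s imp2ld utility can build into a lexicon module. The function names are hypothetical.

```python
def to_imp(entries, encoding: str = "cp1254") -> str:
    """Build SWORD IMP source text from (key, definition) byte pairs.

    Assumption: the exported bytes are Windows-1254 (Turkish); pass a
    different codec name if the export turns out to use something else.
    Each entry becomes a "$$$key" line followed by its definition.
    """
    lines = []
    for raw_key, raw_text in entries:
        lines.append("$$$" + raw_key.decode(encoding).strip())
        lines.append(raw_text.decode(encoding).strip())
    return "\n".join(lines) + "\n"


def write_imp(path: str, entries, encoding: str = "cp1254") -> None:
    """Write the IMP text as UTF-8 with Unix line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        out.write(to_imp(entries, encoding))
```

Decoding with an explicit code page, rather than guessing, means a wrong assumption shows up as obviously wrong letters that a proofreader can spot.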
> 
> Let us know what you are willing to do.
> 
> In Him,
> 	DM
> 
>> On Jan 5, 2016, at 12:28 PM, Caleb Maclennan <caleb at alerque.com> wrote:
>> 
>> Hey DM,
>> 
>> I am fluent in Turkish and can help out here. That said, I'm a little confused about what you're looking at. Can you point me to where I can see the source files for this in context, and where the text shows up in an app?
>> 
>> From the bits you pasted, it looks like a file somewhere along the line was read with the wrong code page. All of the text you pasted is wrong, but it is off by a 1-to-1 character substitution that could make it right. For example, in the dictionary list all the "0"s should be "İ" and all the "1"s should be "I".
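Here is a minimal Python sketch of applying such a 1-to-1 substitution to the keys. It uses only the two pairs Caleb names: digit 0 for İ and digit 1 for I. The table and function names are hypothetical. A key that legitimately contains digits would be corrupted by this, so the output still needs review.

```python
# Substitutions identified in the key list: the digit 0 stands for İ and the
# digit 1 stands for I. Further pairs should be added only once a Turkish
# reader has confirmed them.
KNOWN_FIXES = str.maketrans({"0": "İ", "1": "I"})


def repair_key(key: str) -> str:
    """Apply the known 1-to-1 character substitutions to a garbled key."""
    return key.translate(KNOWN_FIXES)
```

This turns 0STANBUL into İSTANBUL and 1RMAK into IRMAK. However, 0NCIL becomes İNCIL while the word is İNCİL, so a character table alone won't catch every error and proofreading is still needed. Judging from its gloss, the 1F in AUSTOS most likely stands for Ğ (AĞUSTOS, "August"), but that pair is left out until confirmed.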
>> 
>> Caleb
>> 
>> On Tue, Jan 5, 2016 at 4:56 PM, DM Smith <dmsmith at crosswire.org> wrote:
>> Does anyone who knows Turkish have time to help me figure out a problem I am having?
>> 
>> Background: In ASCII, the first 32 characters (00 to 1F) are control characters. All of them are valid in UTF-8, but XML 1.0 allows only tab (09), line feed (0A) and carriage return (0D).
>> 
>> In one of our modules, ERtr_en, I am seeing data such as:
>> The 26th entry (index 25) looks like this:
>> 
>> AUSTOS	1. August<br />
>> 
>> However, the key AUSTOS contains a non-printable control character, hex 1F, between the A and the U:
>> 'A' 1F 'U' 'S' 'T' 'O' 'S'
>> 
>> What is the correct value?
>> 
>> Note: There are hundreds of such problems in this module, and I’m seeing similar non-printables in many other modules from the same source (wordgumbo.com).
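Here is a minimal sketch of a check that would flag keys like this one. It is written in Python rather than JSword's Java, and it tests each character against the XML 1.0 Char production. The function names are hypothetical.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (position, hex code) for every character XML 1.0 forbids."""
    return [
        (i, f"{ord(ch):02X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ord(ch))
    ]


def report(keys: list[str]) -> None:
    """Print every key that contains characters XML 1.0 forbids."""
    for n, key in enumerate(keys):
        bad = invalid_xml_chars(key)
        if bad:
            print(f"entry {n}: {key!r} {bad}")
```

Run over every key of a module, report() gives the kind of list needed to gauge how many of the hundreds of problems share the same stray byte.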
>> 
>> For those who are interested, here are the first entries in the dictionary, none of which look right to me (I ran a few of the definitions through Google Translate):
>> index	offset	size	key	value
>> 0	33132	22	0NCIL	1. Bible<br />
>> 1	33156	72	0NGILIZ	1. English<br />2. Englishman; Sassenach...
>> 2	33260	32	0NGILIZ KAM1_1	1. bamboo<br />
>> 3	33230	28	0NGILIZCE	1. English<br />
>> 4	33294	44	0NGILTERE	1. England<br />2. England<br />
>> 5	33340	28	0RAN	1. Iran; Persia<br />
>> 6	33370	25	0RANL1	1. Iranian<br />
>> 7	33397	26	0RLANDA	1. Ireland<br />
>> 8	33425	43	0RLANDAL1	1. Irish<br />2. Irishman<br />
>> 9	33470	21	0SA	1. Christ<br />
>> 10	33493	22	0SLAM	1. Islam<br />
>> 11	33517	24	0SPANYA	1. Spain<br />
>> 12	33543	28	0SPANYOL	1. Spaniard<br />
>> 13	33573	39	0SRAIL	1. Israel<br />2. Israel<br />
>> 14	33614	28	0STANBUL	1. Istanbul<br />
>> 15	33644	24	0SVEÇ	1. Sweden<br />
>> 16	33670	41	0SVEÇLI	1. Swedish<br />2. Swede<br />
>> 17	33713	31	0SVIÇRE	1. Switzerland<br />
>> 18	33746	41	0SVIÇRELI	1. Swiss<br />2. Swiss<br />
>> 19	33789	23	0TALYA	1. Italy<br />
>> 20	33814	42	0TALYAN	1. Italian<br />2. Italian<br />
>> 21	33858	44	0TALYANCA	1. Italian<br />2. Italian<br />
>> 22	33904	26	0ZLANDA	1. Iceland<br />
>> 23	33086	20	1L1K	1. warm<br />
>> 24	33108	22	1RMAK	1. river<br />
>> 25	7062	25	AUSTOS	1. August<br />
>> 
>> 
>> Thanks in advance!
>> 
>> In Him,
>> 	DM Smith
>> 
>> 
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
> 
