Mailing List Archive

Feature idea: data structure to improve translation capabilities
Hello,

I have been thinking of a way to organise data in Wiktionary that would allow
for words to automatically show translations to other languages with much less
work than is currently required.

Currently, translations to other languages have to be added manually, meaning
they are not automatically propagated across language pairs. What I mean by
this is showcased in the following example:

1. I create a page for word X in language A.
2. I create a page for word Y in language B.
3. I add a translation to the page for word X, and state that it translates to
word Y in language B.
4. If I want the page for word Y to show that it translates to word X in
language A, I have to do this manually.
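The four steps above amount to a one-directional link stored on a single page. Below is a toy Python model of that status quo. It is a simplification that assumes each page keeps its own list of outgoing translations (real Wiktionary pages hold them as wikitext), and the names `pages` and `add_translation` are hypothetical:

```python
# Hypothetical toy model of the status quo: each page keeps its own list of
# outgoing translations, keyed here by (word, language).
pages = {
    ("X", "A"): {"translations": []},
    ("Y", "B"): {"translations": []},
}


def add_translation(pages, source, target):
    """Step 3: record a translation on the source page only."""
    pages[source]["translations"].append(target)


add_translation(pages, ("X", "A"), ("Y", "B"))

print(pages[("X", "A")]["translations"])  # [('Y', 'B')]
# Step 4: Y's page is unchanged until someone edits it by hand.
print(pages[("Y", "B")]["translations"])  # []
```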

Automating this seems a bit tricky. I think that the key is acknowledging that
meanings can be separated from language and used as the links between translations.
In this view, words and their definitions are language-specific, but meanings
are language-agnostic.
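Here is a minimal sketch of the meaning-as-link idea using Python's built-in `sqlite3` module. This is not the attached script: the tables `meaning`, `word` and `word_meaning` and the `translations` helper are assumptions made for illustration. Translations are derived by joining two words through a shared meaning, so the reverse direction (step 4) needs no extra edit:

```python
import sqlite3

# Hypothetical schema: words are language-specific, meanings are not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meaning (id INTEGER PRIMARY KEY);
CREATE TABLE word (
    id       INTEGER PRIMARY KEY,
    spelling TEXT NOT NULL,
    language TEXT NOT NULL
);
-- Each row says "this word can express this meaning".
CREATE TABLE word_meaning (
    word_id    INTEGER REFERENCES word(id),
    meaning_id INTEGER REFERENCES meaning(id),
    PRIMARY KEY (word_id, meaning_id)
);
""")


def translations(conn, spelling, language):
    """Words in other languages that share at least one meaning."""
    return conn.execute(
        """
        SELECT DISTINCT w2.spelling, w2.language
        FROM word AS w1
        JOIN word_meaning AS wm1 ON wm1.word_id = w1.id
        JOIN word_meaning AS wm2 ON wm2.meaning_id = wm1.meaning_id
        JOIN word AS w2 ON w2.id = wm2.word_id
        WHERE w1.spelling = ? AND w1.language = ?
          AND w2.language <> w1.language
        ORDER BY w2.language, w2.spelling
        """,
        (spelling, language),
    ).fetchall()


# Steps 1-3: create X (language A) and Y (language B), link both to one meaning.
conn.execute("INSERT INTO meaning (id) VALUES (1)")
conn.executemany("INSERT INTO word (id, spelling, language) VALUES (?, ?, ?)",
                 [(1, "X", "A"), (2, "Y", "B")])
conn.executemany("INSERT INTO word_meaning VALUES (?, ?)", [(1, 1), (2, 1)])

# Step 4 is no longer manual: the shared meaning works in both directions.
print(translations(conn, "X", "A"))  # [('Y', 'B')]
print(translations(conn, "Y", "B"))  # [('X', 'A')]
```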

Because I may have done a poor job of explaining this context, I have created a
short example in the form of an sqlite3 SQL script that creates a small
dictionary database with two meanings for the word "desert"; one of the
meanings has been linked to the corresponding words in Spanish and in German.
The script mainly showcases how words can be linked across languages with
minimal rework.

You can find the script attached. To experiment with this, simply run

.read feature_showcase.sql

within an interactive sqlite3 session. (There may be other ways of doing it
but this is how I tested it.)
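For readers without the attachment, here is a hypothetical stand-in for the "desert" example, run from Python's `sqlite3` module rather than the interactive shell. `Connection.executescript` plays roughly the role of `.read`, and the real file could be loaded with `conn.executescript(Path("feature_showcase.sql").read_text())`. The message does not say which meaning of "desert" is linked or how the tables are named, so the glosses, the Spanish/German words and the schema below are assumptions:

```python
import sqlite3

# Hypothetical stand-in for feature_showcase.sql; the real script is not
# reproduced here, and this schema and data are assumptions.
SCRIPT = """
CREATE TABLE meaning (id INTEGER PRIMARY KEY, gloss TEXT NOT NULL);
CREATE TABLE word (id INTEGER PRIMARY KEY, spelling TEXT NOT NULL,
                   language TEXT NOT NULL);
CREATE TABLE word_meaning (word_id INTEGER REFERENCES word(id),
                           meaning_id INTEGER REFERENCES meaning(id));

INSERT INTO meaning VALUES (1, 'arid region'), (2, 'to abandon');
INSERT INTO word VALUES (1, 'desert', 'en'), (2, 'desierto', 'es'),
                        (3, 'Wüste', 'de');
-- "desert" has both meanings; only meaning 1 is linked to es/de words.
INSERT INTO word_meaning VALUES (1, 1), (1, 2), (2, 1), (3, 1);
"""

conn = sqlite3.connect(":memory:")
# Rough Python counterpart of `.read feature_showcase.sql` in the shell.
conn.executescript(SCRIPT)


def translations_by_meaning(conn, spelling, language):
    """Map each meaning of a word to its words in other languages."""
    rows = conn.execute(
        """
        SELECT m.gloss, w2.spelling, w2.language
        FROM word AS w1
        JOIN word_meaning AS wm1 ON wm1.word_id = w1.id
        JOIN meaning AS m ON m.id = wm1.meaning_id
        LEFT JOIN word_meaning AS wm2 ON wm2.meaning_id = m.id
        LEFT JOIN word AS w2
               ON w2.id = wm2.word_id AND w2.language <> w1.language
        WHERE w1.spelling = ? AND w1.language = ?
        ORDER BY m.id, w2.language
        """,
        (spelling, language),
    ).fetchall()
    result = {}
    for gloss, other, other_language in rows:
        bucket = result.setdefault(gloss, [])  # keep meanings with no links
        if other is not None:
            bucket.append((other, other_language))
    return result


print(translations_by_meaning(conn, "desert", "en"))
# {'arid region': [('Wüste', 'de'), ('desierto', 'es')], 'to abandon': []}
```

The unlinked meaning still shows up, just with no translations, which mirrors the idea that only the linked meaning propagates across languages.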

I believe this system could also be used to automate other word relations, such
as hyponymy and hypernymy, meronymy and holonymy, and others. It could also
allow users to look up words in other languages and get definitions in the
language of their choice. In short, it would let Wiktionary function more
effortlessly as a universal dictionary.
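A rough sketch of how relations and definitions could also hang off meanings, in plain Python. The meaning IDs (`M_dog`, etc.), the `INVERSE` table and the helpers `related` and `define` are all hypothetical. The idea shown is that each relation is stored once and its inverse (hypernym for hyponym, holonym for meronym) is derived, while definitions are keyed by meaning and language:

```python
# Hypothetical sketch: relations and definitions attach to meanings, not words.
INVERSE = {
    "hyponym": "hypernym", "hypernym": "hyponym",
    "meronym": "holonym", "holonym": "meronym",
}

# Each triple (a, rel, b) reads "meaning a is a <rel> of meaning b".
# Only one direction is stored; the inverse is derived below.
relations = [
    ("M_dog", "hyponym", "M_animal"),
    ("M_tail", "meronym", "M_dog"),
]

# Definitions are written per (meaning, language) pair.
definitions = {
    ("M_dog", "en"): "a domesticated canine",
    ("M_dog", "es"): "animal doméstico de la familia de los cánidos",
}

# Words in any language point at meanings.
words = {("dog", "en"): ["M_dog"], ("perro", "es"): ["M_dog"]}


def related(meaning, rel):
    """Meanings b such that `meaning` is a <rel> of b, stored or derived."""
    stored = {b for a, r, b in relations if a == meaning and r == rel}
    derived = {a for a, r, b in relations if b == meaning and INVERSE[r] == rel}
    return stored | derived


def define(spelling, word_language, reader_language):
    """Definitions of a word, in the reader's language where available."""
    return [
        definitions[(m, reader_language)]
        for m in words[(spelling, word_language)]
        if (m, reader_language) in definitions
    ]


print(related("M_animal", "hypernym"))  # {'M_dog'}
print(related("M_dog", "holonym"))      # {'M_tail'}
print(define("perro", "es", "en"))      # ['a domesticated canine']
```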

Has something like this been suggested before? I would be pleased to receive
feedback on this idea.

With kind regards,
Wolter HV
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: Feature idea: data structure to improve translation capabilities
Offhand, isn’t this something that Wikidata was set up to handle?

On Sun, Jun 20, 2021 at 12:40 PM Wolter HV <wolterhv@gmx.de> wrote:

> I have been thinking of a way to organise data in Wiktionary that would allow
> for words to automatically show translations to other languages with much less
> work than is currently required.
Re: Feature idea: data structure to improve translation capabilities
[2021-06-20 17:39 +0100] Wolter HV:
> You can find the script attached. To experiment with this, simply run
>
> .read feature_showcase.sql
>
> within an interactive sqlite3 session. (There may be other ways of doing it
> but this is how I tested it.)

I found out, unsurprisingly, that my attachment didn't make it into the mailing
list :D

Here is a pastebin link with the aforementioned sqlite3 SQL script:

https://paste.gnome.org/pca7e7y0v

Regards,
Wolter HV
Re: Feature idea: data structure to improve translation capabilities
[2021-06-20 17:43 +0100] John:
> Offhand, isn’t this something that Wikidata was set up to handle?

I'm not sure, but I don't see this functionality in Wiktionary at the
moment. Is it currently under development?

Regards,
Wolter HV
Re: Feature idea: data structure to improve translation capabilities
Hi!

On Wed, Jun 23, 2021 at 11:05 AM Wolter HV <wolterhv@gmx.de> wrote:

> [2021-06-20 17:43 +0100] John:
> > Offhand, isn’t this something that Wikidata was set up to handle?
>
> I'm not sure, but I don't see this functionality in Wiktionary at the
> moment. Is it currently under development?
>

Yeah, for something like fifteen years, I guess… :-) See e.g. OmegaWiki
(formerly known as WiktionaryZ).

The modern incarnation of a machine-readable dictionary is the
Lexicographical Data project on Wikidata. It is a nice project and definitely
worth a look, but it is not really an evolution of (or improvement on)
Wiktionary; rather, it is a fresh start, among other reasons because of the
license incompatibility between Wiktionary’s CC-BY-SA and Wikidata’s CC0. See
https://www.wikidata.org/wiki/Wikidata:Lexicographical_data

-- [[cs:User:Mormegil | Petr Kadlec]]
Re: Feature idea: data structure to improve translation capabilities
[2021-06-23 10:56 +0100] petr:
> Hi!

Hi! Thanks for your reply.

> Yeah, for something like fifteen years, I guess… :-) See e.g. OmegaWiki
> (formerly known as WiktionaryZ).

OmegaWiki is astoundingly close to, if not exactly, what I was proposing. It
links words to meanings and automatically derives translations from those
links, which is the main feature I was looking for. It also supports linking
words to one another through different relationships, such as hyponymy and
hypernymy. I wonder why it isn't more popular.

> The modern incarnation of a machine-readable dictionary is the
> Lexicographical Data project on Wikidata. It is a nice project and definitely
> worth a look, but it is not really an evolution of (or improvement on)
> Wiktionary; rather, it is a fresh start, among other reasons because of the
> license incompatibility between Wiktionary’s CC-BY-SA and Wikidata’s CC0. See
> https://www.wikidata.org/wiki/Wikidata:Lexicographical_data

Thanks, this is interesting too, though this project doesn't seem to decouple
meanings from words, so automatic translations don't work with it (as far as I
could see from my short snoop-around).

I'll stick with OmegaWiki and hopefully do my bit to contribute to it. Thanks
for bringing it into the conversation!

Regards,
Wolter HV
Re: Feature idea: data structure to improve translation capabilities
Hi,

On Sun, Jul 4, 2021 at 2:49 AM Wolter HV <wolterhv@gmx.de> wrote:

> Thanks, this is interesting too, though this project doesn't seem to
> decouple meanings from words, so automatic translations don't work with it
> (as far as I could see from my short snoop-around).
>

I’m not sure what you mean by “decouple meanings from words”. Sure,
lexemes themselves do not have DefinedMeaning entries like OmegaWiki does,
but note that this is part of Wikidata, and lexeme senses are linked to
main-namespace Wikidata items using the “item for this sense” (P5137)
property. See e.g. https://www.wikidata.org/wiki/Lexeme:L10984, which links
the “point of entry to an enclosed space” sense to
https://www.wikidata.org/wiki/Q53060 and also to the corresponding lexemes in
other language(s), in this case
https://www.wikidata.org/wiki/Lexeme:L406305#S1
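In terms of the proposal earlier in the thread, the item linked via P5137 can play roughly the part of a language-agnostic meaning: senses pointing at the same item are translation candidates. Below is a sketch of that grouping over hand-written records. The sense IDs and item values are hypothetical placeholders (not the lexemes linked above), fetching real data from Wikidata is not shown, and this does not describe how Wikidata tools actually compute translations:

```python
# Hypothetical, hand-written sense records: (sense_id, language, item).
# "item" stands for the value of P5137 ("item for this sense"); the IDs are
# placeholders, not real Wikidata lexemes or items.
senses = [
    ("L1-S1", "en", "Q_entrance"),
    ("L2-S1", "cs", "Q_entrance"),
    ("L3-S1", "de", "Q_entrance"),
    ("L1-S2", "en", "Q_other"),
]


def translations_for(sense_id, senses):
    """Senses in other languages that share this sense's P5137 item."""
    language = {s: lang for s, lang, _ in senses}
    item = {s: it for s, _, it in senses}
    return sorted(
        s for s, lang, it in senses
        if it == item[sense_id] and lang != language[sense_id]
    )


print(translations_for("L1-S1", senses))  # ['L2-S1', 'L3-S1']
print(translations_for("L1-S2", senses))  # []
```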

HTH,
-- [[cs:User:Mormegil | Petr Kadlec]]