Mailing List Archive

Re: Google Translate now assists with human translations of Wikipedia articles
On Tue, Jun 9, 2009 at 3:13 PM, Amir E. Aharoni wrote:


> Machine translation in its current state is so useless for anything
> beyond ordering Opera Garnier tickets, that the copyright status of
> its output is not quite relevant, and i don't expect this to change in
> the next fifty years.
>
Brian wrote:
> On what basis do you make this extremely negative assessment?
>
> Readability is the same thing as the ability to read.
>
>
No, readability is the ability to BE read.

Machine translation is rarely reliable, and often hilarious.

Ec

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
Amir E. Aharoni wrote:
> On Tue, Jun 9, 2009 at 23:42, Brian<Brian.Mingus@colorado.edu> wrote:
>> Google has built in support for using its machine translation technology to
>> help bootstrap human translations of Wikipedia articles.
>>
>> http://translate.google.com/toolkit/docupload
>>
>> The benefit to Google is clear - they need sentence-aligned text in multiple
>> languages in order to bootstrap their automated system.
>>
>> This is a great example of machines helping people help machines help
>> people, etc... I'm sure this is now the most efficient way to produce high
>> quality translations of Wikipedia articles en masse.
>>
>> We should check the ToS to make sure the translated text can be CC-BY-SA
>> licensed.
>
> OK, after a bit of drama in this discussion, i actually tried this toolkit.
>
> Then i tried to translate [[Art critic]] from English into Hebrew.
> There were a few pleasant surprises, but on the whole the machine
> translation was bad to the point of being unusable. It is much easier
> to translate it using vi.

I tried translating [[Astronomy]] and [[Eothyrididae]] (at least, the
part of it that is in English) to Serbian and was pleasantly surprised.
Sure, literally every sentence needed major corrections, but for me it
was still much easier to do that than to translate from scratch.

> I *had* to make very deep changes to paragraph structure - not to
> mention sentence structure -, and not just because the Hebrew
> Wikipedia has a different MOS, but because it's the basis of the

This is apparently a case of English→Hebrew translation working
worse than English→Serbian (possibly because Hebrew is not an
Indo-European language)? I have never had to make any changes to
paragraph structure, only occasional changes to sentence structure
(I'd say about 10% of sentences needed their structure changed, and
another 10% had uncommon structure that I let slide).

> Hebrew language. A text without these changes would be next to
> unreadable. I doubt that a document which is changed so deeply is very

While I would probably delete an article that was dumped straight
from a machine translation, I still find it fully understandable.

To illustrate:

Then i tried to translate [[Art critic]] from English into Hebrew.
There were a few pleasant surprises, but on the whole the machine
translation was bad to the point of being unusable. It is much easier
to translate it using vi.

translates to:

Tada sam pokušao prevesti [[umetnički kritičar]] sa engleskog na hebrejskom.
Bilo je nekoliko ugodnih iznenađenja, nego na ceo mašina
prevod je loš do tačke da je neupotrebljiva. To je mnogo lakše
prevesti preko VI.

I would retranslate this into broken English as:

Then i tried to translate [[Art critic]] from English into Hebrew's.
There were a few pleasant surprises, than on entire machine's
translation was bad to the point of being unusably. Much easier
translated via VI.

and the correct version would be (I have highlighted the changes):

Tada sam pokušao prevesti [[umetnički kritičar]] sa engleskog na
*hebrejski*.
Bilo je nekoliko ugodnih iznenađenja, *ali u celini* *mašinski*
prevod je loš do tačke da je *neupotrebljiv*. *Mnogo je* lakše
prevesti *ga* *pomoću vi-ja*.

Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
Kalan wrote:
> present for cyrillic languages. Statistical approach sometimes
> discovers false connections that result in factual errors. Examples of
> “translating”, say, “50 USD” as “50 000 UAH” within a particular
> context are known; more of such things can arise unexpectedly. So, at

The funniest example I noticed is that "flew" was translated to Serbian
as "MaudDib" :) (this has since been corrected).

And yet I cannot stress enough how useful I find this service,
both for personal use and for easing translation.

Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
Sometimes cities are "translated": "Koper" was translated to English
from Slovene as "Chicago" and "Kranj" as "Miami"... of course, Kranj is
100 km inland and Miami is largely beachfront, and the opposite holds
for Chicago and Koper.

"Ljubljana" was translated to English in earlier phases of the
software as "rape"... In Italian to English, "L'Italia" became
"Canada"; in Tagalog to English, "Pilipinas" became "Japan" - when
they first debuted the Tagalog language capability, I tested it with
the tl.wp article on Manila which informed me that Manila is the
capital of Japan...

Mark

On Wed, Jun 10, 2009 at 7:33 AM, Nikola Smolenski<smolensk@eunet.yu> wrote:
> Kalan wrote:
>> present for cyrillic languages. Statistical approach sometimes
>> discovers false connections that result in factual errors. Examples of
>> “translating”, say, “50 USD” as “50 000 UAH” within a particular
>> context are known; more of such things can arise unexpectedly. So, at
>
> The funniest example I noticed is that "flew" was translated to Serbian
> as "MaudDib" :) (this has been corrected since).
>
> And yet I can not stress enough how much I find this service useful,
> both for personal use and to ease translation.
>

Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
Brian wrote:
> Of course these are now things that you are able to fix and which can be
> shared with everyone.
>

Sure, the funny errors are the most obvious and most easily fixed. The
problematic ones are more subtle, remain unnoticed, and more readily
spread misunderstanding.

Ec
> On Wed, Jun 10, 2009 at 9:32 AM, Mark Williamson <node.ue@gmail.com> wrote:
>
>> Sometimes cities are "translated" - "Koper" was translated to English
>> from Slovene as "Chicago" and "Kranj" as "Miami"... of course Kranj is
>> 100km inland and Miami is largely beachfront and the opposite with
>> Chicago and Koper.
>>


Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
Machine translations are neither new works nor derivative works, since
they are produced by machines and not by humans.

Also, Google will have a hard time claiming that because some
unidentified person added text or a URL to an open service, they now
have the right to do whatever they want with the text.

I guess what they are trying to say in the ToS is that the text will be
used to build the statistical engine and that you give Google the right
to do so. That is, they provide the translation and you provide the
corrections, which are then released to them.

John

Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
Compare such text to a photo of a painting changed by some automatic
algorithm. The copyright of the painting is unchanged and the algorithm
gets no part of any new copyright, yet the person applying the tool
_can_ have a part in the copyright for the new derived work.

If you translate a work through the use of some tool, the tool gets no
part of the copyright. The person may get a part of the copyright for
the derived work, but then he must do something in addition to running
the tool, unless the tool is so extremely difficult to use that merely
running it is sufficient.

John

John at Darkstar skrev:
> Machine translations are not new work, neither derivatives, as it is
> done by machines and not by humans.
>
> Also Google will have a hard time claiming that because some
> unidentified person added text or an url to a open service they now has
> the right to do whatever they want with the text.
>
> I guess what they try to say in the TOS is that the text will be used to
> build the statistical engine and you give Google the right to do so.
> That is, they provide the translation and you provide the corrections
> which is then released to them.
>
> John
>

Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
On Wed, Jun 10, 2009 at 11:57 PM, John at Darkstar <vacuum@jeb.no> wrote:

> Machine translations are not new work, neither derivatives, as it is
> done by machines and not by humans.


This is probably the correct argument to make.
Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
There are two trends in machine translation: rule-based translation
and statistical translation. Both have pros and cons. Rule-based
translation could be integrated with Wiktionary in such a way that it
supports Wikipedia, while statistical translation could be integrated
more directly with Wikipedia itself. Both methods can use the history of
the translated article to identify where the translation engine fails:
for a rule-based engine that usually means some transfer rules are
missing; for a statistical engine it means the engine has failed to
adapt to some type of sentence.
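The idea of mining an article's edit history for engine failures can be sketched roughly: diff each machine-translated sentence against its human-corrected revision and flag the pairs that needed heavy rewriting. This is only a hypothetical illustration of the principle, not how any engine mentioned here actually works; the function name and threshold are my own inventions.

```python
import difflib

def failed_sentences(machine_output, corrected, threshold=0.8):
    """Flag sentence pairs where the human correction diverged heavily
    from the machine translation -- a hint the engine failed there."""
    flagged = []
    for mt, human in zip(machine_output, corrected):
        # similarity ratio in [0, 1]; 1.0 means the human kept it as-is
        ratio = difflib.SequenceMatcher(None, mt, human).ratio()
        if ratio < threshold:
            flagged.append((mt, human, round(ratio, 2)))
    return flagged

# a sentence the human left untouched passes; a heavily rewritten one
# (like the "flew" -> "MaudDib" mistranslation) is flagged
machine = ["Bilo je nekoliko ugodnih iznenadjenja.", "MaudDib je u Pariz."]
human = ["Bilo je nekoliko ugodnih iznenadjenja.", "Odleteo je u Pariz."]
for mt, fixed, ratio in failed_sentences(machine, human):
    print(f"engine failure? {mt!r} -> {fixed!r} (similarity {ratio})")
```

The flagged sentences would then feed back into the engine: missing transfer rules for a rule-based system, retraining data for a statistical one.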

Google previously used Systran's engine, but now uses their own. Sort
of; there are rumors that they use an open source statistical
translation engine.
http://googlesystem.blogspot.com/2007/10/google-translate-switches-to-googles.html

Microsoft also uses a statistical translation engine.
http://blogs.msdn.com/translation/archive/2008/08/22/statistical-machine-translation-guest-blog.aspx

One very promising free rule based translation engine is Apertium
http://wiki.apertium.org/wiki/Main_Page

A very well known free statistical engine is Moses
http://www.statmt.org/moses/



Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
Sorry for my English, it's actually not a machine translation even if it
looks like that! ;p

John

John at Darkstar skrev:
> There are two trends in machine translations; rule based translations
> and statistical translations. Both have pros and cons. Rule based
> translations seems to be possible to integrate with Wiktionary in such a
> way that it can support Wikipedia. Statistical translations seems to be
> possible to integrate more directly with Wikipedia. Both methods can use
> the history of the translated article to identify where the translation
> engine fails; for a rule based translation engine that usually means
> there are some missing transfer rules, for a statistical translation
> engine that means the engine has failed to adapt to some type of sentence.
>
> Google previously used Systrans engine, but now uses their own. Sort of,
> there are some rumors about them using a open source statistical
> translation engine.
> http://googlesystem.blogspot.com/2007/10/google-translate-switches-to-googles.html
>
> Microsoft also uses a statistical translation engine.
> http://blogs.msdn.com/translation/archive/2008/08/22/statistical-machine-translation-guest-blog.aspx
>
> One very promising free rule based translation engine is Apertium
> http://wiki.apertium.org/wiki/Main_Page
>
> A very well known free statistical engine is Moses
> http://www.statmt.org/moses/
>
>
>

Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
On Thu, Jun 11, 2009 at 09:37, John at Darkstar<vacuum@jeb.no> wrote:
> Google previously used Systrans engine, but now uses their own. Sort of,
> there are some rumors about them using a open source statistical
> translation engine.
> http://googlesystem.blogspot.com/2007/10/google-translate-switches-to-googles.html

I couldn't find those rumors at the link you gave. Where did you see them?

That would be interesting. If it is open source, Wikipedia can just
use it and, more importantly, improve it by itself, without Google's
help.

--
אמיר אלישע אהרוני
Amir Elisha Aharoni

http://aharoni.wordpress.com

"We're living in pieces,
I want to live in peace." - T. Moore

Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
The link is about Google Translate; I'm not sure about the rumor.

Probably a rule-based solution is the easiest to get up and running for
small wikis, while a statistical solution will work for larger wikis.
That would make the system work sufficiently well that users would build
upon the initial machine translation, thereby enabling the statistical
engine to learn from its errors. It's like an automatic classifier with
some a priori knowledge.

John

Amir E. Aharoni skrev:
> On Thu, Jun 11, 2009 at 09:37, John at Darkstar<vacuum@jeb.no> wrote:
>> Google previously used Systrans engine, but now uses their own. Sort of,
>> there are some rumors about them using a open source statistical
>> translation engine.
>> http://googlesystem.blogspot.com/2007/10/google-translate-switches-to-googles.html
>
> I couldn't find those rumors at the link you gave. Where did you see them?
>
> That would be interesting. If it is open source, Wikipedia can just
> use it, and more importantly - improve it, by itself, without Google's
> help
>

Re: Google Translate now assists with human translations of Wikipedia articles [ In reply to ]
FYI,


Don't know if this is relevant....


Gordo


>From: Allen Gunn <gunner@aspirationtech.org>
>To: "icommons@lists.ibiblio.org" <icommons@lists.ibiblio.org>
>Subject: [Icommons] Open Translation Tools 2009 - Call for Participants



>
>Howdy iCommons friends,
>
>If you are involved with the open source tools and distributed processes
>behind the translation of open content, we'd love you to consider
>joining us in Amsterdam in late June for Open Translation Tools 2009.
>
>And please help us spread the word to those who might be interested -
>blog it, post it to other lists, tweet it, Facebook it. We thank you for
>your help in bringing together people passionate about the translation
>of open knowledge.
>
>And a shout-out to Ahrash Bissell, who has been wonderfully supportive
>in helping us shape the vision for the event.
>
>Full event blurbage is pasted below, and also available at
>
>http://www.aspirationtech.org/events/opentranslation/2009
>
>We hope to see you in Amsterdam at the end of June!
>
>thanks & peace,
>gunner
>
>-----
>
>Open Translation Tools 2009 - Call for Participants!
>
>http://www.aspirationtech.org/events/opentranslation/2009
>
>Aspiration is delighted to announce Open Translation Tools 2009 (OTT09),
>to be held in Amsterdam, The Netherlands, from 22-24 June, 2009. The
>event will be followed by an Open Translation "Book Sprint" which will
>produce a first-of-its-kind volume on tools and best practices in the
>field of Open Translation. Both events are being co-organized in
>partnership with FLOSSManuals.net and Translate.org.za, and generously
>supported by the Open Society Institute.
>
>Agenda partners for the event include Creative Commons, Global Voices
>Online, WorldWide Lexicon, Meedan, and DotSUB.
>
>OTT09 will build upon the work and collaboration from Open Translation
>Tools 2007 (http://www.aspirationtech.org/events/opentranslation). The
>event will convene stakeholders in the field of open content translation
>to assess the state of software tools that support translation of
>content that is licensed under free or open content licenses such as
>Creative Commons or Free Document License. The event will serve to map
>out what's available, what's missing, who's doing what, and to recommend
>strategic next steps to address those needs, with a particular focus on
>delivering value to open education, open knowledge, and human rights
>blogging communities.
>
>Primary focus will be placed on supporting and enabling distributed
>human translation of content, but the role of machine translation will
>also be considered. "Open content" will encompass a range of resource
>types, from educational materials to books to manuals to documents to
>blog content to video and multimedia.
>
>We invite all prospective participants to answer the Open Translation
>2009 Call for Participants.
>
>The agenda goals of the 2009 event will be several:
>
>* Addressing the Translation Challenges Faced by the Open Education,
>Open Content, and human rights blogging communities, and mapping
>requirements to available open solutions.
>* Building on the vision and exploring new use cases for the Global
>Voices Lingua Translation Exchange
>* Documenting the state of the art in distributed human translation, and
>discussing how to further tap the tremendous translation potential of
>the net
>* Making tools talk better: realizing a standards-driven approach to
>open translation
>* Exploring and sketching out Open Translation API Designs, building on
>existing work and models
>* Documenting workflow requirements for missing open translation tools
>* Match-making between open source tools and open content projects
>* Mapping of available tools to open translation use cases
>
>See the Agenda Overview
>(http://www.aspirationtech.org/events/opentranslation/2009/agenda/overview)
>for elaboration and more details about what is being planned.
>
>Most importantly, the agenda will center on the needs and knowledge of
>the participating projects, structuring sessions and collaborations to
>focus on designing appropriate processes and selecting appropriate tools
>to support open content projects and inform further development of open
>source translation tools.
>
>In addition, OTT09 will continue the knowledge sharing for the open
>translation community, and continue discussion on other identified needs
>from OTT07. The agenda for this event will be greatly informed by open
>education, open content and human rights blogging projects with specific
>translation needs, and a number of sessions will be structured to both
>characterize requirements and propose solutions to respective projects'
>translation requirements.
>
>OTT07 mapped out a hefty list of Open Translation Tools
>(http://www.aspirationtech.org/papers/ott07/tools). Participants at
>OTT09 will survey what has changed over the past 18 months, and assess
>the most pressing remaining gaps.
>
>If OTT09 sounds like your kind of event, we invite you to answer the
>Open Translation 2009 Call for Participants!
>
>http://www.aspirationtech.org/events/opentranslation/2009
>
>--
>Allen Gunn
>Executive Director, Aspiration
>+1.415.216.7252
>www.aspirationtech.org
>
>Aspiration: "Better Tools for a Better World"
>
>_______________________________________________
>Icommons mailing list
>Icommons@lists.ibiblio.org
>http://lists.ibiblio.org/mailman/listinfo/icommons



--
"Think Feynman"/////////
http://pobox.com/~gordo/
gordon.joly@pobox.com///
