Mailing List Archive

African Languages Wikipedia Bashing on Slashdot
If someone can identify one or two native speakers for each African
Language who are willing to spend a couple of weeks
putting together lexicons for Wikitrans for each target language, along
with language synthesis rules, I can start performing
runs for the target languages. I will need the following:

Roget's Thesaurus (open version on my website) lexicon
transposition of English words and phrases for the Cherokee lexicons
into these languages.

I already have the ability to synthesize new words for advanced
Latin-derived scientific language which will speed the creation
of a full African Language Wikipedia with the various languages. After
the translation runs, then there should be a large enough
body of materials to start getting a community around it. Starting at
ground zero with groups of people who need food more than
computers will certainly relegate the project to failure from the very
start.

Most of these African languages are going to have similar challenges to
Native languages in America in that they will not
have evolved modern words for modern concepts.

Also, while I notice folks talk a lot about solutions, which is a good
thing, we will need someone here to take some action and get
these speakers to contact me and get these lexicons put together. I
will also need to construct a rules database with the AI engine
and about 20 or so articles from the runs taken and retensed and
corrected to teach the AI engine how to reorder text and phrasing
for these languages into a readable form.

If the Foundation wants a good test run of WikiTrans on these languages,
this would be an excellent project to get Wikipedia
converted to these languages rather than waiting years (or maybe never)
to get it done. I just read the Slashdot article and they
are bashing the heck out of us over this program announcement.

We have a Rosetta Stone to use, let's use it. I need the folks doing
the African languages program to shoot me an email, and I will
give them instructions and we can get them at least a good starting
point of content that will require 8,000,000+ articles on Wikibooks
and Wikipedia to be proofread, but it's better than starting from 0.

I am not open sourcing the translator at this time, but I will assist in
creation of lexicons, rules, syntax, and grammar databases and parsers
and perform and post translation runs into any of these languages to
provide a starting point for Wikipedia. I am about 5 years ahead of the
game with solid written tools moulded around MediaWiki that already do all
of this. Let's apply them to this program and just get the thing done and
shut up these naysaying folks claiming we cannot pull it off -- we can.

Jeff


_______________________________________________
foundation-l mailing list
foundation-l@wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/foundation-l
African Languages Wikipedia Bashing on Slashdot [ In reply to ]
Regarding Jeffrey Merkey's earlier post, with all respect, the issue of
machine translation is not one that can be addressed in a few weeks with
a couple of native speakers. This isn't the forum to discuss the
nitty-gritty of machine translation issues, however, other than to say
that the quality of Wikipedia entries is much more important than the
quantity, and the only real path to quality Wikipedia entries in African
languages is through real human labor.

The Slashdot discussion is interesting, mostly in what it reveals about
the state of knowledge (or lack thereof) in the tech world about most
things African. Many /.ers write with the attitude that, because
African languages don't matter to them, they don't matter. These are my
comments following on the Slashdot riffs on the article.

The recurring theme of the /. conversation is, why should people waste
their time creating African language Wikipedias if the languages have
low literacy and few computer users? However, the original NYT article
was written about a discussion that has moved well beyond that level.
The questions that the people working on African language Wikipedias
(who have a new discussion list,
http://groups.yahoo.com/group/afrophonewikis ) are asking are more like
these:

* Can some of Africa's entrenched economic difficulties relate to
the fact that many of her people do not have access to literacy in
the languages they speak and use on a daily basis?
* How much of the lack of literacy in many languages is related to
the lack of a systematic effort to produce written materials in
those languages?
* If a critical mass of written materials were produced for a given
language, would it create the necessary foundation for widespread
literacy in that language among speakers of that language?
* If speakers of a given language were to develop literacy in that
language, rather than having to learn an entirely different
language (such as English or Arabic) in order to engage in written
communications (send emails, write blogs, read newspapers, get
commodity market and weather reports relevant to the crops they
grow, apply for jobs, evaluate the truth claims of politicians,
etc), might that literacy be a key to overcoming the continent's
persistent economic difficulties?
* Given the certified failure of print publishers and government
agencies (colonial and post-colonial) to produce literacy
materials in most African languages during the past 150 years, and
the rapid success of the Wikipedia model in producing vast amounts
of knowledge material quickly, might the resources of the
Wikipedia world be a way to address the issues of creating
literacy materials for those languages?
* If One Laptop Per Child is indeed a foreseeable reality, and if
Wikipedia is going to come prebundled, and if having literacy
materials in the language a child speaks is a key to the ultimate
success and usefulness of OLPC, isn't creating a good Wikipedia in
that child's language an issue of somewhat immediate concern?
* If any or all of the above, but also given the slow pace of
African language Wikipedias to date, what have the barriers been
thus far, and how can those barriers be overcome in a timely and
systematic way?

Re: African Languages Wikipedia Bashing on Slashdot [ In reply to ]
Martin Benjamin wrote:

>Regarding Jeffrey Merkey's earlier post, with all respect, the issue of
>machine translation is not one that can be addressed in a few weeks with
>a couple of native speakers. This isn't the forum to discuss the
>nitty-gritty of machine translation issues, however, other than to say
>that the quality of Wikipedia entries is much more important than the
>quantity, and the only real path to quality Wikipedia entries in African
>languages is through real human labor.
>
>
I completely agree that human beings need to review most of it. I
disagree that you will get there
anytime soon without machine translation. There are a lot of "save the children"
programs for Africa and most
of them are thin-veneer scams to dupe the ignorant American public into
giving funds, of which
only a small percentage ends up in the hands of the needy for the actual
program. Any program
done here needs to steer clear of being viewed or labeled as such.

I am personally involved in dealing with exactly these issues here in
our own American backyard
so I apologize in advance for not blindly accepting everything you say
here at face value. I deal with
these issues every day with our own people most of whom speak our native
languages at an 8th grade level
since they were force-fed English in public schools most of their lives
and our native languages were relegated
to being used in ceremonies and at home (and even that very limited).

People use machine assisted translation every day on Wikipedia -- they
use COMPUTERS to edit articles and
daemons run all the time for interlanguage links, link disambiguation,
and artificial linking. The whole concept
of Wikipedia is a machine assisted collaboration project. If only human
efforts were used, we would be using
pencil and paper and the mail system to collaborate, so machine
translation or machine assisted translation
is the next logical step.

I can see where folks who are interested in harnessing a large group of
people would be averse to using it,
but the fact is, you can do both and leverage a machine assisted
translation to form a large base of materials
which would require a lot less work.

Jeff



Re: African Languages Wikipedia Bashing on Slashdot [ In reply to ]
Martin Benjamin wrote:

And here are my responses to your specific questions:

> * Can some of Africa's entrenched economic difficulties relate to
> the fact that many of her people do not have access to literacy in
> the languages they speak and use on a daily basis?
>
>
No, they are related to civil wars, food shortages, health and public
sanitation issues, and continent-wide
governmental corruption and resource exploitation, combined with
backwards racial
conflicts which persist into modern times, including religious conflicts
due to the influences
and battles between Islam and Christianity.

> * How much of the lack of literacy in many languages is related to
> the lack of a systematic effort to produce written materials in
> those languages?
>
>
If a people have no written language and rely on oral tradition, it's
like trying
to ice skate uphill. You will also find (as I have here with several
tribes) many groups have
religious taboos on writing down their languages.

> * If a critical mass of written materials were produced for a given
> language, would it create the necessary foundation for widespread
> literacy in that language among speakers of that language?
>
>
No. You need education programs in each area and immersion schools set up to
provide children from early ages access to materials. Adults won't learn
it, they are too busy
involved in the struggle to just survive day-to-day realities.

> * If speakers of a given language were to develop literacy in that
> language, rather than having to learn an entirely different
> language (such as English or Arabic) in order to engage in written
> communications (send emails, write blogs, read newspapers, get
> commodity market and weather reports relevant to the crops they
> grow, apply for jobs, evaluate the truth claims of politicians,
> etc), might that literacy be a key to overcoming the continent's
> persistent economic difficulties?
>
>

Most of these folks will have already been exposed to English. You would
need to identify English
bilingual speakers with one foot in each world to even be able to
communicate WHY this is important.

> * Given the certified failure of print publishers and government
> agencies (colonial and post-colonial) to produce literacy
> materials in most African languages during the past 150 years, and
> the rapid success of the Wikipedia model in producing vast amounts
> of knowledge material quickly, might the resources of the
> Wikipedia world be a way to address the issues of creating
> literacy materials for those languages?
>
>
Machine assisted translations. Wikipedia has succeeded in large part
due to the young who have little or
no need to go to work every day using their time to contribute and build
it. Wikipedia is a phenomenon of
the industrialized western yuppie culture (and very young folks) most of
whom have not yet flown the
nest.

> * If One Laptop Per Child is indeed a foreseeable reality, and if
> Wikipedia is going to come prebundled, and if having literacy
> materials in the language a child speaks is a key to the ultimate
> success and usefulness of OLPC, isn't creating a good Wikipedia in
> that child's language an issue of somewhat immediate concern?
>
>
One laptop per child won't address areas where people worry more about
getting
food to eat or dying of AIDS or some other disease than learning to read
and write.

> * If any or all of the above, but also given the slow pace of
> African language Wikipedias to date, what have the barriers been
> thus far, and how can those barriers be overcome in a timely and
> systematic way?
>
>
Literacy, access to MACHINE ASSISTED EDITING TECHNOLOGY.

Jeff

Re: African Languages Wikipedia Bashing on Slashdot [ In reply to ]
Hi Martin,

I did some research this afternoon and located your Swahili dictionary
and a large number of on-line references and lexicons written by others. I
also noticed only 1000+ Swahili articles on the sw.wikipedia.org
website. I will spend some time further this week with my spiders
pulling down and constructing Swahili lexicons and grammar references
and hopefully in a week or so, I can make some runs against
sw.wikipedia.org as a test. I'll even import the whole of Wikipedia for
you to review at my site after I make several machine translation runs
into this language. Won't be perfect, but it's better than 1000+ articles
and could be a good basis for a test of this system. I note a lot of
similarities between the projects.

I think after looking over it you may have a different view and will be
able to grasp the possibilities. A lot easier to correct calculus
textbooks for
grammar here and there than to try to convince people 5000 miles away to
try to find a computer somewhere to log in. At any rate, would be a much
faster way to entice editors to a site somewhere.

I have some strange ability to assimilate and understand languages very
rapidly, being raised trilingual may have had something to do
with it. I find Swahili fluid, musical, and beautiful and very much like
our language (but not quite as synthetic or structured).

Jeff