Mailing List Archive

A new approach to doc translation ?
Hi,

What I consider the biggest drawback of our current doc translation
process is that you have to keep it updated all the time in order to be
able to follow the updates from the English version.

For a newcomer, or someone who has just a few hours a week or month for
it, I think that it is quite hard.

Not that doc updates happen that often, but once a translation gets out
of sync, getting it back into good shape looks hard to me.
You have to diff the English version to see what has changed, then
find the impact in the translated files, update them, propose the change
via ML or BZ, and wait for someone to pick it up and apply it.

The few contributors I have seen in the past years seemed to get
discouraged quickly and soon stopped updating the docs.
A special mention to Lucien for the GREAT work he does on the French
translation.


I've been looking for a tool that could handle XML --> po file
updates. The material to translate would then be only small pieces of
text that could be handled by poedit or equivalent software.

The main advantages I see are:
- changes are easy to spot
- identical sentences in different files (or even branches) are
translated only once
- the work of different contributors is easy to merge
- some translation web sites have a translation process that eases
access for contributors, with the possibility for the translation
community to validate each other's translations (some years ago, I
used https://translatewiki.net for that)
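
To illustrate the "translated only once" point, here is a minimal Python
sketch of how a message catalog collapses repeated source strings. The file
locations and sentences are made up for the example, and a real po tool
applies its own segmentation and normalisation rules; this only shows the
idea.

```python
from collections import defaultdict


def build_catalog(segments):
    """Map each distinct source string to every place where it occurs.

    `segments` is an iterable of (location, text) pairs. The "file:line"
    location format is an assumption made for this sketch.
    """
    catalog = defaultdict(list)
    for location, text in segments:
        # Normalise whitespace so that re-wrapped paragraphs still match.
        key = " ".join(text.split())
        catalog[key].append(location)
    return dict(catalog)


# Hypothetical sample data, not taken from the real documentation.
segments = [
    ("mod/core.xml:120", "See the security tips\n  document."),
    ("mod/mod_cgi.xml:45", "See the security tips document."),
    ("mod/mod_cgi.xml:60", "Enables CGI execution."),
]
catalog = build_catalog(segments)
print(len(segments), "segments,", len(catalog), "unique strings to translate")
```

The translator sees each key of `catalog` once, however many files it
appears in.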

The drawbacks are those of po files:
- the context is missing when translating
- this requires some additional scripting to generate and update the
po files, and to convert them back to XML for our XSL-based toolchain



Using something like po files for the translation would also lead to
only partly localized files. Little by little, the parts of a
translation that are not kept up would be replaced by the more
up-to-date English text. I don't think it is an issue. I prefer a
mixed-language document to something that I cannot trust because I
don't know what is up-to-date and what is not.

itstool [1] is the most promising tool I have found so far.
Its main advantage is that it can easily be configured to tell
what must not be translated. It also has a kind of placeholder
mechanism. This fits perfectly well with our current XML-based master
documents.
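
As a rough sketch of what the round trip could look like, the Python below
only builds the command lines for the four steps (extract, update, compile,
merge back) and does not run them. The itstool options used (-i, -o, -m, -l)
and the gettext tools msgmerge and msgfmt exist, but every file name here
(rules.its, manual.pot, fr.po, fr.mo, out/fr) is a placeholder, and this is
not necessarily how the PoC is wired.

```python
import shlex


def extract_cmd(xml_files, pot_file, its_rules=None):
    """XML -> .pot template: itstool [-i RULES] -o POT FILES..."""
    cmd = ["itstool"]
    if its_rules:
        cmd += ["-i", its_rules]
    return cmd + ["-o", pot_file, *xml_files]


def update_po_cmd(po_file, pot_file):
    """Bring an existing translation in line with a fresh template."""
    return ["msgmerge", "--update", po_file, pot_file]


def compile_cmd(po_file, mo_file):
    """itstool merges from a compiled .mo file, so compile the .po first."""
    return ["msgfmt", "-o", mo_file, po_file]


def merge_cmd(xml_files, mo_file, out_dir, lang, its_rules=None):
    """.mo + English XML -> translated XML written into out_dir."""
    cmd = ["itstool"]
    if its_rules:
        cmd += ["-i", its_rules]
    return cmd + ["-m", mo_file, "-l", lang, "-o", out_dir, *xml_files]


# Placeholder file names, for illustration only.
files = ["mod/core.xml"]
steps = [
    extract_cmd(files, "manual.pot", its_rules="rules.its"),
    update_po_cmd("fr.po", "manual.pot"),
    compile_cmd("fr.po", "fr.mo"),
    merge_cmd(files, "fr.mo", "out/fr", "fr", its_rules="rules.its"),
]
for step in steps:
    print(shlex.join(step))
```

Strings left untranslated in the po file simply come out in English in the
merged XML, which is the "partly localized" behaviour described above.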

I'm close to having a working PoC, but I wanted your feedback on
this approach to doc translation.

Attached is an example of all the mod/*/xml files processed and the
rules file I've written so far.



Do you think that such an approach is viable ?

CJ


[1]: http://itstool.org/index.html
Re: A new approach to doc translation ?
On 6/16/2020 9:52 PM, Christophe JAILLET wrote:
> Do you think that such an approach is viable ?


Hi,

I'm just a lurker who once did some Norwegian translation, but I am from
time to time involved in translations in other projects.

The process you describe is consistent with what we do in other
projects, and is in my opinion the preferred method. The drawback of
missing context can to a large degree be ameliorated by build automation.

What I do in some projects I am responsible for is set a limit:
at least X % of the project must be translated in order for it to be
published. In my personal opinion, at about 95% a translation becomes
useful; anything less leaves the whole thing a mess. It's better to
concede defeat and either publish outdated docs, clearly marked, or
redirect to an actually completed translation in another language, e.g.
English as a default.
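
A publication gate like this can be scripted. The following is only a
minimal Python sketch, assuming a simplified .po layout (entries separated
by blank lines, fuzzy entries counted as untranslated); a real build would
more likely rely on an existing gettext library or on `msgfmt --statistics`.
The function names and the sample catalog are made up for the example.

```python
import re


def po_completion(po_text):
    """Return the fraction of entries with a non-empty, non-fuzzy msgstr."""
    total = translated = 0
    # Entries in a .po file are separated by blank lines.
    for block in re.split(r"\n[ \t]*\n", po_text.strip()):
        fuzzy = False
        fields = {"msgid": "", "msgstr": ""}
        current = None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,") and "fuzzy" in line:
                fuzzy = True
            elif not line or line.startswith("#"):
                continue  # other comments, including obsolete "#~" entries
            elif line.startswith('"'):
                # Continuation line of the keyword seen last.
                if current:
                    fields[current] += line[1:-1]
            else:
                keyword, _, rest = line.partition(" ")
                if keyword == "msgid":
                    current = "msgid"
                elif keyword.startswith("msgstr"):
                    current = "msgstr"  # also covers msgstr[0], msgstr[1]
                else:
                    current = None  # msgctxt, msgid_plural: ignored here
                if current:
                    fields[current] += rest.strip()[1:-1]
        if not fields["msgid"]:
            continue  # header entry or comment-only block
        total += 1
        if fields["msgstr"] and not fuzzy:
            translated += 1
    return translated / total if total else 0.0


def is_publishable(po_text, threshold=0.95):
    """Apply the 'at least X %' rule; 0.95 mirrors the figure above."""
    return po_completion(po_text) >= threshold


SAMPLE = '''\
msgid ""
msgstr ""
"Language: fr"

msgid "Hello"
msgstr "Bonjour"

msgid "Goodbye"
msgstr ""

#, fuzzy
msgid "Thanks"
msgstr "Merci"
'''
print(round(po_completion(SAMPLE), 2), is_publishable(SAMPLE))
```

With one of three entries usable, the sample falls well short of the
threshold, so the build would fall back to English or to clearly marked
outdated docs.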

I'm a big believer in using Weblate as it enables the whole translation
to be somewhat democratized. Anyone can suggest a new translation if
enabled, and someone authorized can choose to accept or reject it. This
is separated from the actual repository access.

So in short, I think this is the way forward.


---------------------------------------------------------------------
To unsubscribe, e-mail: docs-unsubscribe@httpd.apache.org
For additional commands, e-mail: docs-help@httpd.apache.org
Re: A new approach to doc translation ?
On 17/06/2020 at 13:45, Tom Fredrik Blenning wrote:
> So in short, I think this is the way forward.
Hello everybody,

About newcomers, it seems that the main problem is to find reviewers.
(Aleksey, are you still here ?)

About translation updates:

I have downloaded the two svn repos, say in /2.4-repos and /trunk-repos.

All English XML files are saved in a backup directory on my computer.

Every time I want to update my XML files, I do "svn update" in
/2.4-repos and /trunk-repos, then I filter the output to only see XML files.

Then I only have to do a diff between the original XML file in the backup
directory and the corresponding one that was modified in the svn repos.

I think it's not so hard to do.
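
For anyone who wants to script these two steps, here is a small Python
sketch. It assumes the usual "svn update" output, where each touched path is
preceded by a one-letter status, and uses the standard difflib module for
the comparison. The sample output, revision number and file contents are
invented for the example.

```python
import difflib


def changed_xml_files(svn_output):
    """Return the .xml paths listed as touched in `svn update` output."""
    paths = []
    for line in svn_output.splitlines():
        parts = line.split(None, 1)
        # Status lines look like "U    path"; this skips "Updating '.':"
        # and "Updated to revision N." as well as non-XML files.
        if (len(parts) == 2 and parts[0] in {"A", "U", "G", "D", "C"}
                and parts[1].endswith(".xml")):
            paths.append(parts[1])
    return paths


def english_diff(old_text, new_text, name):
    """Unified diff between the backed-up and the updated English file."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="backup/" + name,
        tofile="repos/" + name,
    ))


# Invented sample data.
svn_output = """Updating '.':
U    docs/manual/mod/core.xml
U    docs/manual/mod/core.html.en
A    docs/manual/howto/new.xml
Updated to revision 1234567.
"""
old = "<p>Old text.</p>\n<p>Same.</p>\n"
new = "<p>New text.</p>\n<p>Same.</p>\n"

changed = changed_xml_files(svn_output)
diff = english_diff(old, new, "core.xml")
print(changed)
print(diff)
```

The diff shows what changed in English; locating the matching passage in
the translated file remains a manual step.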

Lucien



Re: A new approach to doc translation ?
On 18/06/2020 17:37, Lucien Gentis wrote:
> I think it's not so hard to do.

I disagree.

For you and me that might not be a hurdle, but I dare you to introduce
that process to the next 10 people you meet outside of a developer
environment. Chances are they will not understand what you are talking
about.

We have pensioners who are, with all due respect, computer illiterate
doing translation for us. Apache is a very specialized project, so I
don't think there will be an avalanche of pensioners volunteering to do
translations for it, but I do think there are a lot of more casual users
who would be able to do this if it were more accessible. Even if they are
capable of understanding svn and diff, there's a hurdle to participation.

Participation is always the key, but participation often requires ease
of access.

-Tom Fredrik

Re: A new approach to doc translation ?
Hi Christophe,

I've been researching reST/Sphinx lately (email coming Sometime Soon --
hopefully later today) and wanted to chime in with some observations,
based on the translation workflow for that.

On 6/16/20 12:52 PM, Christophe JAILLET wrote:
> I've been looking for a tool that could handle XML --> po file
> updates. The material to translate would then be only small pieces of
> text that could be handled by poedit or equivalent software.

This is also Sphinx's approach to translation, based on gettext.

> The main advantages I see are:
>    - changes are easy to spot
>    - identical sentences in different files (or even branches) are
> translated only once

FWIW, Sphinx splits translatable chunks by paragraph when constructing
its .pot templates, and it puts them into separate files based on the
source location. So though you might have to duplicate some work, you
also get a little more context.

(Though from looking at your .pot template, it looks like your tool also
set up some, but not all, of the translations this way.)

>    - the work of different contributors is easy to merge
>    - some translation web sites have a translation process that eases
> access for contributors, with the possibility for the translation
> community to validate each other's translations (some years ago, I
> used https://translatewiki.net for that)

As another example, the sphinx-intl tool integrates directly with the
Transifex service, which appears to be used by the Sphinx project
itself. It looks like it may have a "free" tier for OSS projects. I know
nothing more about it.

> Using something like po files for the translation would also lead to
> only partly localized files. Little by little, the parts of a
> translation that are not kept up would be replaced by the more
> up-to-date English text. I don't think it is an issue. I prefer a
> mixed-language document to something that I cannot trust because I
> don't know what is up-to-date and what is not.

This was also one of the big questions I had about the Sphinx approach.
My own opinion isn't particularly useful here, since I consume the docs
in English.

> Do you think that such an approach is viable ?

Given that other large projects seem to use a similar approach, it seems
like it should be viable from a technical perspective. I can't speak to
the usability of the .po translation tools themselves.

--Jacob

Re: A new approach to doc translation ?
On 20/06/2020 at 11:11, Tom Fredrik Blenning wrote:
> Participation is always the key, but participation often requires ease
> of access.
I only wanted to show the Apache doc team that handling doc translation
is not so difficult.

Of course, if a newcomer asked me for some tips, I would give
him/her more detailed explanations (with
http://httpd.apache.org/docs-project/translations.html as support).

All in all, there are only a few svn commands to know, plus the diff
command and how to save, open, and edit files in the file system.

Finally, if another translation environment is to be installed, that is
not a problem for me; I'll adapt.

