Mailing List Archive

[jira] [Commented] (FOR-1231) Forrest does not deal properly with UTF-8 .xml content, even with the proper XML content-type header, and generates corrupted HTML
[ https://issues.apache.org/jira/browse/FOR-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188130#comment-13188130 ]

Hitoshi Ozawa commented on FOR-1231:
------------------------------------

While you're at it, I would appreciate it if Japanese fonts could be installed as well, so that PDFs containing Japanese are displayed correctly.

> Forrest does not deal properly with UTF-8 .xml content, even with the proper XML content-type header, and generates corrupted HTML
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: FOR-1231
> URL: https://issues.apache.org/jira/browse/FOR-1231
> Project: Forrest
> Issue Type: Bug
> Components: Internationalisation (i18n)
> Affects Versions: 0.9, 0.10-dev
> Reporter: Karl Wright
> Priority: Critical
>
> We're using Forrest to generate the Apache ManifoldCF site. We've added Japanese content. The content worked fine via localhost:8888, but the generated html content does not load properly in a browser, even though the browser correctly divines that the HTML page has utf-8 encoding. It looks like many utf-8 characters in the source XML are handled correctly but some are corrupted. I've also tried the fix in FORREST-668 but this does not help. See http://incubator.apache.org/connectors and click on the tab in Japanese to see what I mean. The current source for the site can be found in: https://svn.apache.org/repos/asf/incubator/lcf/trunk/site.
> I checked out latest Forrest trunk and built and used that but there has been no improvement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (FOR-1231) Forrest does not deal properly with UTF-8 .xml content, even with the proper XML content-type header, and generates corrupted HTML
[ https://issues.apache.org/jira/browse/FOR-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188150#comment-13188150 ]

David Crossley commented on FOR-1231:
-------------------------------------

Please ask about separate usage issues on the user mailing list.

The PDF fonts are configurable. See that plugin's docs:
http://forrest.apache.org/docs/plugins/org.apache.forrest.plugin.output.pdf/

[jira] [Commented] (FOR-1231) Forrest does not deal properly with UTF-8 .xml content, even with the proper XML content-type header, and generates corrupted HTML
[ https://issues.apache.org/jira/browse/FOR-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188216#comment-13188216 ]

Karl Wright commented on FOR-1231:
----------------------------------

I'm told that the Japanese portion of the site is correctly generated on a system that has a default locale of ja_JP. Obviously, though, this is not a good solution to the problem since we cannot select different locales when there is more than one language involved.


[jira] [Commented] (FOR-1231) Forrest does not deal properly with UTF-8 .xml content, even with the proper XML content-type header, and generates corrupted HTML
[ https://issues.apache.org/jira/browse/FOR-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188245#comment-13188245 ]

Hitoshi Ozawa commented on FOR-1231:
------------------------------------

Sorry David, I thought the HTML pages were being dynamically generated on the Apache server.
It seems they are not. "forrest site" works fine on my Japanese OS.

Karl, is your system set up to use en_US.UTF-8?
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
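
To answer that question on a POSIX system, a quick check along the following lines may help. This is only a sketch: the function names effective_ctype and locale_is_utf8 are made up for illustration, and the check looks only at the environment variables (in the usual POSIX precedence LC_ALL, then LC_CTYPE, then LANG), not at whether the named locale is actually installed.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Forrest.

# Print the locale name that governs character handling.
# POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    else
        printf '%s\n' "${LANG:-}"
    fi
}

# Succeed only if that locale name carries a UTF-8 codeset suffix.
locale_is_utf8() {
    case $(effective_ctype) in
        *.UTF-8*|*.utf-8*|*.UTF8*|*.utf8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   locale_is_utf8 && echo "UTF-8 locale" || echo "not a UTF-8 locale"
```

Note that LANGUAGE only selects message translations, so it is deliberately left out of the check.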

[jira] [Commented] (FOR-1231) Forrest does not deal properly with UTF-8 .xml content, even with the proper XML content-type header, and generates corrupted HTML
[ https://issues.apache.org/jira/browse/FOR-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188259#comment-13188259 ]

Karl Wright commented on FOR-1231:
----------------------------------

> Karl, is your system set up to use en_US.UTF-8?
> export LC_ALL=en_US.UTF-8
> export LANG=en_US.UTF-8
> export LANGUAGE=en_US.UTF-8

I set the equivalent Windows variables, but the generated output did not change for me. So it must be something else.


[jira] [Commented] (FOR-1231) Forrest does not deal properly with UTF-8 .xml content, even with the proper XML content-type header, and generates corrupted HTML
[ https://issues.apache.org/jira/browse/FOR-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188380#comment-13188380 ]

Karl Wright commented on FOR-1231:
----------------------------------

I figured it out. What we need to do is set the Java default encoding to UTF-8. The easy way to do this on Windows is:

set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8

... or on Linux:

export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8

Doing this before a Forrest invocation causes all JVMs it brings up to use the right encoding. (It's Cocoon that seems to be broken, by the way.)
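
If exporting the variable for the whole session is undesirable, the same setting can be limited to a single command on a POSIX shell. The sketch below is an illustration only: run_with_utf8 is a made-up name, and it assumes the forrest launcher is on the PATH. It also replaces any JAVA_TOOL_OPTIONS value that is already set, for that one command.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of Forrest: run one command with the
# JVM default encoding forced to UTF-8, without touching the calling shell.
run_with_utf8() {
    # env sets the variable only in the environment of the child process.
    env JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8 "$@"
}

# Usage (assumes forrest is on PATH):
#   run_with_utf8 forrest site
```

A JVM that honours the variable normally reports "Picked up JAVA_TOOL_OPTIONS: ..." on standard error at startup, which is a simple way to confirm that the setting reached the Forrest process.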

[jira] [Commented] (FOR-1231) Forrest does not deal properly with UTF-8 .xml content, even with the proper XML content-type header, and generates corrupted HTML
[ https://issues.apache.org/jira/browse/FOR-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190838#comment-13190838 ]

David Crossley commented on FOR-1231:
-------------------------------------

Thanks. I was thinking of a similar patch. However, I wondered whether it would need to append this setting to any existing JAVA_TOOL_OPTIONS and then reset it at the finish.

I have applied your patch as-is. Thanks.
If someone thinks that it needs more, then please do.

Regarding the Cocoon situation, I think that the doc comments refer to the fact that Cocoon/Forrest have many supporting products handling various parts of the system. Perhaps some of those treat the encoding differently. So this environment setting seems like a good solution.
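
The append-then-reset idea mentioned above could look roughly like the following on a POSIX shell. This is a sketch under assumptions, not the content of the applied patch: run_with_appended_encoding and the underscore-prefixed variables are made-up names.

```shell
#!/bin/sh
# Hypothetical helper, not taken from FOR-1231.patch: append the encoding
# flag to any existing JAVA_TOOL_OPTIONS, run a command, then restore
# the previous state.
run_with_appended_encoding() {
    _had_opts=${JAVA_TOOL_OPTIONS+yes}   # "yes" only if the variable was set
    _old_opts=${JAVA_TOOL_OPTIONS-}      # previous value, empty if unset

    # Keep existing options and add the flag after a separating space.
    JAVA_TOOL_OPTIONS="${_old_opts:+$_old_opts }-Dfile.encoding=UTF8"
    export JAVA_TOOL_OPTIONS

    _status=0
    "$@" || _status=$?

    # Reset at the finish: put the old value back, or unset the variable
    # if it did not exist before. (A restored variable stays exported.)
    if [ "$_had_opts" = yes ]; then
        JAVA_TOOL_OPTIONS=$_old_opts
    else
        unset JAVA_TOOL_OPTIONS
    fi
    return "$_status"
}

# Usage (assumes forrest is on PATH):
#   run_with_appended_encoding forrest site
```

Appending rather than overwriting matters when a user already relies on JAVA_TOOL_OPTIONS for other flags, such as memory settings.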

> Attachments: FOR-1231.patch