Mailing List Archive

Unexpected diffs when rebuilding
Hi, all;
   I am preparing to T&R 2.4.44 and am concerned with some of the diff
output I see after rebuilding docs. FYI: I've migrated from OpenJDK 8 to
OpenJDK 11 since my previous rebuild of the docs (which means I had to
drop the Xbootclasspath argument)

The output I am seeing does not render properly in my terminal (guess it
doesn't support ISO-8859-1), but it seems like the original file is
'correct'. However, when I rebuild the docs, these characters are HTML
encoded rather than ISO-8859-1. Is this expected? I've double checked
the README and nothing stands out. Perhaps something related to JDK11
move and ditching Xbootclasspath?


Example:

Index: manual/vhosts/name-based.html.en
===================================================================
--- manual/vhosts/name-based.html.en    (revision 1880272)
+++ manual/vhosts/name-based.html.en    (working copy)
@@ -25,10 +25,10 @@
 <div class="toplang">
 <p><span>Available Languages: </span><a
href="../de/vhosts/name-based.html" hreflang="de" rel="alternate"
title="Deutsch">&nbsp;de&nbsp;</a> |
 <a href="../en/vhosts/name-based.html" title="English">&nbsp;en&nbsp;</a> |
-<a href="../fr/vhosts/name-based.html" hreflang="fr" rel="alternate"
title="Fran?ais">&nbsp;fr&nbsp;</a> |
+<a href="../fr/vhosts/name-based.html" hreflang="fr" rel="alternate"
title="Fran&#231;ais">&nbsp;fr&nbsp;</a> |

<snip>


Apologies if this was discussed already - I only stumbled upon it as I
tried to T&R just now.

--
Daniel Ruggeri
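
For readers following along, the two forms in the diff above can be compared directly. This is a minimal sketch, assuming the committed file really holds the ISO-8859-1 byte for the cedilla (the message above only says the original "seems" correct). The variable names are illustrative and do not come from the docs build.

```python
import html

# Committed form, assumed here to be ISO-8859-1: one byte, 0xE7, for the cedilla.
latin1_bytes = "Fran\u00e7ais".encode("iso-8859-1")

# Rebuilt form from the diff: a numeric character reference, pure ASCII.
entity_form = "Fran&#231;ais"

# Both forms yield the same text once decoded.
from_latin1 = latin1_bytes.decode("iso-8859-1")
from_entity = html.unescape(entity_form)
same_text = from_latin1 == from_entity

# Read as UTF-8, the lone 0xE7 byte is invalid and becomes U+FFFD,
# one plausible reason a UTF-8 terminal shows a placeholder character.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes, same_text, as_utf8 == "Fran\ufffdais")
```

If that assumption holds, the change is one of representation only: 231 is the decimal code point of U+00E7, so a browser renders both lines of the diff identically.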
Re: Unexpected diffs when rebuilding
On Fri, Jul 24, 2020 at 1:25 PM Daniel Ruggeri <daniel@bitnebula.com> wrote:
>
> Hi, all;
> I am preparing to T&R 2.4.44 and am concerned with some of the diff output I see after rebuilding docs. FYI: I've migrated from OpenJDK 8 to OpenJDK 11 since my previous rebuild of the docs (which means I had to drop the Xbootclasspath argument)
>
> The output I am seeing does not render properly in my terminal (guess it doesn't support ISO-8859-1), but it seems like the original file is 'correct'. However, when I rebuild the docs, these characters are HTML encoded rather than ISO-8859-1. Is this expected? I've double checked the README and nothing stands out. Perhaps something related to JDK11 move and ditching Xbootclasspath?

I think this is the issue in the long-running thread on this list.
I personally think this change in anchor title in *.en files is
harmless.
I am curious if you also get manpage entries changed after the build.
Those would need scrutiny I guess, although in that case the English
is probably not an issue but only other languages where we might find
the wrong codepage or HTML entities?

I also think we should drop xbootclasspath and bail out if java < 11
so people are less likely to waffle between the two flavors.
...And spot-check the output if someone can articulate a problem with
some encoding.







--
Eric Covener
covener@gmail.com

Re: Unexpected diffs when rebuilding
On 7/24/2020 5:27 PM, Eric Covener wrote:
> On Fri, Jul 24, 2020 at 1:25 PM Daniel Ruggeri <daniel@bitnebula.com> wrote:
>> Hi, all;
>> I am preparing to T&R 2.4.44 and am concerned with some of the diff output I see after rebuilding docs. FYI: I've migrated from OpenJDK 8 to OpenJDK 11 since my previous rebuild of the docs (which means I had to drop the Xbootclasspath argument)
>>
>> The output I am seeing does not render properly in my terminal (guess it doesn't support ISO-8859-1), but it seems like the original file is 'correct'. However, when I rebuild the docs, these characters are HTML encoded rather than ISO-8859-1. Is this expected? I've double checked the README and nothing stands out. Perhaps something related to JDK11 move and ditching Xbootclasspath?
> I think this is the issue in the long-running thread on this list.
> I personally think this change in anchor title in *.en files is
> harmless.

Many thanks, Eric

Yeah... I figured it was all related. For clarity, there are hundreds
(maybe thousands) of changes, both in the anchors in .en files and in
the contents of various other language translations. All appear to
be related to the "special" characters. I've attached the full output of
svn diff after a rebuild of the docs in the 2.4.x branch.


> I am curious if you also get manpage entries changed after the build.
> Those would need scrutiny I guess, although in that case the English
> is probably not an issue but only other languages where we might find
> the wrong codepage or HTML entities?

This adds a bit of confusion actually because I took a look at README :-)
There haven't been changes under docs/man, though the files were all
clearly rebuilt. But yes, as you expected, all of the characters that
fall outside the ASCII range (but are correct in ISO-8859-1) have
been HTML encoded in docs/manual.


> I also think we should drop xbootclasspath and bail out if java < 11
> so people are less likely to waffle between the two flavors.
> ...And spot-check the output if someone can articulate a problem with
> some encoding.

Aye - I was toying with the idea myself of just patching the script to
inspect the version of java first and include/omit the Xbootclasspath
based on that output. Given that I am just not terribly familiar with
the details here, I was going to bring this up for conversation after
T&R was done. But... this huge number of changes threw a wrench into the
machinery so I wanted to ask what the expected behavior is.
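
The version check described above could look roughly like the following. This is only a sketch, not a patch to build.sh: the function names are hypothetical, the legacy argument is passed in by the caller because its value is not given in this thread, and the cut-off (include the argument for Java 8 and older) is an assumption based solely on 8 needing it and 11 rejecting it.

```python
import re
from typing import List


def java_major_version(version_output: str) -> int:
    """Return the major version found in `java -version` style output.

    Handles the legacy scheme ("1.8.0_252" gives 8) and the scheme used
    from Java 9 onward ("11.0.8" gives 11).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy "1.x" numbering
    return int(first)


def extra_jvm_args(version_output: str, legacy_arg: str) -> List[str]:
    """Include legacy_arg only for Java 8 and older (assumed cut-off)."""
    if java_major_version(version_output) <= 8:
        return [legacy_arg]
    return []
```

Note that `java -version` writes to stderr rather than stdout, so a caller would need to capture that stream. The stricter alternative suggested earlier in the thread, bailing out when the version is below 11, would replace the conditional with an error.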


That said... what is the expectation? The README next to build.sh is
ambiguous about what we *want* to happen:

> ### UTF-8 vs. XML entities in foo.html.en
>
> Old JDK's happily put UTF-8 bytes into ISO8859-1 english files which seems wrong.
> Newer JDK's (w/o -Xbootclasspath? in build.sh?) will replace them with XML entities.
>
> Impact: XML entities break manpages (if checked in)

It seems that I should commit the changes because docs/manual has
changed and docs/man has not? I also ask because if we have half the
devs using newer JDKs that implement "expected" behavior, and half the
devs using JDKs implementing the old behavior, we'll have constant
waffling back and forth of committed files between the two formats. This
will leave SVN history full of noise (though, I guess that doesn't
matter much for the generated files?)

I'm happy to update build.sh with whatever we decide is "correct", but
for now I want to move forward with the planned T&R once I understand
what "correct" is.
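
Until that is settled, one way to see which flavor a generated file is in before committing is to classify its bytes. The following is a rough sketch under stated assumptions: `encoding_flavor` and its labels are hypothetical names, and it only distinguishes raw bytes above 0x7F from numeric character references, without validating either encoding.

```python
import re

# Numeric character references such as &#231; or &#xE7; (named entities
# like &nbsp; are deliberately not matched).
NUMERIC_REF = re.compile(rb"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")


def encoding_flavor(data: bytes) -> str:
    """Label file content by how it represents non-ASCII characters."""
    has_raw = any(byte > 0x7F for byte in data)
    has_refs = NUMERIC_REF.search(data) is not None
    if has_raw and has_refs:
        return "mixed"
    if has_raw:
        return "raw-bytes"
    if has_refs:
        return "numeric-references"
    return "ascii-only"
```

Run over the working copy, a check like this would show whether a rebuild flipped files from one representation to the other, which is the back-and-forth described above.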

