Mailing List Archive

generating static pages that are not linked to
Hello,

I have a public web site for which I generate all content as static
pages with "forrest site", then install with "rsync". This is working
well and after going through the initial learning steps I have been very
happy with forrest.

There's just one thing that I have not been able to figure out: I want
to write some pages as xdoc .xml, then have "forrest site" build
the .html and .pdf for it, but I do not want to link to these files from
any other page. The purpose is to have one public part which is visible
to users and search engines and a more private part that I can point
people to in emails by giving them the "secret" URL. The problem is that
"forrest site" only builds pages that are referenced and thus
does not build the "hidden" pages.

In the mailing list archives I found the hint to add a href without
description, but the URL would still be in the source of the generated
HTML, it's just not visible. It doesn't validate either with forrest
0.8.

Any other suggestions?

--
Bye, Patrick Ohly
--
Patrick.Ohly@gmx.de
http://www.estamos.de/
Re: generating static pages that are not linked to
Patrick Ohly wrote:
>
> I have a public web site for which I generate all content as static
> pages with "forrest site", then install with "rsync". This is working
> well and after going through the initial learning steps I have been very
> happy with forrest.

That is great to hear.

A "deploy.rsync" workstage for the Forrestbot would be
fantastic, if you want to join us on the dev mail list.
Nudge, nudge.
http://forrest.apache.org/tools/forrestbot.html

> There's just one thing that I have not been able to figure out: I want
> to write some pages as xdoc .xml, then have "forrest site" build
> the .html and .pdf for it, but I do not want to link to these files from
> any other page. The purpose is to have one public part which is visible
> to users and search engines and a more private part that I can point
> people to in emails by giving them the "secret" URL. The problem is that
> "forrest site" only builds pages that are referenced and thus
> does not build the "hidden" pages.
>
> In the mailing list archives I found the hint to add a href without
> description, but the URL would still be in the source of the generated
> HTML, it's just not visible. It doesn't validate either with forrest
> 0.8.
>
> Any other suggestions?

I might be seeing the hint that you refer to at
http://issues.apache.org/jira/browse/FOR-480
Are you sure that that causes validation errors
in document-v20 xdoc? Use "link" rather than "a"
if using document-v1* xdoc.

Anyway, I note your concern about not wanting these
links to be revealed.

See http://forrest.apache.org/faq.html#cli-xconf
It hints at the solution, but only shows how to
exclude files.

This directs Apache Cocoon to add extra URIs for
processing. See
http://forrest.apache.org/docs/dev/howto/howto-asf-mirror.html#link

However see the link to issue FOR-480 below that.
I see that we have not updated the howto doc
to show the proper workaround.

Look at our Forrest website as an example.
You have the source at $FORREST_HOME/site-author/
or you can see the current files via browse SVN.

Look in site-author/conf/cli.xconf search for "mirrors".
Our download page is generated but not linked.
The generated file is read and displayed by the
server's mirror CGI script.
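
As a rough sketch of what such an entry might look like: this assumes
the Cocoon CLI <uri> element with its "src", "dest" and "type"
attributes. The page name "private/notes.html" is invented, and the
"dest" value is simply the one used in site-author/conf/cli.xconf, so
both need adjusting for your own project.

```xml
<!-- Hypothetical extra entry for a project's cli.xconf, placed next to the
     other <uri> elements. It asks Cocoon to render a page that no other
     page links to. "private/notes.html" is an invented example name. -->
<uri type="append" src="private/notes.html"
     dest="../../site-author/build/site/"/>
```

With type="append" the "src" path should be appended to "dest", so the
output would be expected at <dest>/private/notes.html. Where a relative
"dest" resolves to is worth verifying on your own installation.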

I will try to update the docs mentioned above
and the FOR-480 issue.

-David
Re: generating static pages that are not linked to
There is one other easy way to do this:

- Use the scales skin (which is really just a high performance
version of pelt)
- Reference your hidden pages through a normal entry in site.xml
- Add the attribute

type="showWhenSelected"

to that reference. It should now look like

<page2 label="Visible only when current" href="page2.html"
type="showWhenSelected"/>

Example taken from the seed site.

Your page will now be referenced and rendered, but the reference will
not normally show in the menu. It is not just hidden in the menu but
truly missing from it.

However, it is part of the rendered website and - when opened by the
secret URL - will actually show with context (with an entry in the menu)
in your site.
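
For context, a minimal site.xml around such an entry might look like the
sketch below. The element names "about" and "index" and the labels are
placeholders patterned on the seed site, not something to copy verbatim.

```xml
<?xml version="1.0"?>
<!-- Sketch only: element names and labels are placeholders. -->
<site label="MyProj" href="" xmlns="http://apache.org/forrest/linkmap/1.0">
  <about label="About">
    <!-- Ordinary entry: always shown in the menu -->
    <index label="Index" href="index.html"/>
    <!-- Left out of the menu unless this page is the current one -->
    <page2 label="Visible only when current" href="page2.html"
           type="showWhenSelected"/>
  </about>
</site>
```

Note that the page remains a regular entry in site.xml; only its menu
item is suppressed.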

hth
Ferdinand Soethe
Re: generating static pages that are not linked to
On 14/06/07, Ferdinand Soethe <ferdinand@apache.org> wrote:
> There is one other easy way to do this:
>
> - Use the scales skin (which is really just a high performance
> version of pelt)
> - Reference your hidden pages through a normal entry in site.xml
> - Add the attribute
>
> type="showWhenSelected"
>
> to that reference. It should now look like
>
> <page2 label="Visible only when current" href="page2.html"
> type="showWhenSelected"/>

This does not achieve the effect Patrick is after. As he points out in
the original mail, having a reference in site.xml results in the link
appearing in the page (although it is not visible). That will result
in content being indexed by search engines, which is precisely what
Patrick wants to avoid.

Ross
Re: generating static pages that are not linked to
Ross Gardler wrote:

> This does not achieve the effect Patrick is after. As he points out in
> the original mail, having a reference in site.xml results in the link
> appearing in the page (although it is not visible). That will result
> in content being indexed by search engines, which is precisely what
> Patrick wants to avoid.

I disagree.
I wrote the mechanism in scales so that it does not simply hide the menu
entry in the page; it leaves the entry out altogether unless the
page is the current page.

Check out
http://www.bildungsverein.de/dozentinnen/auto_dozentinnen_uebersicht.html
then navigate to the first entry
http://www.bildungsverein.de/dozentinnen/auto_Nagy-Abd-El-Malek.html

The first page has NO menu link to the second page at all. The link in
the first page used to navigate to the second is optional.

So unless you have an additional sitemap that lists all pages I don't
see any references.

Best regards,
Ferdinand Soethe
Re: generating static pages that are not linked to
On 14/06/07, Ferdinand Soethe <ferdinand@apache.org> wrote:
> Ross Gardler wrote:
>
> > This does not achieve the effect Patrick is after. As he points out in
> > the original mail, having a reference in site.xml results in the link
> > appearing in the page (although it is not visible). That will result
> > in content being indexed by search engines, which is precisely what
> > Patrick wants to avoid.
>
> I disagree.
> I wrote the mechanism in scales so that it does not simply hide the menu
> entry in the page; it leaves the entry out altogether unless the
> page is the current page.
>
> Check out
> http://www.bildungsverein.de/dozentinnen/auto_dozentinnen_uebersicht.html
> then navigate to the first entry
> http://www.bildungsverein.de/dozentinnen/auto_Nagy-Abd-El-Malek.html

Retrieve http://www.bildungsverein.de/linkmap.html and you will see
another link to the "hidden" page. This page appears in all Forrest
generated sites; it is what Forrest uses as a list of pages that
need to be generated according to site.xml.

David's approach prevents the page being listed in linkmap.html.

There is another (theoretical) approach that does not require editing
of the cli config. I'll not detail it here, but if Patrick can't use
the cli config method for any reason I'd be happy to expand.

Ross
Re: generating static pages that are not linked to
On Wed, 2007-06-13 at 23:05 +0200, Patrick Ohly wrote:
> Hello,
>
> I have a public web site for which I generate all content as static
> pages with "forrest site", then install with "rsync". This is working
> well and after going through the initial learning steps I have been very
> happy with forrest.
>
> There's just one thing that I have not been able to figure out: I want
> to write some pages as xdoc .xml, then have "forrest site" build
> the .html and .pdf for it, but I do not want to link to these files from
> any other page. The purpose is to have one public part which is visible
> to users and search engines and a more private part that I can point
> people to in emails by giving them the "secret" URL. The problem is that
> "forrest site" only builds pages that are referenced and thus
> does not build the "hidden" pages.
>
> In the mailing list archives I found the hint to add a href without
> description, but the URL would still be in the source of the generated
> HTML, it's just not visible. It doesn't validate either with forrest
> 0.8.

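You can make separate "forrest site" runs:

<target name="internal-export">
  <!-- Generate the private part, starting the crawl at its own index page -->
  <antcall target="site">
    <param name="project.home" location="${project.home}"/>
    <!-- This is a URI rather than a file path, so pass it as a plain value -->
    <param name="project.start-uri" value="/private/index.html"/>
    <!-- Separate output directory; PATH is a placeholder -->
    <param name="project.build-dir" location="${build.dir}/PATH"/>
  </antcall>
</target>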

Setting project.start-uri means the crawl will not start from
linkmap.html but from the URL you provide.

The above is an Ant target, but you can also call it like:
forrest site -Dproject.start-uri=/private/index.html

HTH

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: generating static pages that are not linked to
Ross Gardler wrote:

> Retrieve http://www.bildungsverein.de/linkmap.html and you will see
> another link to the "hidden" page. This page appears in all Forrest
> generated sites; it is what Forrest uses as a list of pages that
> need to be generated according to site.xml.

That's what I meant by having an additional sitemap (in the web master
sense of the word).

But this page is not normally linked to from anywhere, is it? So unless
you explicitly build a link to it, a visitor or search engine would
never "find" it, unless they understood Forrest.

Btw: Is this page really needed in the _final site_? In other words
could we not eliminate or at least generate it conditionally?

Or, as a very simple solution, not sync it?

Thorsten wrote:

> The above is an Ant target, but you can also call it like:
> forrest site -Dproject.start-uri=/private/index.html

That sounds nice and easy. But I assume that it would also stop all
hidden pages from being generated once again, right?
RE: generating static pages that are not linked to
> -----Original Message-----
> From: Ferdinand Soethe [mailto:ferdinand@apache.org]
> Sent: Thursday, 14 June 2007 9:03 PM
> To: user@forrest.apache.org
> Subject: Re: generating static pages that are not linked to
>
>
> Ross Gardler wrote:
>
> > Retrieve http://www.bildungsverein.de/linkmap.html and you will see
> > another link to the "hidden" page. This page appears in all Forrest
> > generated sites; it is what Forrest uses as a list of pages that
> > need to be generated according to site.xml.
>
> That's what I meant by having an additional sitemap (in the web master
> sense of the word).
>
> But this page is not normally linked to from anywhere, is it? So unless
> you explicitly build a link to it, a visitor or search engine would
> never "find" it, unless they understood Forrest.
>
> Btw: Is this page really needed in the _final site_? In other words
> could we not eliminate or at least generate it conditionally?
>
> Or, as a very simple solution, not sync it?
>
> Thorsten wrote:
>
> > The above is an Ant target, but you can also call it like:
> > forrest site -Dproject.start-uri=/private/index.html
>
> That sounds nice and easy. But I assume that it would also stop all
> hidden pages from being generated once again, right?

I guess there are many solutions that may or may not fit the job. Webmasters
should therefore probably use a robots.txt to ensure exclusion -- most of the
major search engines will adhere to the rules contained within.

Gav...
Re: generating static pages that are not linked to
On 14/06/07, Ferdinand Soethe <ferdinand@apache.org> wrote:
>
> Ross Gardler wrote:
>
> > Retrieve http://www.bildungsverein.de/linkmap.html and you will see
> > another link to the "hidden" page. This page appears in all Forrest
> > generated sites; it is what Forrest uses as a list of pages that
> > need to be generated according to site.xml.
>
> That's what I meant by having an additional sitemap (in the web master
> sense of the word).
>
> But this page is not normally linked to from anywhere, is it? So unless
> you explicitly build a link to it, a visitor or search engine would
> never "find" it, unless they understood Forrest.

True.

> Btw: Is this page really needed in the _final site_? In other words
> could we not eliminate or at least generate it conditionally?

It cannot be generated conditionally, it is a fundamental part of
static site generation.

It could be deleted after the build, but how do we know it is not
being linked to (it is very useful and many Forrest sites, like our
own, use it).

> Or, as a very simple solution, not sync it?

I'm not sure what your measure of simple is. Using the CLI to achieve
what is needed is the simplest solution and requires no customised
build/deploy targets.

That being said, it is an option, just not one I would use.

Ross
Re: generating static pages that are not linked to
Ross Gardler wrote:

>> Or, as a very simple solution, not sync it?
>
> I'm not sure what your measure of simple is. Using the CLI to achieve
> what is needed is the simplest solution and requires no customised
> build/deploy targets.
>
> That being said, it is an option, just not one I would use.

I guess all the solutions have drawbacks.

As does the above-mentioned CLI solution. Check out
http://forrest.apache.org/mirrors.cgi and you will see that it actually
breaks the Forrest menu system, because the pages added this way will
not have a "place" in the menu-system so that context (where am I) is lost.

I agree with Gavin: It really depends on what you want and need.

The nice part of this discussion is that I finally understand why that
happens with mirrors.cgi :-)

Best regards,
Ferdinand Soethe
Re: generating static pages that are not linked to
Hello,

so many responses - thanks a lot. I should have asked much earlier ;-)

On Do, 2007-06-14 at 10:25 +1000, David Crossley wrote:
> Patrick Ohly wrote:
> >
> > I have a public web site for which I generate all content as static
> > pages with "forrest site", then install with "rsync". This is working
> > well and after going through the initial learning steps I have been very
> > happy with forrest.
>
> That is great to hear.
>
> A "deploy.rsync" workstage for the Forrestbot would be
> fantastic, if you want to join us on the dev mail list.
> Nudge, nudge.
> http://forrest.apache.org/tools/forrestbot.html

Sorry, I'm nudge-resistant. I like to contribute to open source projects
that I use, but in this case I don't know much about the underlying
technology. Perhaps if I get bored (not likely) I'll try to change that.

> > In the mailing list archives I found the hint to add a href without
> > description, but the URL would still be in the source of the generated
> > HTML, it's just not visible. It doesn't validate either with forrest
> > 0.8.
> >
> > Any other suggestions?
>
> I might be seeing the hint that you refer to at
> http://issues.apache.org/jira/browse/FOR-480
> Are you sure that that causes validation errors
> in document-v20 xdoc?

No, it works as intended - I wasn't paying enough attention to the
error, it referred to something else.

> See http://forrest.apache.org/faq.html#cli-xconf
> It hints at the solution, but only shows how to
> exclude files.
>
> This directs Apache Cocoon to add extra URIs for
> processing. See
> http://forrest.apache.org/docs/dev/howto/howto-asf-mirror.html#link
>
> However see the link to issue FOR-480 below that.
> I see that we have not updated the howto doc
> to show the proper workaround.

I have done that and the additional URIs are mentioned in the list of
files that forrest works on. What I have problems with is figuring out
what the path in the "dest" attribute (FOR-300) is relative to:
absolute paths worked, relative ones didn't (no error message, no file).

> Look in site-author/conf/cli.xconf search for "mirrors".

It uses dest="../../site-author/build/site". That might be relative to
site-author/conf/, but going up the corresponding number of levels
relative to my cli.xconf didn't work. Any suggestions?

--
Bye, Patrick Ohly
--
Patrick.Ohly@gmx.de
http://www.estamos.de/
Re: generating static pages that are not linked to
Patrick Ohly wrote:
>
> so many responses - thanks a lot. I should have asked much earlier ;-)

Yes, and as you can see, other people learn something from
the discussion too.

> David Crossley wrote:
> > Patrick Ohly wrote:
> > >
> > > In the mailing list archives I found the hint to add a href without
> > > description, but the URL would still be in the source of the generated
> > > HTML, it's just not visible. It doesn't validate either with forrest
> > > 0.8.
> > >
> > > Any other suggestions?
> >
> > I might be seeing the hint that you refer to at
> > http://issues.apache.org/jira/browse/FOR-480
> > Are you sure that that causes validation errors
> > in document-v20 xdoc?
>
> No, it works as intended - I wasn't paying enough attention to the
> error, it referred to something else.
>
> > See http://forrest.apache.org/faq.html#cli-xconf
> > It hints at the solution, but only shows how to
> > exclude files.
> >
> > This directs Apache Cocoon to add extra URIs for
> > processing. See
> > http://forrest.apache.org/docs/dev/howto/howto-asf-mirror.html#link
> >
> > However see the link to issue FOR-480 below that.
> > I see that we have not updated the howto doc
> > to show the proper workaround.
>
> I have done that and the additional URIs are mentioned in the list of
> files that forrest works on. What I have problems with is figuring out
> what the path in the "dest" attribute (FOR-300) is relative to:
> absolute paths worked, relative ones didn't (no error message, no file).
>
> > Look in site-author/conf/cli.xconf search for "mirrors".
>
> It uses dest="../../site-author/build/site". That might be relative to
> site-author/conf/, but going up the corresponding number of levels
> relative to my cli.xconf didn't work. Any suggestions?

It is probably relative to $FORREST_HOME/main/webapp/

Try omitting the "dest" attribute, then search your
filesystem to find where the file was generated to,
then add the "dest" attribute again.

Oh, I wonder if you completely followed the FAQ
http://forrest.apache.org/faq.html#cli-xconf
to declare your cli.xconf properly in forrest.properties

-David
Re: generating static pages that are not linked to
Ross Gardler wrote:
> Ferdinand Soethe wrote:
> >Ross Gardler wrote:
> >
> >> Retrieve http://www.bildungsverein.de/linkmap.html and you will see
> >> another link to the "hidden" page. This page appears in all Forrest
> >> generated sites; it is what Forrest uses as a list of pages that
> >> need to be generated according to site.xml.
> >
> >That's what I meant by having an additional sitemap (in the web master
> >sense of the word).
> >
> >But this page is not normally linked to from anywhere, is it? So unless
> >you explicitly build a link to it, a visitor or search engine would
> >never "find" it, unless they understood Forrest.
>
> True.
>
> >Btw: Is this page really needed in the _final site_? In other words
> >could we not eliminate or at least generate it conditionally?
>
> It cannot be generated conditionally, it is a fundamental part of
> static site generation.

The html version is not required. The internal xml
version probably is required.

As said, don't link to the html version.

Set the project.start-uri in forrest.properties to be
something other than linkmap.html e.g. index.html
(or even your own separate list of links). Cocoon
will crawl from there. However, any pages that
are not linked from index.html or its linked pages
will not be crawled. That is why it is better to
start from linkmap.html

-David

> It could be deleted after the build, but how do we know it is not
> being linked to (it is very useful and many Forrest sites, like our
> own, use it).
>
> >Or, as a very simple solution, not sync it?
>
> I'm not sure what your measure of simple is. Using the CLI to achieve
> what is needed is the simplest solution and requires no customised
> build/deploy targets.
>
> That being said, it is an option, just not one I would use.
>
> Ross
Re: generating static pages that are not linked to
David Crossley wrote:
> Patrick Ohly wrote:
> >
> > so many responses - thanks a lot. I should have asked much earlier ;-)
>
> Yes, and as you can see, other people learn something from
> the discussion too.
>
> > David Crossley wrote:
> > > Patrick Ohly wrote:
> > > >
> > > > In the mailing list archives I found the hint to add a href without
> > > > description, but the URL would still be in the source of the generated
> > > > HTML, it's just not visible. It doesn't validate either with forrest
> > > > 0.8.
> > > >
> > > > Any other suggestions?
> > >
> > > I might be seeing the hint that you refer to at
> > > http://issues.apache.org/jira/browse/FOR-480
> > > Are you sure that that causes validation errors
> > > in document-v20 xdoc?
> >
> > No, it works as intended - I wasn't paying enough attention to the
> > error, it referred to something else.
> >
> > > See http://forrest.apache.org/faq.html#cli-xconf
> > > It hints at the solution, but only shows how to
> > > exclude files.
> > >
> > > This directs Apache Cocoon to add extra URIs for
> > > processing. See
> > > http://forrest.apache.org/docs/dev/howto/howto-asf-mirror.html#link
> > >
> > > However see the link to issue FOR-480 below that.
> > > I see that we have not updated the howto doc
> > > to show the proper workaround.
> >
> > I have done that and the additional URIs are mentioned in the list of
> > files that forrest works on. What I have problems with is figuring out
> > what the path in the "dest" attribute (FOR-300) is relative to:
> > absolute paths worked, relative ones didn't (no error message, no file).
> >
> > > Look in site-author/conf/cli.xconf search for "mirrors".
> >
> > It uses dest="../../site-author/build/site". That might be relative to
> > site-author/conf/, but going up the corresponding number of levels
> > relative to my cli.xconf didn't work. Any suggestions?
>
> It is probably relative to $FORREST_HOME/main/webapp/
>
> Try omitting the "dest" attribute, then search your
> filesystem to find where the file was generated to,
> then add the "dest" attribute again.
>
> Oh, I wonder if you completely followed the FAQ
> http://forrest.apache.org/faq.html#cli-xconf
> to declare your cli.xconf properly in forrest.properties

Erk. Of course you did, as you said above that it worked
with a full pathname for the "dest" attribute.

-David
Re: generating static pages that are not linked to
On Fr, 2007-06-15 at 10:42 +1000, David Crossley wrote:
> Patrick Ohly wrote:
> > David Crossley wrote:
> > It uses dest="../../site-author/build/site". That might be relative to
> > site-author/conf/, but going up the corresponding number of levels
> > relative to my cli.xconf didn't work. Any suggestions?
>
> It is probably relative to $FORREST_HOME/main/webapp/

That's indeed the case. I had searched for the files, but only in my
working directory and $FORREST_HOME, thus I missed the files which were
somewhere above $FORREST_HOME. Having a path relative to
$FORREST_HOME/main/webapp/ has the same problem as an absolute path: if
I move stuff around, I need to remember to update that path because my
$FORREST_HOME is unrelated to my project directory.

While looking at this again I also noticed that there was a whole
"build" directory under my $FORREST_HOME/main/webapp directory. That
worked for me, but what if that directory had been read-only? I think it
would be better if the default configuration only wrote in the project
directory.

Could the "build.context" that is mentioned in the top comment of
cli.xconf be used to achieve that? As soon as this build context
directory is inside my project directory I could also use a relative
"dest" parameter and not run into problems when moving things around. I
tried a few ways to set "build.context" (forrest.properties, -D, adding
a local.build.properties file) but without success (which just shows my
embarrassing lack of understanding how the different parts play
together... ;-)

In the meantime I will use a "dest=/tmp/" parameter and then copy the
files placed there into the normal build directory before invoking
rsync. Thanks also to Thorsten and Ferdinand for their alternative
suggestions, even though I decided to go with the modified cli.xconf - I
think it fits my needs better.
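
That workaround could be wrapped in a small Ant target along these
lines. This is only a sketch: the target name, the /tmp/forrest-hidden
directory, the build/site location and the rsync destination are all
assumptions to adapt.

```xml
<!-- Sketch: merge the separately generated hidden pages into the normal
     build output, then publish. All names and paths are placeholders. -->
<target name="publish">
  <!-- Copy what cli.xconf wrote to the temporary "dest" directory -->
  <copy todir="build/site">
    <fileset dir="/tmp/forrest-hidden"/>
  </copy>
  <!-- Upload the merged tree; the remote location is hypothetical -->
  <exec executable="rsync" failonerror="true">
    <arg value="-av"/>
    <arg value="build/site/"/>
    <arg value="user@example.org:/var/www/site/"/>
  </exec>
</target>
```

Keeping the copy and the upload in one target means the hidden pages
cannot be forgotten when the site is published.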

--
Bye, Patrick Ohly
--
Patrick.Ohly@gmx.de
http://www.estamos.de/