Mailing List Archive

Dynamic Document Roots
I'm hoping to introduce dynamic document roots based on the hostname in the
incoming request. I can set almost any %ENV entry except DOCUMENT_ROOT;
I can even set SERVER_ROOT. I know that vhost is out there, but it
doesn't cut it for what I need to be doing.

Also, any info on how to set the user that will be handling the process would
be great, since that is my next great adventure.

---------------------------------------------------------------------
Best regards,

Karyn Ulriksen
Chief Systems Architect
PublicHost
22 Mauchly, Suite 200
Irvine, California 92618 USA
Phone: (949) 743-2000
email: kulriksen@publichost.com
URL: http://www.publichost.com
Re: Dynamic Document Roots [ In reply to ]
Hi there,

On Fri, 25 Feb 2000, Karyn Ulriksen wrote:

> I'm hoping to introduce dynamic document roots based on the hostname in the
> incoming request. I can set any type of %ENV almost except the
> document_root. I can even set the server_root. I know that vhost is out
> there but it doesn't cut it for what I need to be doing.

Have a look at mod_rewrite?

> Also, any info on how to set the user that will be handling the
> process will be great since that is my next great adventure.

Unless you mean you want the entire server to have your chosen uid/gid
(in which case it's in httpd.conf) then this could be a bit trickier
with mod_perl. There was another post on the list recently which
seemed to indicate things are not what they seem. I can't find it
just at the moment (I saved it somewhere in my own mail, but obviously
not where I thought I did) so if you don't get any other pointers come
back to me and I'll have another look.

73,
Ged.

Re: Dynamic Document Roots [ In reply to ]
Thanks for responding.

Rewrite is a great module if you only have a few virtuals. I'm already
handling the Translation phase, which gives me the level of control I need. I
actually just need $ENV{'DOCUMENT_ROOT'} set for my users' CGIs,
such as FrontPage's fpcount.exe program, which relies on that variable. I
can work around it by replacing fpcount.exe with my own program, but I know
that it won't be the only program out there that seems to think it
needs to know the DOCUMENT_ROOT. In other words, I can get away with not
having it, but it sure would make a few things simpler. It's weird that you
can set the SERVER_ROOT, which seems to be much more of a security risk than
DOCUMENT_ROOT.
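
As a rough illustration of the idea (not mod_perl code, and not what is
running here), the per-request logic could be sketched in Python as below.
The DOCROOTS table, the hostnames, the paths and the function names are all
made up for the example; in practice the lookup would presumably hit a
database rather than a dict.

```python
import posixpath

# Hypothetical stand-in for the hostname -> document root database.
DOCROOTS = {
    "www.example.com": "/sites/e/example.com/htdocs",
    "shop.example.org": "/sites/s/shop.example.org/htdocs",
}
DEFAULT_DOCROOT = "/sites/default/htdocs"


def normalize_host(host_header):
    """Lower-case the Host header value and drop any :port suffix."""
    host = host_header.strip().lower()
    return host.split(":", 1)[0].rstrip(".")


def translate(host_header, uri, base_env):
    """Return (filename, env) for one request without modifying base_env."""
    docroot = DOCROOTS.get(normalize_host(host_header), DEFAULT_DOCROOT)
    # Collapse "." and ".." so the mapped file stays inside the docroot.
    relative = posixpath.normpath("/" + uri).lstrip("/")
    env = dict(base_env)
    env["DOCUMENT_ROOT"] = docroot  # what a CGI like fpcount.exe would read
    return posixpath.join(docroot, relative), env
```

The point of returning a fresh env per request is that the same server
process can hand a different DOCUMENT_ROOT to each CGI it launches.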

I'm combing the lists right now; maybe I'll come across the UID one you
mentioned. If you happen to find it, I'd appreciate you forwarding it to me.

Thanx, Karyn.

RE: Dynamic Document Roots [ In reply to ]
I've been managing virtual hosting platforms for webfarms for the past 4
years, including AnaServe and in turn Concentric. I've run servers that
easily serviced 2000 to 3000 domains and am very familiar with the limitations
and pitfalls of those kinds of numbers. My next platform will be serving
probably 65000 unique sites across load-balanced servers. Right now I'm
doing the prototype for this platform, which obviously involves pulling apart
Apache a little bit. One of the issues I'm making a point of avoiding is
replicating the data across all those servers and having to reboot Apache
every time a new site is added or removed. Naturally, I'm talking database
interaction with appropriate mechanisms to support the enormous volumes of
requests that it would take to support these numbers (but I'll worry about
that part - please don't go off on that tangent...)

The ".exe" file is well known in the NT world, but this one actually lives
on unix and could just as easily be called fpcount.cgi, but uncle Bill
dumped it onto the unix world as fpcount.exe. It is simply a counter script
written in C and, naturally, uncle Bill didn't make the source code
available. But it's pretty plain, and has been confirmed elsewhere, that it
relies on the DOCUMENT_ROOT variable to determine where its counter file
is. Nasty, but true. I can foresee that there will be other instances of
this occurring.

To continue on about the number of sites I will be serving, a rewrite rule
cannot be smart enough to account for nuances such as varying subdirectory
structures, due to how we plan to manage the directory structures for this
volume. So again, as I said, rewrites were great at 2000 sites, but not at
65000+, and they still don't address the DOCUMENT_ROOT env var issue.

Thanx for any insight you might have on this

-----Original Message-----
From: G.W. Haywood [mailto:ged@jubileegroup.co.uk]
Sent: Friday, February 25, 2000 10:27 AM
To: Karyn Ulriksen
Subject: RE: Dynamic Document Roots


Hi there,

On Fri, 25 Feb 2000, Karyn Ulriksen wrote:

> Rewrite is a great module if you only have a few virtuals

Tell me more of your experience with larger numbers?

> I'm already handling the Translation phase, which gives me the level
> of control I need. I actually just need $ENV{'DOCUMENT_ROOT'} set
> for my users' CGIs, such as FrontPage's fpcount.exe program,
> which relies on that variable. I can work around it by replacing
> fpcount.exe with my own program, but I know that it won't be the
> only program out there that seems to think it needs to know the
> DOCUMENT_ROOT. In other words, I can get away with not having it,
> but it sure would make a few things simpler. It's weird that you
> can set the SERVER_ROOT, which seems to be much more of a security
> risk than DOCUMENT_ROOT.

What's all this .exe stuff?

> I'm combing the lists right now; maybe I'll come across the UID one
> you mentioned. If you happen to find it, I'd appreciate you
> forwarding it to me.

Thank goodness for emacs:
======================================================================
Re: Dynamic Document Roots [ In reply to ]
Have you taken a look at mod_vhost_alias? It's been part of
Apache since 1.3.9, I think.

Jim

On Feb 25, Karyn Ulriksen wrote:
> Thank goodness for emacs:
> ======================================================================
> >From ged@jubileegroup.co.uk Sat Feb 19 19:00:54 2000
> Date: Sat, 19 Feb 2000 19:00:49 +0000 (GMT)
> From: "G.W. Haywood" <ged@jubileegroup.co.uk>
> To: radek.stachowiak@alter.pl
> cc: Stas Bekman <sbekman@iname.com>
> Subject: Re: suexec and mod_perl does not work together???
> In-Reply-To: <20000219183209.A4541@blue.alter.pl>
> Message-ID:
> <Pine.LNX.3.96.1000219184644.23615A-100000@C2H5OH.jubileegroup.co.uk>
> MIME-Version: 1.0
> Content-Type: TEXT/PLAIN; charset=US-ASCII
> Status: O
> X-Status:
>
> Hi there,
>
> On Sat, 19 Feb 2000, Radoslaw Stachowiak wrote:
>
> > The problem is: perl scripts run WITHOUT mod perl, switch UID and
> > GID according to User and Group directive (suexec), but WITH
> > mod_perl they run as standard server uid (nobody).
>
> Ah, I see! The light is dawning on me, sorry to be so dense.
>
> It does not surprise me that suexec does not behave as you expect
> under mod_perl. Without mod_perl, Apache has to load and run the
> interpreter for each script invocation. It is the interpreter which
> has the notion of a UID and GID, not the script which it is executing,
> and it is the interpreter (or its parent) which has to do suexec to
> switch IDs. After suexec it is quite possible that an Apache child
> might not have permission to read one of your scripts. When Apache
> runs with mod_perl, it goes to a lot of trouble to avoid reading
> time-consuming configurations and suchlike. It loads one and only one
> copy of the interpreter which compiles scripts once and leaves them in
> memory. If it were to do suexec AFTER compiling a script, who is to
> say that it would even have had permission to read or execute that
> script if it had had the new UID when it needed to read it in the
> first place?
>
> Maybe I'm missing something here.
>
> Stas, I can find nothing about this in the Guide, any comments? If
> this hasn't been discussed on the List before, I guess it will be now.
> ======================================================================
> 73,
> Ged.
>
RE: Dynamic Document Roots [ In reply to ]
Jim,

Thanx. I've pulled mod_vhost_alias completely apart and looked at it. It
does some interesting things, but setting the document_root
is not one of them. It looks like I have to get into the middle of the
server_rec and modify it: that is, add the record if it doesn't
exist, and remove it if it's no longer a good record. I think I need to repost
for that....

RE: Dynamic Document Roots [ In reply to ]
Hi there,

On Fri, 25 Feb 2000, Karyn Ulriksen wrote:

> I've been managing virtual hosting platforms for webfarms for the
> past 4 years, including AnaServe and in turn Concentric. I've run
> servers that easily serviced 2000 to 3000 domains and am very familiar
> with the limitations and pitfalls of those kinds of numbers. My next
> platform will be serving probably 65000 unique sites across load-
> balanced servers.

Wow!

> Right now I'm doing the prototype for this platform which obviously
> involves pulling apart Apache a little bit. One of the issues I'm
> making it a point to avoid is replicating the data across all those
> servers and having to reboot apache every time a new site is added
> or removed.

Do you mean reboot Apache when its configuration changes? You don't
have to. Just send Apache a SIGUSR1 (use 'apachectl graceful').

It will kill off the old generation of children as they finish their
requests and start new ones with a new config. Transparent to users,
easy, your servers need never go down. It even tells you if the new
config is broken and ignores the restart so you can fix it. And
'apachectl configtest' tells you what you broke.
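
For anyone scripting this, the sequence described above might be wrapped
roughly as follows. This is only a sketch in Python: the function name and
its parameters are invented for the example, and it assumes an apachectl
binary on the PATH that supports the 'configtest' and 'graceful'
subcommands mentioned here.

```python
import subprocess


def graceful_reload(run=subprocess.run, apachectl="apachectl"):
    """Check the config first; only ask for a graceful restart if it passes."""
    check = run([apachectl, "configtest"])
    if check.returncode != 0:
        return False  # broken config: leave the running server alone
    restart = run([apachectl, "graceful"])
    return restart.returncode == 0
```

The run parameter exists only so the function can be exercised without a
real Apache; called with no arguments it shells out to apachectl.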

> Naturally, I'm talking database interaction with appropriate
> mechanisms to support the enormous volumes of requests that it would
> take to support these numbers (but I'll worry about that part -
> please don't go off on that tangent...)

OK.

> The ".exe" file is well known in the NT world

:)

> but this one actually lives on unix and could just as easily be
> called fpcount.cgi, but uncle Bill dumped it onto the unix world as
> fpcount.exe. It is simply a counter script written in C and,
> naturally, uncle Bill didn't make the source code available. But
> it's pretty plain, and has been confirmed elsewhere, that it relies on
> the DOCUMENT_ROOT variable to determine where its counter file is.
> Nasty, but true. I can foresee that there will be other instances
> of this occurring.

And you *run* it? On a site that big?

I'd march to the North Pole first.

What does it count? Anything difficult? C is my second language, and
Bill is my middle name. Really! No promises, but it'd be fun to write
mod_bills_counter.c

Jim Winstead mentioned mod_vhost_alias.c (in Apache since 1.3.7), but
from what you say I don't know if VirtualDocumentRoot will do the job.

If it's already causing you trouble, I can't quite see why you want to
use it, unless replacing it would be a huge job (and I think a simple
counter script probably wouldn't qualify as one). Is there not a
different way of doing this that's totally under your control (e.g. open
source, or writing a C module)? Then maybe you wouldn't have to worry
about
> the DOCUMENT_ROOT env var issue

at all.

> To continue on about the number of sites I will be serving, a
> rewrite rule cannot be smart enough to account for nuances such as
> varying subdirectory structures, due to how we plan to manage the
> directory structures for this volume.

Well, it could be a (mod_perl compiled) Perl script that does the
rewriting, that can do almost anything. But I guess you're closer to
the thing than I.

Hope this helps:)

73,
Ged.
Re: Dynamic Document Roots [ In reply to ]
On Feb 26, G.W. Haywood wrote:
> On Fri, 25 Feb 2000, Karyn Ulriksen wrote:
> > I've been managing virtual hosting platforms for webfarms for the
> > past 4 years, including AnaServe and in turn Concentric. I've run
> > servers that easily serviced 2000 to 3000 domains and am very familiar
> > with the limitations and pitfalls of those kinds of numbers. My next
> > platform will be serving probably 65000 unique sites across load-
> > balanced servers.
>
> Wow!

For some people that's just a good week. :)

> > Right now I'm doing the prototype for this platform which obviously
> > involves pulling apart Apache a little bit. One of the issues I'm
> > making it a point to avoid is replicating the data across all those
> > servers and having to reboot apache every time a new site is added
> > or removed.
>
> Do you mean reboot Apache when its configuration changes? You don't
> have to. Just send Apache a SIGUSR1 (use 'apachectl graceful').

For a system where you have users constantly signing up accounts,
and you're running a cluster of load-balanced servers (say, something
like http://www.homepage.com/ :), even sending them a SIGUSR1 on
every signup doesn't scale very well. That's the sort of thing
mod_vhost_alias addresses.

> > but this one actually lives on unix and could just as easily be
> > called fpcount.cgi, but uncle Bill dumped it onto the unix world as
> > fpcount.exe. It is simply a counter script written in C and,
> > naturally, uncle Bill didn't make the source code available. But
> > it's pretty plain, and has been confirmed elsewhere, that it relies on
> > the DOCUMENT_ROOT variable to determine where its counter file is.
> > Nasty, but true. I can foresee that there will be other instances
> > of this occurring.
>
> And you *run* it? On a site that big?
>
> I'd march to the North Pole first.

For the type of service Karyn is probably talking about, FrontPage
support is probably a basic requirement. In that case, the FrontPage
counter thing is only one of your problems (and the easiest, because
if I remember correctly it *only* does counters and doesn't really
interact with the FrontPage client itself). The other FrontPage
support programs actually speak an arcane RPC-ish protocol to the
FrontPage client, so replacing those completely would require a
fairly significant amount of work.

> > To continue on about the number of sites I will be serving, a
> > rewrite rule cannot be smart enough to account for nuances such as
> > varying subdirectory structures, due to how we plan to manage the
> > directory structures for this volume.
>
> Well, it could be a (mod_perl compiled) Perl script that does the
> rewriting, that can do almost anything. But I guess you're closer to
> the thing than I.

Yes, you can probably set all of the environment variables that
are going to be relevant to a CGI like the FrontPage extensions
within a PerlFixupHandler (or at whatever other phase of the request
you want before the variables need to be set up).
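
The underlying mechanism is just that a CGI inherits whatever environment
it is started with. A small Python sketch of that (not a PerlFixupHandler;
the run_cgi helper, the path and the stand-in child are all hypothetical):

```python
import os
import subprocess
import sys


def run_cgi(argv, docroot, extra_env=None):
    """Run a CGI-style child with DOCUMENT_ROOT set just for that child."""
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env.update(extra_env or {})
    env["DOCUMENT_ROOT"] = docroot
    done = subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)
    return done.stdout.decode()


# Stand-in for a closed-source CGI such as fpcount.exe: it only echoes
# the variable it was given.
child = [sys.executable, "-c", "import os; print(os.environ['DOCUMENT_ROOT'])"]
output = run_cgi(child, "/sites/e/example.com/htdocs")
```

Because the override lives in the copy handed to the child, nothing about
the parent process or other requests changes.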

As to managing the directory structure for a large number of
accounts, there was a discussion on the Apache new-httpd list not
so long ago about adding a hashing scheme to mod_vhost_alias that
addresses that issue rather nicely. I'd check the archives at
http://dev.apache.org/mail/new-httpd/.
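
To give a feel for what a hashing scheme buys you, here is a generic sketch
in Python. It is not the scheme discussed on new-httpd and not something
mod_vhost_alias does; the function name, the /sites base directory, the
htdocs suffix and the choice of SHA-256 are all assumptions made for the
example.

```python
import hashlib
import posixpath


def hashed_docroot(hostname, base="/sites", levels=2):
    """Spread hostnames over nested buckets so no directory grows huge."""
    host = hostname.strip().lower().rstrip(".")
    digest = hashlib.sha256(host.encode("utf-8")).hexdigest()
    # Two hex characters per level gives 256 buckets at each level.
    buckets = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return posixpath.join(base, *buckets, host, "htdocs")
```

The path is computed from the hostname alone, so every server in a cluster
arrives at the same directory without sharing any configuration.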

Jim