Mailing List Archive

Apache::Session: how does it protect session data?
I am trying to understand the locking model that is used to protect an
Apache::Session from being trampled by multiple processes. The
documentation talks about implementing it in a way that people would
expect.

Unfortunately, when I think about it, there are quite a few ways I could
expect locking to work to make sure the session data is saved.

It looks to me like the default is

1) Read lock upon loading an existing session
2) Exclusive lock upon creating a session from scratch
3) Exclusive lock upon writing data (not upon storage of data in the
session hash!)
4) Release locks only upon destruction of the session object
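
For illustration, here is a minimal Python sketch of the four steps as listed above. It is a toy state model only: the class name, method names, and lock labels are hypothetical, and it is not Apache::Session's actual code.

```python
class SketchSession:
    """Toy model of the four steps above (hypothetical names throughout)."""

    def __init__(self, existing):
        # Steps 1 and 2: a shared (read) lock for an existing session,
        # an exclusive lock for a session created from scratch.
        self.lock = "shared" if existing else "exclusive"
        self.data = {}
        self.dirty = False

    def store(self, key, value):
        # Storing into the in-memory hash does not touch the lock.
        self.data[key] = value
        self.dirty = True

    def save(self):
        # Step 3: the exclusive lock is requested only when data is written out.
        if self.dirty:
            self.lock = "exclusive"
            self.dirty = False

    def destroy(self):
        # Step 4: flush pending changes, then release whatever lock is held.
        self.save()
        self.lock = None
```

In this model, calling store() on an existing session leaves the lock at "shared"; only save() or destroy() asks for the exclusive lock.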

It's very possible I could be wrong, but this model strikes me as
promoting deadlock and not preventing session corruption.

Let's consider a couple possible scenarios:

1. Let's suppose that two processes (1 and 2) both want to modify a
variable in the session. For argument's sake, let's say that one
process is modifying an "age" key while the other one is modifying a
"firstname" key.

Let's also assume the session was previously created.

Thus, process 1 loads a session and obtains a read (non-exclusive) lock.

Process 1 modifies the age value. This is done to a memory-resident
cache so no exclusive lock is obtained yet as no write is performed to
the data store.

Process 2 loads the same session, perhaps as the result of a submission
from a different frame of a web app or from a different browser window.
It also obtains a read (non-exclusive) lock.

Process 2 modifies the "firstname" key/value. Again, this is done to
memory-resident cache (at least this is how I read Apache::Session).

Now, let's assume process 1 (running at about the same time) gains
control of the CPU and the program completes. At this point the session
object goes out of scope.

When Process 1's session object goes out of scope, it attempts to write
the session data that it could not write previously. Before doing this,
however, it must obtain an exclusive lock.

But it can't get an exclusive lock as Process 2 still has a read only
lock on the same session.

So eventually process 1 blocks until process 2 gains CPU time again.

Process 2 then ends up exiting, and the destruction of Process 2's CGI
object demands that it, too, get an exclusive lock on the session to
write the firstname data out to the persistent data store.

Unfortunately process 2 must wait for process 1 to release the read lock
it has.

Deadlock. Process 1 wants process 2 to release its read only lock and
process 2 wants process 1 to release its read only lock.
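
The scenario above can be replayed with a small Python sketch. It assumes the locking behaves as I described (shared locks held until destruction, with an upgrade to exclusive at write time); the lock class below is a hypothetical stand-in that records holders and reports failure instead of blocking, so the stuck state can be inspected.

```python
class SharedExclusiveLock:
    """Toy shared/exclusive lock; returns False where a real lock would block."""

    def __init__(self):
        self.readers = set()
        self.writer = None

    def acquire_shared(self, who):
        if self.writer is not None:
            return False
        self.readers.add(who)
        return True

    def try_upgrade(self, who):
        # The upgrade succeeds only if no *other* process holds a shared lock.
        others = self.readers - {who}
        if self.writer is None and not others:
            self.readers.discard(who)
            self.writer = who
            return True
        return False


def simulate():
    lock = SharedExclusiveLock()
    lock.acquire_shared("process 1")   # process 1 loads the session
    lock.acquire_shared("process 2")   # process 2 loads the same session
    # Both session objects are destroyed and ask for the exclusive lock
    # while each still holds its shared lock.
    return {p: lock.try_upgrade(p) for p in ("process 1", "process 2")}
```

Here simulate() reports that neither upgrade can proceed, because each process is waiting on the other's shared lock.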

The other alternative I see is that the locks would be freed after a
timeout period. But if this is the case, then one process's write of
the session data to the data store will overwrite the other's. The
session file will not be corrupted because the entire write operation
will be surrounded by an exclusive lock, but the data will be
"logically" corrupted because the application author will end up with
an application state he did not expect.
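
A short Python sketch of that "logical" corruption follows. It assumes each process loads the whole session, changes its own in-memory copy, and writes the whole session back as one unit; the stored values are made up for illustration.

```python
import copy


def lost_update():
    # Hypothetical persisted session contents.
    store = {"age": 30, "firstname": "Ann"}

    p1 = copy.deepcopy(store)   # process 1 loads the session
    p2 = copy.deepcopy(store)   # process 2 loads the same session

    p1["age"] = 31              # process 1 changes only "age"
    p2["firstname"] = "Bob"     # process 2 changes only "firstname"

    store = p1                  # process 1 writes its whole copy
    store = p2                  # process 2 writes its whole copy afterwards
    return store
```

Each write is consistent on its own, yet the final store carries process 2's stale "age" and process 1's change is gone.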

In conclusion, the locking workflow in Apache::Session confuses me. I
suspect people haven't run into this problem before because most people
do not share sessions among many different apps and the likelihood that
two scripts will be writing the same session's data to disk at the same
time is extremely low.

However, I imagine the locking was put in place to prevent data
corruption in these extreme cases... so if this is the case, I am
wondering whether it really does work in these cases.

I have tried going through the Apache::Session logic myself to figure
this out, and I might be missing some piece of the puzzle... Hopefully
Jeffrey or someone else can shed some light on this for me?

At the very minimum, I would expect step 3 from above (the act of
storing any one data value in the session) to end up causing the session
to obtain an exclusive lock. I believe this would prevent the deadlock
scenario I outlined above. The problem with this mechanism is that
concurrency goes way down for an application that might require reading
the session data into multiple processes at the same time. For example,
if an app uses frames, scripts in other frames that read the session
data at the same time will definitely be forced to wait.

This will cause the frames to look like they are loading in order
instead of all at once... a bit of an ugly sight.
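
The cost of taking the exclusive lock at store time can be sketched the same way in Python. This is a toy model with hypothetical names: the lock returns False where a real one would block, and "frame A" and "frame B" stand for two scripts serving frames of the same page.

```python
class ToyLock:
    """Toy shared/exclusive lock; False means the caller would have to wait."""

    def __init__(self):
        self.readers = set()
        self.writer = None

    def acquire_shared(self, who):
        if self.writer is not None:
            return False
        self.readers.add(who)
        return True

    def acquire_exclusive(self, who):
        if self.writer is None and not (self.readers - {who}):
            self.readers.discard(who)
            self.writer = who
            return True
        return False

    def release(self, who):
        self.readers.discard(who)
        if self.writer == who:
            self.writer = None


def frames_demo():
    lock = ToyLock()
    events = []
    # Frame A stores a value and takes the exclusive lock immediately.
    events.append(("frame A exclusive", lock.acquire_exclusive("frame A")))
    # Frame B only wants to read, but has to wait for frame A.
    events.append(("frame B shared", lock.acquire_shared("frame B")))
    lock.release("frame A")
    events.append(("frame B shared after release", lock.acquire_shared("frame B")))
    return events
```

Frame B's read is refused until frame A releases, which is the one-after-another loading described above.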

Please no comments on the merits (or lack thereof) of using frames.
:)... I am merely focusing on why the locking logic exists in
Apache::Session the way it does, and in what cases I am not sure whether
the locking will actually work as intended.

Thanks,
Gunther