Mailing List Archive: ZODB data loss

ZODB data loss - help required

Dec 4, 2013, 9:42 PM

Post #1 of 4 (2130 views)

Ive just had a very disturbing and odd experience with data loss in the
ZODB. Ive been running Zope servers for 15 years and never seen this before
until two previous incidents a few months ago. The sequence of events for
those incidents were vague so I put it down to user error, but now that its
happened a third time on a different site Im getting worried.

Plone 4.2.0.1 (4206)
CMF 2.2.6
Zope 2.13.15
Python 2.7.3 (default, Feb 6 2013, 01:00:51) [GCC 4.4.6 20120305 (Red Hat
4.4.6-4)]
PIL 2.0.0 (Pillow)

The site was been working perfectly for 4 months with many updates and
perfectly fine performance. Im now faced with the horrible job of
explaining this to the client and I dont quite understand it myself.

*Sequence of events*
The Plone site was launched to production in September and has been running
ever since. It has been restarted several times since then with minor
product updates.

Today I needed to do a yum update and then I restarted Zope using an init
script that stops Zope & ZEO and restarts them.

When Zope came up the Plone site was now showing data from August 5.
Somehow everything between August 5 and December 5 has been lost.

A scan of the transactions via the Undo Log or using data.fs tools shows
transactions up to August 5 then nothing until December 5

From the logs I can see that the server itself was rebooted on August 5 and
there was also a copy of the Data.fs made on that date too.

We have not packed the database since that date as it was not large or
growing fast.

All backups also show the August 5 data - which means the file system
copies of the Data.fs were simply copying the old version of the database.

Ive cloned the file system and kept the original untouched although in the
first hour or so after the restart I did several file copies and moves as I
was unaware of the gravity of the situation.

Ive tried extundelete and debugfs and neither of them detect any deleted
Data.fs file that can be recovered.

*Questions*
My main aim is to work out what happened. My best guess is that Zope was
somehow connecting to a stale or outdated file pointer and updating that
file all along while the Data.fs was pointing to an August 5 copy. But how
could this situation eventuate and persist for so long?

The odd thing is that we had 2 very similar incidents on 2 different Zope
servers a few months ago but both resulted in almost no data loss as the
timeframes were shorter and I dismissed them as some odd user error.

We have recently moved most of our Zope servers to Linode - could it be
their file system? or could it be the new way we setup the buildouts and
init scripts?

Any clues at all would be welcomed.

--
*Tom Cameron*
Technical Director

*Mooball IT*

Re: ZODB data loss - help required [ In reply to ]

nd at syndicat

Dec 4, 2013, 11:48 PM

Post #2 of 4 (2074 views)

Permalink

Tom Cameron <tom@mooball.net> schrieb:
>Ive just had a very disturbing and odd experience with data loss in the

Hi Tom,

do you have any Zope / event.log Logs from the timeframe?

Just an idea: Did you make the Zope restart BEFORE your system update/-grade (i assume that you mean the Cent OS packet manager "yum")?

Did you make shure that all of your Zopes python libs (not managed by your yum system - if you have some) got rebuild after the system upgrade - at least the ones with system library / binary context?

5th September sounds a relative "large" timeframe today, but is possibly not if there was not a lot of changes / operations to ZODB or the volume was small.

best regards,

Niels.
--
Niels Dettenbach
Syndicat IT&Internet
http://www.syndicat.com
_______________________________________________
Zope maillist - Zope@zope.org
https://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
https://mail.zope.org/mailman/listinfo/zope-announce
https://mail.zope.org/mailman/listinfo/zope-dev )

Re: ZODB data loss - help required [ In reply to ]

raggam-nl at adm

Dec 5, 2013, 12:43 AM

Post #3 of 4 (2069 views)

Permalink

On Don, 2013-12-05 at 15:42 +1000, Tom Cameron wrote:

> Questions
> My main aim is to work out what happened. My best guess is that Zope
> was somehow connecting to a stale or outdated file pointer and
> updating that file all along while the Data.fs was pointing to an
> August 5 copy. But how could this situation eventuate and persist for
> so long?

for my, it looks exactly like that Zope used an inode, which wasn't the
Data.fs you expected. It used that as long as you restarted Zope. your
backups were always copying the old Data.fs from the filesystem path,
where you expected the correct one to be in.

that happens, when the directory with your Data.fs is moved
(e.g. $ mv ./var ./var-bak)
and a new one copied back to the old place
(e.g. $ cp -R ./var-bak ./var).

that happens also, when the Data.fs is deleted.

there are chances, that the correct Data.fs is still somewhere around
(except when it was deleted, then a data recovery service might be able
to help).

just a wild guess: did your backup script screw this up?

best, johannes

>
>
> The odd thing is that we had 2 very similar incidents on 2 different
> Zope servers a few months ago but both resulted in almost no data loss
> as the timeframes were shorter and I dismissed them as some odd user
> error.
>
>
> We have recently moved most of our Zope servers to Linode - could it
> be their file system? or could it be the new way we setup the
> buildouts and init scripts?
>
>
> Any clues at all would be welcomed.
>
>
>

>
>
> --
> Tom Cameron
> Technical Director
>
> Mooball IT
>
> _______________________________________________
> Zope maillist - Zope@zope.org
> https://mail.zope.org/mailman/listinfo/zope
> ** No cross posts or HTML encoding! **
> (Related lists -
> https://mail.zope.org/mailman/listinfo/zope-announce
> https://mail.zope.org/mailman/listinfo/zope-dev )

_______________________________________________
Zope maillist - Zope@zope.org
https://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
https://mail.zope.org/mailman/listinfo/zope-announce
https://mail.zope.org/mailman/listinfo/zope-dev )

Re: ZODB data loss - help required [ In reply to ]

encolpe.degoute at free

Dec 5, 2013, 2:33 AM

Post #4 of 4 (2064 views)

Permalink

Le 05/12/2013 09:43, Johannes Raggam a écrit :
> On Don, 2013-12-05 at 15:42 +1000, Tom Cameron wrote:
>
>
>> Questions
>> My main aim is to work out what happened. My best guess is that Zope
>> was somehow connecting to a stale or outdated file pointer and
>> updating that file all along while the Data.fs was pointing to an
>> August 5 copy. But how could this situation eventuate and persist for
>> so long?
> for my, it looks exactly like that Zope used an inode, which wasn't the
> Data.fs you expected. It used that as long as you restarted Zope. your
> backups were always copying the old Data.fs from the filesystem path,
> where you expected the correct one to be in.
>
> that happens, when the directory with your Data.fs is moved
> (e.g. $ mv ./var ./var-bak)
> and a new one copied back to the old place
> (e.g. $ cp -R ./var-bak ./var).
>
> that happens also, when the Data.fs is deleted.

I'm agree with this description. You can test this python behavior
in a python shell: when a file is unlinked in the filesystem (move or
deleted) python move the file descriptor into RAM and still use it
there without warning. The file descriptor is destroyed when the
python process end (Zope stop or restart).
Your only way is to find a Data.fs.old from the last pack somewhere
or to have a backup.

--
Encolpe DEGOUTE
http://encolpe.degoute.free.fr/
Logiciels libres, hockey sur glace et autres activités cérébrales

_______________________________________________
Zope maillist - Zope@zope.org
https://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
https://mail.zope.org/mailman/listinfo/zope-announce
https://mail.zope.org/mailman/listinfo/zope-dev )