Mailing List Archive

Suse & NetApp break oracle: flock problem
Hi all!

After upgrading the Linux kernel on our SuSE Linux Enterprise Server (SLES)
to k_smp-2.4.18-224 we were unable to start our Oracle Database which was
located on a NetApp Filer.

All the feedback we got was "no locks available". Further investigation
showed that k_smp-2.4.18-224 introduced a polyserve-flock patch, which
probably causes the problem. Has anyone run into this?

We'll run tests with k_smp-2.4.18-224 and/or 2.4.18-237 without the
polyserve-flock patch and share the results with anyone who is interested.

I address this report to both SuSE and NetApp for it's this combination that
breaks Oracle. Because it's both in SuSE's and NetApp's interest I hope this
report is appreciated though it may not have been sent through the proper
channels.

Alex Harkema
Vertis

RE: Suse & NetApp break oracle: flock problem
Mike,

thank you for your reply.
I really suspect this is a Linux (kernel) problem. However, I was curious
whether anyone had encountered the same issue. Besides that, we had not
rebooted the machine when the error occurred. At the moment installations
are running on the Linux box (still running the 2.4.18-134 kernel, which
does *not* contain the polyserve-flock patch), so I don't have the
opportunity to reboot it.

However, I'll gladly send you the dumps afterward if we're still
experiencing the same problems.

Can you tell me where I can find certifications for the
Oracle-on-Linux/NetApp products?
For example, is 9i RAC certified on NetApp?

tia,
Alex

-----Original Message-----
From: Kiernan, Michael [mailto:mkiernan@netapp.com]
Sent: Monday, December 09, 2002 2:09 PM
To: 'Alex Harkema'
Cc: Rolf Fokkens
Subject: RE: Suse & NetApp break oracle: flock problem


Hi Alex,

What did 'priv set advanced; lock_dump -h' show on the filer at the time you
got the error?

Was Oracle shut down cleanly prior to the reboot? One problem we're aware of
with Linux rpc.statd is that it can sometimes fail to remove the locks on
the filer on reboot, because it takes the locks with an unqualified nodename
but sends the lock recovery packets with a fully-qualified domain name.
If Oracle shut down cleanly, however, the locks should have been released by
that action.
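
[Archive note] The nodename mismatch described above can be looked at from
the client side. The following is only a rough sketch in C: it prints the
nodename reported by uname() and the canonical name that the resolver
returns for it, and flags a difference. It makes no claim about which of the
two names rpc.statd actually uses; the helper is_qualified() is a
hypothetical name introduced here, and "contains a dot" is just a simple
stand-in for "fully qualified".

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/utsname.h>

/* Hypothetical helper: treat any name containing a dot as "qualified". */
static int is_qualified(const char *name)
{
    return name != NULL && strchr(name, '.') != NULL;
}

int main(void)
{
    struct utsname u;
    struct addrinfo hints, *res = NULL;
    const char *canon;
    int rc;

    if (uname(&u) != 0) {
        perror("uname");
        return 1;
    }
    printf("nodename:  %s (%s)\n", u.nodename,
           is_qualified(u.nodename) ? "qualified" : "unqualified");

    /* Ask the resolver for the canonical form of the same name. */
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_flags = AI_CANONNAME;
    rc = getaddrinfo(u.nodename, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return 1;
    }
    canon = res->ai_canonname ? res->ai_canonname : u.nodename;
    printf("canonical: %s\n", canon);

    if (strcmp(canon, u.nodename) != 0)
        puts("names differ: lock and recovery traffic may not match up");

    freeaddrinfo(res);
    return 0;
}
```

If the two names differ, that is only a hint that the situation Mike
describes could apply; the lock_dump output on the filer remains the
authoritative view of which owner names hold locks.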

A pktt trace from the filer to the linux box
(pktt start all -i <ip of oracle linux box>) while you reboot the upgraded
box and restart oracle would be useful, as would the lock_dump output.

I'll be happy to look at the data.

Mike


-----Original Message-----
From: Alex Harkema [mailto:HarkemaA@vertis.nl]
Sent: Monday, December 09, 2002 1:24 PM
To: 'toasters@mathworks.com'
Cc: Rolf Fokkens
Subject: Suse & NetApp break oracle: flock problem



Hi all!

After upgrading the Linux kernel on our SuSE Linux Enterprise Server (SLES)
to k_smp-2.4.18-224 we were unable to start our Oracle Database which was
located on a NetApp Filer.

All the feedback we got was "no locks available". Further investigation
showed that k_smp-2.4.18-224 introduced a polyserve-flock patch, which
probably causes the problem. Has anyone run into this?

We'll run tests with k_smp-2.4.18-224 and/or 2.4.18-237 without the
polyserve-flock patch and share the results with anyone who is interested.

I address this report to both SuSE and NetApp for it's this combination that
breaks Oracle. Because it's both in SuSE's and NetApp's interest I hope this
report is appreciated though it may not have been sent through the proper
channels.

Alex Harkema
Vertis

RE: Suse & NetApp break oracle: flock problem
Michael,

attached you'll find the polyserve-flock patch.
You'll also find a little program, 'flock.c', which takes either an
old BSD-style flock or, on request, a POSIX fcntl lock.

The first method (BSD-style flock) does not work over NFS with the 'new'
kernel (2.4.18-224), whereas the second method (POSIX fcntl) does. Oracle
uses the first method (BSD-style).

-alex

-----Original Message-----
From: Kiernan, Michael [mailto:mkiernan@netapp.com]
Sent: Tuesday, December 10, 2002 9:13 AM
To: 'Alex Harkema'
Subject: RE: Suse & NetApp break oracle: flock problem


Hi Alex,

can you post details on the polyserve-flock patch? (a couple of minutes on
Google failed to find it...)

Thanks,
Mike

-----Original Message-----
From: Alex Harkema [mailto:HarkemaA@vertis.nl]
Sent: Monday, December 09, 2002 8:04 PM
To: Kiernan, Michael
Subject: RE: Suse & NetApp break oracle: flock problem


That was indeed what I was looking for. Thanks!
Meanwhile, my cry for help has reached some SuSE people; I'm now awaiting
their response on the locking problem.

Alex

-----Original Message-----
From: Kiernan, Michael [mailto:mkiernan@netapp.com]
Sent: Monday, December 09, 2002 5:44 PM
To: 'Alex Harkema'
Subject: RE: Suse & NetApp break oracle: flock problem


Hi Alex,

look under the partners section on the website:
http://www.netapp.com/partners/oracle/9irac.html

whitepaper:
http://www.netapp.com/tech_library/3164.html

hope that's what you're looking for...

Mike

-----Original Message-----
From: Alex Harkema [mailto:HarkemaA@vertis.nl]
Sent: Monday, December 09, 2002 5:33 PM
To: Kiernan, Michael
Cc: 'toasters@mathworks.com'
Subject: RE: Suse & NetApp break oracle: flock problem


Mike,

thank you for your reply.
I really suspect this is a Linux (kernel) problem. However, I was curious
whether anyone had encountered the same issue. Besides that, we had not
rebooted the machine when the error occurred. At the moment installations
are running on the Linux box (still running the 2.4.18-134 kernel, which
does *not* contain the polyserve-flock patch), so I don't have the
opportunity to reboot it.

However, I'll gladly send you the dumps afterward if we're still
experiencing the same problems.

Can you tell me where I can find certifications for the
Oracle-on-Linux/NetApp products?
For example, is 9i RAC certified on NetApp?

tia,
Alex
