Mailing List Archive

qmail-newmrh and NFS give "unable to read controls" error
Hi,

I have a qmail setup (with Jeremy Kister's ISP patch) and vpopmail,
where the MX servers share /var/qmail/control/ and /var/qmail/users/ via NFS
(and of course /home/vpopmail/domains/).

When we add a domain via vpopmail (with vadddomain), "qmail-newmrh"
is called. At that moment the MX servers return this error on SMTP:

421 unable to read controls (#4.3.0)

I can simulate the error on MX with:

# ./qmail-smtpd
220 mail6.email.net ESMTP
helo aaa.net
250 mail6.email.net
mail from <alessio@ciao.net>
250 ok

[ run qmail-newmrh on vpopmail server]

rcpt to <alessio@cecchi.net>

421 unable to read controls (#4.3.0)

After enabling some debugging in qmail-smtpd, I found that the error is:

errno: Stale file handle

because the file morercpthosts.cdb has "vanished" from NFS.

How can I fix this?

Thanks

--
Alessio Cecchi
Postmaster @ http://www.qboxmail.it
https://www.linkedin.com/in/alessice
Re: qmail-newmrh and NFS give "unable to read controls" error
On 4/27/2020 1:06 PM, Alessio Cecchi wrote:
> 421 unable to read controls (#4.3.0)
[...]
> errno: Stale file handle

I imagine this is a system issue and not really a qmail problem.

can the servers each read /var/qmail/control/morercpthosts and others
like rcpthosts and me ?

try
svc -d /service/qmail*
for dir in /service/qmail-* ; do touch $dir/down ; done
pkill -9 'qmail-send|qmail-smtpd'

# umount /var/qmail/control
# umount /var/qmail/users
# mount /var/qmail/control
# mount /var/qmail/users

for dir in /service/qmail-* ; do rm $dir/down ; done
svc -u /service/qmail*

does it work now?

--

Jeremy Kister
https://jeremy.kister.net./
Re: qmail-newmrh and NFS give "unable to read controls" error
On 27/04/20 19:51, Jeremy Kister wrote:
> On 4/27/2020 1:06 PM, Alessio Cecchi wrote:
>> 421 unable to read controls (#4.3.0)
> [...]
>> errno: Stale file handle
>
> I imagine this is a system issue and not really a qmail problem-
>
> can the servers each read /var/qmail/control/morercpthosts and others
> like rcpthosts and me ?

Hi Jeremy, and thanks for the reply.

The error occurs only when I run qmail-newmrh (because a domain was
added), and only for the few connections that are open at that moment.
For the rest of the day qmail(-smtpd) runs fine.

If I run the same test without running qmail-newmrh in the middle, the
session works fine:

# ./qmail-smtpd
220 mail6.email.net ESMTP
helo aaa.it
250 mail6.email.net
mail from <alessio@ciao.net>
250 ok
rcpt to <alessio@ciao.net>
250 ok
quit
221 mail6.email.net

From the book "Managing NFS and NIS":

A filehandle becomes stale whenever the file or directory referenced by
the handle is removed by another host, while your client still holds an
active reference to the object. A typical example occurs when the
current directory of a process, running on your client, is removed on
the server (either by a process running on the server or on another
client).

I think that qmail-newmrh creates a temporary morercpthosts.cdb
during the rebuild and moves it over the "real" morercpthosts.cdb when done.
A qmail-smtpd process that is running at that moment and evaluating the
rcpt to command is probably still looking at the "old" morercpthosts.cdb
and returns the error.
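
The suspected sequence can be sketched as a small C function. This is only an illustration under stated assumptions: the function name, the file paths and the "old"/"new" contents are hypothetical and nothing here is taken from qmail. On a local filesystem the descriptor opened before the rename keeps reading the old file. Over NFS, when the rename is done by another host, the same read may instead fail with errno set to ESTALE, which would match the error seen above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write `text` to `path`, replacing any previous content. Returns 0 or -1. */
static int write_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;
    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t)len) { close(fd); return -1; }
    if (fsync(fd) == -1) { close(fd); return -1; }
    return close(fd);
}

/*
 * Hypothetical demo (not qmail code): open the "cdb" file, replace it via
 * temp file + rename, then read through the descriptor opened earlier.
 * Returns 1 if the old descriptor still reads the old content while the
 * path already shows the new content, 0 if not, -1 on error (errno set;
 * on an NFS mount changed by another host this could be ESTALE).
 */
int old_reader_survives_rename(const char *cdb_path, const char *tmp_path)
{
    char old_buf[8] = {0}, new_buf[8] = {0};
    assert(cdb_path != NULL && tmp_path != NULL);

    if (write_file(cdb_path, "old") == -1) return -1;
    int reader = open(cdb_path, O_RDONLY);      /* the long-lived reader */
    if (reader == -1) return -1;

    /* The writer rebuilds into a temp file and renames it into place. */
    if (write_file(tmp_path, "new") == -1) { close(reader); return -1; }
    if (rename(tmp_path, cdb_path) == -1) { close(reader); return -1; }

    /* The reader still uses the descriptor it opened before the rename. */
    ssize_t n = read(reader, old_buf, sizeof old_buf - 1);
    int saved = errno;
    close(reader);
    if (n == -1) { errno = saved; return -1; }

    /* A fresh open of the same path sees the replacement file. */
    int fresh = open(cdb_path, O_RDONLY);
    if (fresh == -1) return -1;
    n = read(fresh, new_buf, sizeof new_buf - 1);
    saved = errno;
    close(fresh);
    if (n == -1) { errno = saved; return -1; }

    return strcmp(old_buf, "old") == 0 && strcmp(new_buf, "new") == 0;
}
```

Calling it with two scratch paths in the same local directory should return 1. Running the reader and the writer halves on two different NFS clients is what would be needed to see the stale handle itself.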

Thanks

--
Alessio Cecchi
Postmaster @ http://www.qboxmail.it
https://www.linkedin.com/in/alessice
Re: qmail-newmrh and NFS give "unable to read controls" error
Hi Alessio,

>
> From the book "Managing NFS and NIS":
>
> A filehandle becomes stale whenever the file or directory referenced by the handle is removed by another host, while your client still holds an active reference to the object. A typical example occurs when the current directory of a process, running on your client, is removed on the server (either by a process running on the server or on another client).
>
> I think that qmail-newmrh creates a temporary morercpthosts.cdb during the rebuild and moves it over the "real" morercpthosts.cdb when done. A qmail-smtpd process that is running at that moment and evaluating the rcpt to command is probably still looking at the "old" morercpthosts.cdb and returns the error.
>

Even so, the rename operation should be atomic (fingers crossed for NFS). The relevant code is here (taken from s/qmail):

  if (cdb_make_finish(&cdb) == -1) die_write();
  if (fsync(fdtemp) == -1) die_write();
  if (close(fdtemp) == -1) die_write(); /* NFS stupidity */
  if (rename("control/morercpthosts.tmp","control/morercpthosts.cdb") == -1)
    logmsg(WHO,111,ERROR,"unable to move control/morercpthosts.tmp to control/morercpthosts.cdb");

  _exit(0);
}

If you have built your qmail from source, I would simply double the fsync call:

  if (cdb_make_finish(&cdb) == -1) die_write();
  if (fsync(fdtemp) == -1) die_write();
  if (fsync(fdtemp) == -1) die_write(); // once more
  if (close(fdtemp) == -1) die_write(); /* NFS stupidity */
  if (rename("control/morercpthosts.tmp","control/morercpthosts.cdb") == -1)
    logmsg(WHO,111,ERROR,"unable to move control/morercpthosts.tmp to control/morercpthosts.cdb");

  _exit(0);
}

and hope the filesystem is clean after that (given async I/O). A wait could potentially be helpful as well.
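
A different angle from the writer-side fsync above, offered only as a sketch: the reading side could reopen the file and try again when it sees ESTALE. The helper below is hypothetical (its name, signature and retry count are assumptions, and it is not part of qmail or s/qmail). It only shows the general reopen-and-retry idea, not how qmail-smtpd actually reads its control files.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of qmail): read up to `cap` bytes from
 * `path` with a single read() call, reopening the file and retrying when
 * the NFS client reports ESTALE on open() or read().
 * Returns the byte count, or -1 with errno set. After `attempts` stale
 * tries it gives up with errno == ESTALE.
 */
ssize_t read_retry_estale(const char *path, char *buf, size_t cap, int attempts)
{
    assert(path != NULL && buf != NULL && attempts > 0);

    for (int i = 0; i < attempts; i++) {
        int fd = open(path, O_RDONLY);
        if (fd == -1) {
            if (errno == ESTALE) continue;   /* stale lookup: try again */
            return -1;                       /* any other error is final */
        }

        ssize_t n = read(fd, buf, cap);
        int saved = errno;                   /* close() may overwrite errno */
        close(fd);

        if (n >= 0) return n;
        if (saved != ESTALE) { errno = saved; return -1; }
        /* ESTALE: the file was replaced under us, reopen by name */
    }

    errno = ESTALE;
    return -1;
}
```

A caller would pass the path rather than a long-lived descriptor, so that each retry resolves the name again and picks up the file that rename() put in place.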

Q: Is the filesystem behind that mount very busy? Lots of ongoing I/O? (iostat and friends will tell.)

regards.
--eh.


Dr. Erwin Hoffmann | FEHCom | http://www.fehcom.de | PGP Key-Id 7E4034BE