Mailing List Archive

[Bug 2834] Directories not removed after av_scan
https://bugs.exim.org/show_bug.cgi?id=2834

--- Comment #10 from Piotr Staszeski <tech@axit.pl> ---
Last update
The issue still exists in the newest version, 4.96.2. We managed to investigate
it more deeply, and it turned out that the issue is related to the 'silly
rename' mechanism in NFS.
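
For readers unfamiliar with the mechanism, here is a minimal sketch of how the
symptom can be reproduced outside Exim, assuming a POSIX shell and a scratch
path on the NFS mount. The function name silly_rename_probe and the
probe.XXXXXX template are hypothetical and do not come from Exim. The function
keeps a file open, unlinks it, and then tries rmdir. On an NFSv3 or NFSv4.0
client the unlink is normally turned into a rename to .nfsXXXX, so the rmdir is
expected to fail until the descriptor is closed.

```shell
# Hypothetical probe: hold a file open, unlink it, then try to remove its directory.
# Usage: silly_rename_probe /path/on/the/nfs/mount
silly_rename_probe() {
    base=${1:-.}
    dir=$(mktemp -d "$base/probe.XXXXXX") || return 2
    echo data > "$dir/msg.eml"
    exec 9< "$dir/msg.eml"        # keep the file open on descriptor 9
    rm -f "$dir/msg.eml"          # on NFS this is typically renamed to .nfsXXXX
    leftovers=$(ls -A "$dir")     # anything still listed blocks rmdir
    if rmdir "$dir" 2>/dev/null; then
        result="removed"
    else
        result="not-removed: $leftovers"
    fi
    exec 9<&-                     # closing the descriptor should release the .nfs file
    rmdir "$dir" 2>/dev/null      # clean up if the first rmdir failed
    echo "$result"
}
```

On a local file system this should print "removed". On an affected NFS mount it
should print "not-removed:" followed by the .nfs file name.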

Long story short, we were monitoring the spool directory using:
    while :; do
        date; DIR=`date +%s%N`
        find /opt/smtp_data/data/spool/scan/ -type f \
            -exec mkdir copy/${DIR} ';' -exec cp -v '{}' copy/${DIR} ';'
        usleep 100
    done
and here is the output:

Wed Oct 18 14:22:05 CEST 2023
‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/1qt5Yr-00074v-3g.eml’ ->
‘copy/1697631725208102956/1qt5Yr-00074v-3g.eml’
Wed Oct 18 14:22:05 CEST 2023
‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/.nfs00000000000200b300000016’
-> ‘copy/1697631725238316816/.nfs00000000000200b300000016’
Wed Oct 18 14:22:05 CEST 2023
‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/.nfs00000000000200b300000016’
-> ‘copy/1697631725267061483/.nfs00000000000200b300000016’
Wed Oct 18 14:22:05 CEST 2023
‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/.nfs00000000000200b300000016’
-> ‘copy/1697631725300034499/.nfs00000000000200b300000016’
Wed Oct 18 14:22:05 CEST 2023
‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/.nfs00000000000200b300000016’
-> ‘copy/1697631725334815226/.nfs00000000000200b300000016’
Wed Oct 18 14:22:05 CEST 2023
‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/.nfs00000000000200b300000016’
-> ‘copy/1697631725361270356/.nfs00000000000200b300000016’
Wed Oct 18 14:22:05 CEST 2023
‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/.nfs00000000000200b300000016’
-> ‘copy/1697631725388518038/.nfs00000000000200b300000016’
Wed Oct 18 14:22:05 CEST 2023
‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/.nfs00000000000200b300000016’
-> ‘copy/1697631725418048836/.nfs00000000000200b300000016’
Wed Oct 18 14:22:05 CEST 2023
‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/.nfs00000000000200b300000016’
-> ‘copy/1697631725446052356/.nfs00000000000200b300000016’
Wed Oct 18 14:22:05 CEST 2023

For testing purposes, we tried adding a 2-second sleep (in spool_mbox.c) after
the mail file is removed but before the directory itself is removed, without
any success. So the question is: why is this issue not present in 4.94.2? And
how can we fix it, so that we are able to use NFS as storage?

--
## subscription configuration (requires account):
## https://lists.exim.org/mailman3/postorius/lists/exim-dev.lists.exim.org/
## unsubscribe (doesn't require an account):
## exim-dev-unsubscribe@lists.exim.org
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: [Bug 2834] Directories not removed after av_scan
On Fri, 20 Oct 2023, Exim Bugzilla via Exim-dev wrote:

> https://bugs.exim.org/show_bug.cgi?id=2834
>
> --- Comment #10 from Piotr Staszeski <tech@axit.pl> ---
> Last update
> The issue still exists in the newest version, 4.96.2. We managed to investigate
> it more deeply, and it turned out that the issue is related to the 'silly
> rename' mechanism in NFS.
>
> Long story short, we were monitoring the spool directory using:
>     while :; do
>         date; DIR=`date +%s%N`
>         find /opt/smtp_data/data/spool/scan/ -type f \
>             -exec mkdir copy/${DIR} ';' -exec cp -v '{}' copy/${DIR} ';'
>         usleep 100
>     done

Am I right in thinking that you are running this on the client,
not the file-server ?
If the spool and the copy are on the same file-system,
does a hard-link `ln` instead of the `cp -v` give you new information ?
It should also be quicker.
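
A minimal sketch of the hard-link variant, assuming GNU date (for %N) and that
the spool and the copy directory really are on the same file system. The
function name snapshot_links is hypothetical. Because a hard link shares the
inode with the original, the link count and inode number of each snapshot can
be compared afterwards with ls -li.

```shell
# Hypothetical helper: hard-link every regular file under $1 into a fresh,
# timestamp-named subdirectory of $2. ln fails if $1 and $2 are on
# different file systems.
snapshot_links() {
    src=$1
    dest=$2/$(date +%s%N)
    mkdir -p "$dest" || return 1
    find "$src" -type f -exec ln -v '{}' "$dest"/ ';'
    rmdir "$dest" 2>/dev/null     # drop the snapshot directory if nothing was linked
    return 0
}
```

In the monitor loop this would replace the find line, for example:
snapshot_links /opt/smtp_data/data/spool/scan copy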

Can you run fuser (or perhaps fuser -v) in the monitor script, just to
confirm that it is exim (and not the scanner) holding the files open ?
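
Something along these lines might do, assuming fuser from psmisc is installed
on the client. The function name report_nfs_holders is hypothetical. It only
looks at the silly-renamed .nfs* files, since those are the ones that block the
rmdir.

```shell
# Hypothetical helper: for each .nfs* file under $1, show which local
# processes have it open. fuser writes part of its report to stderr,
# so stderr is merged into stdout to keep everything in one log.
report_nfs_holders() {
    find "$1" -type f -name '.nfs*' -exec fuser -v '{}' ';' 2>&1
}
```

Called once per iteration of the monitor loop, for example:
report_nfs_holders /opt/smtp_data/data/spool/scan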


> and here is the output:
>
> Wed Oct 18 14:22:05 CEST 2023
> ‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/1qt5Yr-00074v-3g.eml’ ->
> ‘copy/1697631725208102956/1qt5Yr-00074v-3g.eml’
> Wed Oct 18 14:22:05 CEST 2023
> ‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/.nfs00000000000200b300000016’
> -> ‘copy/1697631725238316816/.nfs00000000000200b300000016’

> Wed Oct 18 14:22:05 CEST 2023
> ‘/opt/smtp_data/data/spool/scan/1qt5Yr-00074v-3g/.nfs00000000000200b300000016’
> -> ‘copy/1697631725446052356/.nfs00000000000200b300000016’
> Wed Oct 18 14:22:05 CEST 2023
>
> For testing purposes, we tried adding a 2-second sleep (in spool_mbox.c) after
> the mail file is removed but before the directory itself is removed, without
> any success. So the question is: why is this issue not present in 4.94.2? And
> how can we fix it, so that we are able to use NFS as storage?

I take it you have a good reason to use a (remote) NFS file system
for what is working storage ? I have known people who put the spool
on battery-backed RAM rather than have the delay of spinning rust here.
Of course if your scan happens on another machine you *may* have no
choice.

Of course, this does not explain why something changed after 4.94.2.

----
A quick google for ".nfs files" tells me

http://nfs.sourceforge.net/#faq_d2

NFSv4.1 will get away from this behavior with
OPEN4_RESULT_PRESERVE_UNLINKED:

http://tools.ietf.org/html/rfc5661#section-18.16.

- is NFS v4.1 an option ?
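
To see which version the client actually negotiated, a minimal sketch that
reads the vers= option from /proc/mounts on a Linux client. The function name
nfs_mount_version is hypothetical, and the mount point /opt/smtp_data in the
usage line is only a guess based on the paths quoted above. Whether a 4.1 mount
really avoids the .nfs files still depends on server and client support.

```shell
# Hypothetical helper: print the vers= option of the NFS mount at mount
# point $1, reading a /proc/mounts-style table from $2 (default /proc/mounts).
nfs_mount_version() {
    awk -v mp="$1" '$2 == mp && $3 ~ /^nfs/ {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/) { sub(/^vers=/, "", opts[i]); print opts[i] }
    }' "${2:-/proc/mounts}"
}
```

Usage: nfs_mount_version /opt/smtp_data
It prints nothing if the path is not the mount point of an NFS file system.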

--
Andrew C. Aitchison Kendal, UK
andrew@aitchison.me.uk
