Hi,
I'm having a 'problem' scanning a large zip file.
What appears to be happening is that clamav loads the entire file into
memory (RSS, not just VSZ) and then unzips it.
The increase in VSZ is expected since the file is mmap'ed; the
increase in RSS was not expected (by me).
This does not happen when scanning a .bz2 or a .gz file: the VSZ
increases but the RSS does not (or at least not by much).
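To make the VSZ/RSS distinction concrete, here is a minimal, Linux-only sketch. It is not ClamAV code; the helper names read_mem_kb, touch_pages and measure_mapping are made up for illustration, and it assumes /proc/self/statm is available. Mapping a file should raise only VSZ; RSS should grow once the pages are actually touched.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read this process's VSZ and RSS in KiB from /proc/self/statm (Linux only).
   Returns 0 on success, -1 on failure. */
static int read_mem_kb(long *vsz_kb, long *rss_kb)
{
    long size_pages, resident_pages;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld %ld", &size_pages, &resident_pages) != 2) {
        fclose(f);
        return -1;
    }
    fclose(f);
    long page_kb = sysconf(_SC_PAGESIZE) / 1024;
    *vsz_kb = size_pages * page_kb;
    *rss_kb = resident_pages * page_kb;
    return 0;
}

/* Read one byte per page so the kernel has to fault every page in.
   Returns the sum of the bytes read. */
static unsigned long touch_pages(const volatile unsigned char *p, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned long sum = 0;
    assert(page > 0);
    for (size_t i = 0; i < len; i += (size_t)page)
        sum += p[i];
    return sum;
}

/* Map `path` read-only and record RSS (KiB) at three points:
   rss_kb[0] before mmap, rss_kb[1] after mmap, rss_kb[2] after touching
   every page. Returns 0 on success, -1 on failure. */
static int measure_mapping(const char *path, long rss_kb[3])
{
    long vsz;
    struct stat st;
    int rc = -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (read_mem_kb(&vsz, &rss_kb[0]) != 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;
    if (read_mem_kb(&vsz, &rss_kb[1]) == 0) {
        touch_pages(p, len);
        rc = read_mem_kb(&vsz, &rss_kb[2]);
    }
    munmap(p, len);
    return rc;
}
```

For a large file I would expect rss_kb[1] to stay close to rss_kb[0], and rss_kb[2] to exceed rss_kb[1] by roughly the file size.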
My test:
$ dd if=/dev/urandom of=test_400rand.bin bs=1M count=400
$ zip test_400rand.bin.zip test_400rand.bin
$ bzip2 --keep test_400rand.bin
$ gzip test_400rand.bin
The resulting files:
- test_400rand.bin: 400 MB
- test_400rand.bin.zip: 401 MB
- test_400rand.bin.bz2: 402 MB
- test_400rand.bin.gz: 401 MB
Scanning with clamav version 0.97.3 (compiled from source):
$ clamscan --max-filesize=500000000 --max-scansize=4000000000 \
    --scan-archive=yes test_400rand.bin.bz2
Start: VSZ: 114 MB, RSS: 105 MB
Scanning: VSZ: 520 MB, RSS: 114 MB (mmap of archive file)
Scanning-2: VSZ: 917 MB, RSS: 119 MB (mmap of extracted file)
$ clamscan --max-filesize=500000000 --max-scansize=4000000000 \
    --scan-archive=yes test_400rand.bin.gz
Start: VSZ: 115 MB, RSS: 105 MB
Scanning: VSZ: 515 MB, RSS: 114 MB (mmap of archive file)
Scanning-2: VSZ: 916 MB, RSS: 121 MB (mmap of extracted file)
$ clamscan --max-filesize=500000000 --max-scansize=4000000000 \
    --scan-archive=yes test_400rand.bin.zip
Start: VSZ: 113 MB, RSS: 105 MB
Scanning: VSZ: 515 MB, RSS: 111 MB (mmap of archive file)
Scanning-2: VSZ: 515 MB, RSS: 506 MB (???)
Scanning-3: VSZ: 916 MB, RSS: 514 MB (mmap of extracted file)
I looked at the source code and the increase in (RSS) memory happens
in libclamav/unzip.c in the function 'lhdr':
    if (!csize) { /* FIXME: what's used for method0 files? csize or usize?
                     Nothing in the specs, needs testing */
        cli_dbgmsg("cli_unzip: lh - skipping empty file\n");
    } else {
        if(zsize<csize) {
            cli_dbgmsg("cli_unzip: lh - stream out of file\n");
            fmap_unneed_off(map, loff, SIZEOF_LH);
            return 0;
        }
        if(LH_flags & F_ENCR) {
            cli_dbgmsg("cli_unzip: lh - skipping encrypted file\n");
        } else {
            if(fmap_need_ptr_once(map, zip, csize)) {
                *ret = unz(zip, csize, usize, LH_method, LH_flags, fu,
                           ctx, tmpd);
            }
        }
        zip+=csize;
        zsize-=csize;
    }
The call to 'fmap_need_ptr_once' causes an increase in RSS of about
400 MB (csize = 419498208 bytes, file size = 419498372 bytes).
Questions:
Is it expected that the RSS increases by approximately the size of
the zip file before extracting it?
Is this necessary? (Tools such as unzip are able to decompress the
file without loading the entire file into memory).
What is the purpose of fmap/fmap_need_ptr_once?
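To illustrate what I mean by not loading the entire file: a generic sketch of consuming a byte range of a file through a small fixed-size buffer, so that memory use is bounded by the buffer size rather than by csize. This is not how ClamAV or unzip is implemented; for_each_chunk, add_bytes and sum_file_range are hypothetical names, and the additive checksum merely stands in for a real consumer such as a streaming decompressor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

typedef void (*chunk_fn)(const unsigned char *data, size_t len, void *state);

/* Feed `len` bytes starting at offset `off` of `fd` to `fn`, reading at most
   `bufsize` bytes at a time into `buf`.
   Returns 0 on success, -1 on a read error or if the file ends early. */
static int for_each_chunk(int fd, off_t off, size_t len,
                          unsigned char *buf, size_t bufsize,
                          chunk_fn fn, void *state)
{
    assert(bufsize > 0);
    while (len > 0) {
        size_t want = len < bufsize ? len : bufsize;
        ssize_t got = pread(fd, buf, want, off);
        if (got < 0 && errno == EINTR)
            continue;          /* interrupted by a signal, retry */
        if (got <= 0)
            return -1;         /* error, or EOF before `len` bytes were seen */
        fn(buf, (size_t)got, state);
        off += got;
        len -= (size_t)got;
    }
    return 0;
}

/* Example consumer: adds up all bytes it is given. */
static void add_bytes(const unsigned char *data, size_t len, void *state)
{
    uint64_t *sum = state;
    for (size_t i = 0; i < len; i++)
        *sum += data[i];
}

/* Sum `len` bytes of `path` starting at `off`, using a `bufsize`-byte buffer.
   Returns 0 on success, -1 on failure. */
static int sum_file_range(const char *path, off_t off, size_t len,
                          size_t bufsize, uint64_t *sum)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char *buf = malloc(bufsize);
    if (!buf) {
        close(fd);
        return -1;
    }
    int rc = for_each_chunk(fd, off, len, buf, bufsize, add_bytes, sum);
    free(buf);
    close(fd);
    return rc;
}
```

With a 64 KB buffer, a 400 MB range would be processed in about 6400 reads while only 64 KB of buffer is held by the process at any time.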
Best regards,
Bram
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net