[PATCH] reduce memory usage when scanning zip files
Hi,


In the thread
http://lurker.clamav.net/message/20111212.083121.71a17ccd.en.html I
asked about memory usage when scanning a zip file.

I decided to have a look at how to change the behaviour and wrote a
patch for it.
I don't know whether this patch is desirable or acceptable, but I'm
submitting it anyway.
The patch is against 0.97.3 but should apply cleanly on git master.

To test the patch, I scanned three large files with both the original and the patched version:


*File 1*

ZIP file of 477MB which contains 14 files.
The largest file in the zip is a CAB file of 489MB.
This CAB file contains:
* 57079 files that are smaller than 1MB,
* 112 files that are between 1MB and 10MB,
* 9 files that are larger than 10MB (the largest being 132MB)

Original version:

----------- SCAN SUMMARY -----------
Known viruses: 1096211
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 990.25 MB
Data read: 476.21 MB (ratio 2.08:1)
Time: 103.864 sec (1 m 43 s)


Patched version:

----------- SCAN SUMMARY -----------
Known viruses: 1096215
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 990.25 MB
Data read: 476.21 MB (ratio 2.08:1)
Time: 102.853 sec (1 m 42 s)


*File 2*

The CAB file extracted from the ZIP file above (489MB).
This CAB file contains:
* 57079 files that are smaller than 1MB,
* 112 files that are between 1MB and 10MB,
* 9 files that are larger than 10MB (the largest being 132MB)

Original version:

----------- SCAN SUMMARY -----------
Known viruses: 1096211
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 498.80 MB
Data read: 489.75 MB (ratio 1.02:1)
Time: 51.487 sec (0 m 51 s)

Patched version:

----------- SCAN SUMMARY -----------
Known viruses: 1096215
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 498.80 MB
Data read: 489.75 MB (ratio 1.02:1)
Time: 51.564 sec (0 m 51 s)


*File 3*

ZIP file of 401MB that contains a single 400MB file of random data.


Original version:

----------- SCAN SUMMARY -----------
Known viruses: 1096220
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 804.19 MB
Data read: 400.06 MB (ratio 2.01:1)
Time: 81.984 sec (1 m 21 s)


Patched version:

----------- SCAN SUMMARY -----------
Known viruses: 1096220
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 804.19 MB
Data read: 400.06 MB (ratio 2.01:1)
Time: 82.960 sec (1 m 22 s)
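The three pairs of summaries can be compared with a little arithmetic. The ratio printed after "Data read" is consistent with Data scanned divided by Data read, and the scan times of the original and patched builds differ by roughly 1% or less in either direction. The following C sketch only recomputes those figures from the numbers copied out of the summaries; the struct and function names are made up for illustration, and each time comes from a single run, so such small differences should not be read as a real speed-up or slow-down.

```c
#include <stddef.h>
#include <stdio.h>

/* One pair of clamscan runs on the same input, copied from the SCAN SUMMARY
   blocks (sizes in MB, times in seconds). Hypothetical helper type. */
struct scan_pair {
    const char *label;
    double scanned_mb; /* "Data scanned" (identical in both runs) */
    double read_mb;    /* "Data read" (identical in both runs) */
    double t_orig;     /* "Time" of the original version */
    double t_patched;  /* "Time" of the patched version */
};

/* Ratio shown after "Data read": data scanned divided by data read. */
static double scan_ratio(const struct scan_pair *p)
{
    return p->scanned_mb / p->read_mb;
}

/* Relative change in scan time in percent; negative means the patched
   build finished faster on that single run. */
static double time_change_pct(const struct scan_pair *p)
{
    return 100.0 * (p->t_patched - p->t_orig) / p->t_orig;
}

static const struct scan_pair results[] = {
    { "File 1 (ZIP)", 990.25, 476.21, 103.864, 102.853 },
    { "File 2 (CAB)", 498.80, 489.75,  51.487,  51.564 },
    { "File 3 (ZIP)", 804.19, 400.06,  81.984,  82.960 },
};

static void print_results(void)
{
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++) {
        const struct scan_pair *p = &results[i];
        printf("%s: ratio %.2f:1, time %+.2f%%\n",
               p->label, scan_ratio(p), time_change_pct(p));
    }
}
```

Calling `print_results()` on these values should print ratios of 2.08, 1.02 and 2.01, matching the summaries, and time changes of roughly -0.97%, +0.15% and +1.19% for files 1 to 3.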



Best regards,

Bram
Re: [PATCH] reduce memory usage when scanning zip files
Quoting Bram <bram-pynzni-qriry@lists.wizbit.be>:
> I decided to have a look at how to change the behaviour and wrote a
> patch for it.
> I don't know whether this patch is desirable or acceptable, but I'm
> submitting it anyway.
> The patch is against 0.97.3 but should apply cleanly on git master.

...

It appears the list removed the attachment, so here's an inline version of it:

diff -Naur clamav-0.97.3.orig/libclamav/unzip.c clamav-0.97.3.patched/libclamav/unzip.c
--- clamav-0.97.3.orig/libclamav/unzip.c 2011-01-10 18:48:28.000000000 +0100
+++ clamav-0.97.3.patched/libclamav/unzip.c 2011-12-21 16:53:29.000000000 +0100
@@ -56,7 +56,7 @@
return inflateInit2(a, b);
}

-static int unz(uint8_t *src, uint32_t csize, uint32_t usize, uint16_t method, uint16_t flags, unsigned int *fu, cli_ctx *ctx, char *tmpd) {
+static int unz(uint8_t *src, uint32_t csize, uint32_t usize, uint16_t method, uint16_t flags, unsigned int *fu, cli_ctx *ctx, char *tmpd, fmap_t *map) {
char name[1024], obuf[BUFSIZ];
char *tempfile = name;
int of, ret=CL_CLEAN;
@@ -78,7 +78,7 @@
if(csize<usize) {
unsigned int fake = *fu + 1;
cli_dbgmsg("cli_unzip: attempting to inflate stored file with inconsistent size\n");
- if ((ret=unz(src, csize, usize, ALG_DEFLATE, 0, &fake, ctx, tmpd))==CL_CLEAN) {
+ if ((ret=unz(src, csize, usize, ALG_DEFLATE, 0, &fake, ctx, tmpd, map))==CL_CLEAN) {
(*fu)++;
res=fake-(*fu);
}
@@ -134,17 +134,57 @@

memset(&strm, 0, sizeof(strm));

- *next_in = src;
*next_out = obuf;
- *avail_in = csize;
*avail_out = sizeof(obuf);
if (unz_init(&strm, -wbits)!=Z_OK) {
cli_dbgmsg("cli_unzip: zinit failed\n");
break;
}
- while(1) {
- while((res = unz_unz(&strm, Z_NO_FLUSH))==Z_OK) {};
- if(*avail_out!=sizeof(obuf)) {
+
+ int chunk_size = BUFSIZ;
+ uint32_t remaining = csize;
+
+ do {
+ /* 'read' a chunk of data */
+ if (remaining >= chunk_size) {
+ if (fmap_need_ptr_once(map, src, chunk_size)) {
+ *avail_in = chunk_size;
+ *next_in = src;
+
+ src += chunk_size;
+ remaining -= chunk_size;
+ }
+ else {
+ cli_dbgmsg("cli_unzip: fmap_need_ptr_once failed?!\n");
+ }
+ }
+ else {
+ if (remaining == 0) {
+ /* corrupted file? zlib has not detected a STREAM_END and expects more but all input is consumed... */
+ cli_dbgmsg("cli_unzip: entire input stream used but zlib expects more...\n");
+ res = Z_DATA_ERROR;
+ break;
+ }
+ else {
+ if (fmap_need_ptr_once(map, src, remaining)) {
+ *avail_in = remaining;
+ *next_in = src;
+
+ src += remaining;
+ remaining -= remaining;
+ }
+ else {
+ cli_dbgmsg("cli_unzip: fmap_need_ptr_once failed?!\n");
+ }
+ }
+ }
+
+ /* inflate the data chunk */
+ do {
+ *next_out = obuf;
+ *avail_out = sizeof(obuf);
+
+ res = unz_unz(&strm, Z_NO_FLUSH);
written+=sizeof(obuf)-(*avail_out);
if(ctx->engine->maxfilesize && written > ctx->engine->maxfilesize) {
cli_dbgmsg("cli_unzip: trimming output size to maxfilesize (%lu)\n", (long unsigned int) ctx->engine->maxfilesize);
@@ -157,12 +197,9 @@
res = 100;
break;
}
- *next_out = obuf;
- *avail_out = sizeof(obuf);
- continue;
- }
- break;
- }
+ } while (*avail_out == 0);
+ } while (res != Z_DATA_ERROR && res != Z_STREAM_END && res != 100);
+
unz_end(&strm);
if (res == Z_STREAM_END) res=0;
break;
@@ -384,8 +421,7 @@
if(LH_flags & F_ENCR) {
cli_dbgmsg("cli_unzip: lh - skipping encrypted file\n");
} else {
- if(fmap_need_ptr_once(map, zip, csize))
- *ret = unz(zip, csize, usize, LH_method, LH_flags, fu, ctx, tmpd);
+ *ret = unz(zip, csize, usize, LH_method, LH_flags, fu, ctx, tmpd, map);
}
zip+=csize;
zsize-=csize;
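The core of the change is that the compressed member is no longer mapped in one piece with `fmap_need_ptr_once(map, zip, csize)`; instead `unz()` requests at most `BUFSIZ` bytes at a time and runs the `unz_unz()` inflate step on each chunk, emptying `obuf` until a call no longer fills it. The self-contained C sketch that follows models only that control flow, without zlib or libclamav: `toy_map`, `toy_need_ptr()`, `toy_stream` and `toy_step()` are hypothetical stand-ins for the fmap, `fmap_need_ptr_once()`, `z_stream` and inflate (the toy step just copies bytes), and `CHUNK_SIZE`/`OUT_SIZE` are arbitrary. Unlike the patch as posted, the sketch gives up when a chunk cannot be mapped; it appears that the patch only logs that case and, because `remaining` never decreases, would keep retrying the same chunk. The sketch also stops when the input runs out, since the toy step has no equivalent of `Z_STREAM_END`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 8192 /* the patch uses BUFSIZ; any positive size works */
#define OUT_SIZE   4096 /* hypothetical size of the output buffer (obuf) */

/* Hypothetical stand-in for an fmap: the whole file plus a record of the
   largest region requested in a single call. */
struct toy_map {
    const uint8_t *data;
    size_t len;
    size_t largest_request;
};

/* Hypothetical stand-in for fmap_need_ptr_once(): succeeds only if the n
   bytes at ptr lie inside the map (ptr must point into the map). */
static int toy_need_ptr(struct toy_map *m, const uint8_t *ptr, size_t n)
{
    size_t off = (size_t)(ptr - m->data);
    if (off > m->len || n > m->len - off)
        return 0;
    if (n > m->largest_request)
        m->largest_request = n;
    return 1;
}

/* Hypothetical stand-in for z_stream: input and output cursors. */
struct toy_stream {
    const uint8_t *next_in;
    size_t avail_in;
    uint8_t *next_out;
    size_t avail_out;
};

/* Hypothetical stand-in for inflate(): copies as many bytes as fit, so the
   output is identical to the input ("stored" data). */
static void toy_step(struct toy_stream *s)
{
    size_t n = s->avail_in < s->avail_out ? s->avail_in : s->avail_out;
    if (n > 0)
        memcpy(s->next_out, s->next_in, n);
    s->next_in += n;
    s->avail_in -= n;
    s->next_out += n;
    s->avail_out -= n;
}

/* Feeds csize bytes starting at src to toy_step() in chunks of at most
   CHUNK_SIZE bytes. Returns the number of output bytes produced, or -1 if
   a chunk could not be mapped. */
static long chunked_copy(struct toy_map *m, const uint8_t *src, size_t csize)
{
    uint8_t obuf[OUT_SIZE];
    struct toy_stream s = { NULL, 0, NULL, 0 };
    size_t remaining = csize;
    long written = 0;

    while (remaining > 0) {
        /* 'read' one chunk: request only this chunk, not the whole member */
        size_t n = remaining < CHUNK_SIZE ? remaining : CHUNK_SIZE;
        if (!toy_need_ptr(m, src, n))
            return -1; /* give up rather than retrying the same chunk */
        s.next_in = src;
        s.avail_in = n;
        src += n;
        remaining -= n;

        /* process the chunk, emptying obuf until a step leaves room in it */
        do {
            s.next_out = obuf;
            s.avail_out = sizeof obuf;
            toy_step(&s);
            written += (long)(sizeof obuf - s.avail_out);
            /* unz() writes obuf to its temporary file at this point */
        } while (s.avail_out == 0);
    }
    return written;
}
```

Feeding a 20000-byte buffer through `chunked_copy()` with a map covering all of it yields 20000 output bytes while the largest single request stays at `CHUNK_SIZE`; with a map that covers only the first 10000 bytes it returns -1 instead of looping. The peak mapped size is therefore bounded by the chunk size rather than by the compressed size of the member.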


_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net