Mailing List Archive

[clamav-users] ClamScan does how much of this (heuristical analysis/sandboxes)?
To better protect us, could future versions of ClamScan add artificial neural networks (artificial CNS) to the virus scanner?

GitHub has lots of FLOSS (free/libre open source software) simulators of CNS (at https://github.com/topics/artificial-neural-network , such as https://github.com/Rober-t/apxr_run/ ), which virus scanners could use to do this:

Use samples of infected files/programs as the training inputs, and the original files/programs as the target outputs (or null if there are no fresh programs to revert to), to produce an artificial CNS that undoes infections in files/programs.

I assume most antivirus programs already have heuristic analysis and sandboxes, but if not, here is how to do this:
Search for open source (FLOSS) virus scanners (https://github.com/topics/virus-scanner has lots) and look at how they scan executables to figure out what programs do to your OS.

Most look for system-call opcodes (such as “int” or “syscall”), or look at what libraries the program loads and search for instructions such as “jmp” or “call” that transfer control into system libraries. This lets scanners flag programs that alter other programs, and flag programs that change page permissions to W+X (lots of malware makes pages both writable and executable, so virus scanners block such programs).
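As a minimal sketch of the opcode search above (not how ClamScan or other scanners do it), this looks for the x86 byte sequences of “syscall”, “int 0x80” and “sysenter” in raw machine code. The function name and pattern table are just for illustration. It does no disassembly, so the same bytes inside operands or data would give false positives.

```python
# Naive byte-pattern scan for x86 system-call opcodes.
# Assumption: `data` is raw x86/x86-64 machine code. Nothing is disassembled,
# so a match may be part of another instruction's operand or plain data.
SYSCALL_PATTERNS = {
    b"\x0f\x05": "syscall",
    b"\xcd\x80": "int 0x80",
    b"\x0f\x34": "sysenter",
}

def find_syscall_opcodes(data: bytes):
    """Return sorted (offset, mnemonic) pairs for every pattern hit."""
    hits = []
    for pattern, name in SYSCALL_PATTERNS.items():
        start = 0
        while True:
            idx = data.find(pattern, start)
            if idx == -1:
                break
            hits.append((idx, name))
            start = idx + 1
    return sorted(hits)

if __name__ == "__main__":
    sample = b"\x90\x90\x0f\x05\xc3\xcd\x80"  # nop; nop; syscall; ret; int 0x80
    print(find_syscall_opcodes(sample))  # [(2, 'syscall'), (5, 'int 0x80')]
```

A real scanner would disassemble the code sections first and only then look at the decoded instructions.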

To figure out what libraries a program loads,

refer to the specifications of the OS’s executable format -- “Portable Executable” for Windows ( https://learn.microsoft.com/en-us/windows/win32/debug/pe-format https://wikipedia.org/wiki/Portable_Executable ), “Executable and Linkable Format” (ELF) for most others such as Unix and Linux ( https://wikipedia.org/wiki/Executable_and_Linkable_Format )

which would allow you to know what libraries a program loads at startup, plus those libraries’ functions’ addresses.
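A first step towards this could look like the sketch below, which only reads the fixed start of an ELF header (magic, 32/64-bit class, byte order, file type, machine). The function name and the returned dictionary are just for illustration. To list the libraries themselves you would go on to walk the dynamic section for its NEEDED entries, which `readelf -d` prints on Linux.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# e_type values from the ELF specification.
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object (or PIE)", 4: "core"}

def read_elf_header(data: bytes):
    """Return basic ELF header fields, or None if `data` is not an ELF file."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        return None
    ei_class, ei_data = data[4], data[5]  # 1/2 = 32/64-bit, 1/2 = little/big endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return None
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "byte_order": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, "other"),
        "machine": e_machine,
    }

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(read_elf_header(f.read(64)))
```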
 
Virus scanners should also look at dynamic loads of functions ( https://www.codeproject.com/Questions/338807/How-to-get-list-of-all-imported-functions-invoked ), such as through GetProcAddress, or just flag calls to functions such as GetProcAddress.

For better heuristic analysis, virus scanners should flag programs that perform raw network access (rather than using OS functions to download/upload files),

or that alter files of which the program is not the owner.

Some of this requires that you not just look at what functions the program calls,

but also look at what parameters the program passes to those functions: read them directly if they are constants, or guess them if they are registers/addresses (for those, antiviruses should use sandboxes, or just flag all non-constant parameters to sensitive functions).
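A minimal sketch of the "flag all non-constant parameters to sensitive functions" idea, assuming some earlier pass (a disassembler, not shown) has already produced a list of calls. The `Call` record, the `reg:` prefix for runtime values and the watch list are all made up for this example. The function names on the list are real Windows API functions, but which ones count as sensitive is for the scanner to tune.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical watch list of sensitive functions.
SENSITIVE = {"GetProcAddress", "WriteProcessMemory", "VirtualProtect", "CreateRemoteThread"}

@dataclass
class Call:
    function: str
    # An int or a plain string is a constant; "reg:<name>" marks a value
    # that is only known at run time (hypothetical convention).
    args: List[Union[int, str]]

def is_constant(arg) -> bool:
    return isinstance(arg, int) or not arg.startswith("reg:")

def calls_needing_review(calls: List[Call]) -> List[str]:
    """Return names of sensitive functions called with a non-constant argument."""
    flagged = []
    for call in calls:
        if call.function in SENSITIVE and not all(is_constant(a) for a in call.args):
            flagged.append(call.function)
    return flagged

if __name__ == "__main__":
    calls = [
        Call("GetProcAddress", ["reg:rcx", "CreateFileW"]),  # constant name, runtime module handle
        Call("GetProcAddress", ["reg:rcx", "reg:rdx"]),      # name only known at run time
        Call("strlen", ["reg:rdi"]),                          # not on the watch list
    ]
    print(calls_needing_review(calls))
```

Calls whose arguments are all constants can be checked statically. The flagged ones are the candidates for a sandbox run or a manual review.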

Example outputs from running F-Droid through sandboxes/analysis:
https://www.virustotal.com/gui/file/dc3bb88f6419ee7dde7d1547a41569aa03282fe00e0dc43ce035efd7c9d27d75
https://www.virustotal.com/ui/file_behaviours/dc3bb88f6419ee7dde7d1547a41569aa03282fe00e0dc43ce035efd7c9d27d75_VirusTotal%20R2DBox/html
https://www.virustotal.com/ui/file_behaviours/dc3bb88f6419ee7dde7d1547a41569aa03282fe00e0dc43ce035efd7c9d27d75_Zenbox/html

The false positives (from Zenbox) show why programs that your sandbox flags still need manual review.

For the scanners with heuristical analysis and sandboxes,
the next logical move is to add artificial CNS.

Not all scanners have such analysis and sandboxes,
but it is clear that some, such as VirusTotal, do, as the URLs above show.

Update: adds how to use chroot/strace for sandboxes, and LLVM static analysis.
The earlier URLs show VirusTotal's heuristic analysis of F-Droid's package manager and a review of its behaviour in two sandboxes.

A POSIX-like OS such as Linux has "chroot" (run `man chroot` for instructions), which confines the programs you test to one directory tree so that they cannot alter files outside of the test,
and has "strace" (run `man strace` for instructions, or look at https://opensource.com/article/19/10/strace
https://www.geeksforgeeks.org/strace-command-in-linux-with-examples/ ), a command which traces all system calls that a program makes and saves logs.
Simple sandboxes just launch programs for a few seconds and dump such logs,
with additional heuristics to guess which calls should go to the logs (so reviewers have less to look through).
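A minimal sketch of such a log filter, assuming the log came from something like `strace -f -o trace.log ./program` and has the usual `name(args) = result` line shape. The list of interesting calls and the function name are just picked for this example. It also marks `mprotect` calls that ask for both PROT_WRITE and PROT_EXEC, which is the W+X case from the first post.

```python
import re

# Optional pid prefix (from `strace -f`), then name(args) = result.
LINE_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\((.*)\)\s+=\s+(\S+)")

# Hypothetical short list of calls a reviewer would want to see.
INTERESTING = {"open", "openat", "unlink", "unlinkat", "rename",
               "chmod", "connect", "execve", "mprotect"}

def filter_trace(lines):
    """Keep only interesting syscalls as (name, args, result, note) tuples."""
    kept = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # signals, exit markers, unfinished calls
        name, args, result = m.groups()
        if name not in INTERESTING:
            continue
        note = ""
        if name == "mprotect" and "PROT_WRITE" in args and "PROT_EXEC" in args:
            note = "W+X page"
        kept.append((name, args, result, note))
    return kept

if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as log:
        for entry in filter_trace(log):
            print(entry)
```

Calls that strace splits across lines (`<unfinished ...>` / `<... resumed>`) are skipped here. A fuller parser would join them first.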

Autonomous sandboxes use the ideas from the first post to flag programs whose system calls would alter resources that are not part of the program under test.
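A minimal sketch of that check for file paths, assuming you already have the syscall name and its argument text from a trace log, and that `program_root` is the absolute directory which belongs to the program under test. The function name and both lists are made up for this example. Relative paths are treated as relative to `program_root`, which is a simplification (the sketch does not track `chdir`).

```python
import posixpath
import re

# Calls that always modify the path they name (hypothetical short list).
MODIFYING = {"unlink", "unlinkat", "rename", "chmod", "chown",
             "truncate", "mkdir", "rmdir"}
# open/openat only modify when called with one of these flags.
WRITE_FLAGS = ("O_WRONLY", "O_RDWR", "O_CREAT", "O_TRUNC", "O_APPEND")
PATH_RE = re.compile(r'"([^"]*)"')  # first quoted string in the arguments

def alters_outside(syscall: str, args: str, program_root: str) -> bool:
    """True if the call would modify a path outside program_root."""
    m = PATH_RE.search(args)
    if not m:
        return False
    root = posixpath.normpath(program_root)
    # join() keeps absolute paths as-is; normpath() resolves "..".
    path = posixpath.normpath(posixpath.join(root, m.group(1)))
    if posixpath.commonpath([root, path]) == root:
        return False  # inside the program's own tree
    if syscall in MODIFYING:
        return True
    if syscall in ("open", "openat"):
        return any(flag in args for flag in WRITE_FLAGS)
    return False

if __name__ == "__main__":
    print(alters_outside("openat", 'AT_FDCWD, "/etc/passwd", O_WRONLY|O_APPEND', "/srv/test"))
```

Symbolic links are not resolved here, so this only looks at the path text. A real sandbox would also have to cover network, process and registry resources.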

Heuristic analysis is similar to Clang/LLVM's static analysis tools (static analysis checks programs for accidental security threats such as buffer overruns/underruns), so you could use the FLOSS static analysis tools as a first basis for virus scanners: just add checks for deliberate security threats, flag those for manual review, and warn users not to run such programs before the review.
https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer is part of LLVM and is a FLOSS basis for analysis that virus scanners could build on.
If you don't want all of Clang, https://github.com/secure-software-engineering/phasar (a static analysis framework built on top of LLVM) has just the analysis.

As for the artificial neurons/CNS,
those are as simple to use for this as the original post says.
It would not require too much effort to do this,
but who has access to large sample databases to train the artificial CNS?
Examples of how to set up APXR as an artificial CNS: https://github.com/Rober-t/apxr_run/blob/master/src/examples/
Examples of how to set up HSOM as an artificial CNS: https://github.com/CarsonScott/HSOM/tree/master/examples
Simple to set up once you have access to databases.
Just as (if humans grew trillions of neurons plus thousands of layers of cortices) one of us could pore through all databases of infections (plus samples of fresh programs) to set up our synapses to revert (from hex dumps) all infections to fresh programs,
so too could an artificial CNS with trillions of artificial neurons do this.

From howtos ( https://silvercross.quora.com/Howto-produce-better-virus-scanners-through-heuristical-analysis-sandboxes-plus-artificial-CNS-Search-for-open-source
https://open.substack.com/pub/swudususuwu/p/howto-produce-better-virus-scanners )