Costin Raiu, <craiu@gecad.ro>
Prologue
According to a string inside it's encrypted body, the Zhengxi virus was written back in 1995, by an unknown person from Russia. Considered being one of the most complex viruses ever, with 3 known versions (7271, 7307 and 7313 bytes long) Zhengxi is still a problem for many of today's AV programs. The purpose of this paper is to show how easy (or complicated) would be adding perfect detection for Zhengxi to a standard AV program.
Introduction
An excellent analysis of the Zhengxi virus was presented by Eugene Kaspersky in the April '96 edition of the security-related publication "Virus Bulletin". The article, entitled "Zhengxi - Saucerful of Secrets" is also available in a reduced form at http://www.avp.ch/, in the AVPVE Online section.
Anyone not familiar with the virus, should read Mr. Kaspersky's analysis first, since most of the data and information presented here are highly technical, and probably hard to understand by a non-advised reader. Considering that you now know what are we dealing with, let us begin...
Fishing With Zhengxi
When I decided to add detection for this virus in my antivirus program, RAV, the first problem encountered was the number of samples. My collection of Zhengxi samples consisted in 9 samples, 3 for each version of the virus. In order to test the detection rate of a product on a polymorphic virus, a large number of samples is usually required, therefore I thought that obtaining around 500 samples should do the job. The main problem was that Zhengxi is one of the hardest viruses to replicate. Special checks are done by the virus in order to avoid any baiting attempt, checks like: looking for MS Windows, testing if the boot drive is not A: or B: and various tests related to the system time. In order to replicate Zhengxi I wrote a set of AWK scripts and some short C programs designed to trick the virus. Note that I could have changed various parts of the virus in order to nullify the anti-baiting checks, but that would have also generated a new version of the virus, and generating new viruses by changing their code with a debug tool is not an ethical approach. That approach would also be highly complicated, since most of the checks in Zhengxi are not easy to understand or found. That's why the other approach was taken...
The automatic replication system was developed using the sources from one of my previous projects, related to the TMC virus. The replication system was designed to run under Windows NT. As strange as it might sound, Windows NT proved to be an excellent environment for this job. A special computer was set up, and an account with special restrictions was created on the NT computer. The account, called "zhengxi" was only allowed to access a special directory on the only partition in the system, formatted with the NTFS file system. The replication system consisted in one master AWK script - I chose AWK because of my past experience with it. I used Rob Duff's free AWK, written in 1991, which shows an excellent stability and compatibility with the AWK specifications. A copy of this program is available from my homepage, at http://homepages.gecad.ro/craiu/. While I could have also used Perl instead, the AWK interpreter is only 47k, while a working Perl installation should be around 1Mb. This main script regularly starts up to 4 different threads targeting 4 different directories. This number was chosen in order to balance the replication rate with the system speed - it might be worth mentioning that the system used to replicate the virus was an Intel Pentium 133Mhz with 64Mb of RAM running Windows NT Server 4.0 + SP3. The main AWK script was responsible for running a special C program - this program scanned the bait files in a specified directory, by opening and performing some common file operations over them. This program was executed from a .BAT file which executed an infected sample before running the C program... A second C program was used to change the time of all the files in the target directories to a random value. This was required in order to bypass some of the checks in the virus.
The second C program was started from time to time by the main AWK script. Each thread was started by the main AWK script using "cmd /c start". When a thread was finished the script was running a third C program - this was responsible for spotting any modification to the bait files. Since detection for the virus was not yet available to me I chose a very simple method: checking the last 4k of the sample for some presets. The 4 dirs holding the Zhengxi samples were filled at startup with 400 trap files, giving a total of 1600 files. Each time a file change was detected, the respective file was moved by third the C program to a special quarantine directory on the system and replaced by a new clean bait file. By running this set of programs for over 9 hrs I got a heap of around 1500 samples of Zhengxi.7313.
Moving To The Next Problem
Having managed to replicate Zhengxi in a large number of samples, I took a short look at the resulting samples. I also scanned the resulting samples with demo or shareware versions of some well known AV programs. The test was performed at the middle of 1997, so the information here might not be relevant. Anyway, at that time only 3 products were able to detect _ALL_ the infected samples. Two products were from Russia, while the other one was from Slovakia. I don't know if other products were able to detect Zhengxi at that time except the 3 programs mentioned above, nor I do care. Regarding the 3 products, one of them was "AVP". This was to be expected, since Mr. Kaspersky wrote the Zhengxi analysis I mentioned above, and because "AVP" is a long-time runner when it comes to detection of most complex polymorphic viruses. The other two were "Dr. WEB" written by Igor Daniloff, and ESET's "NOD-ICE".
So, with 1500 samples on my computer I took a look at the result of the replication process. Debugging some samples was also useful - however, tracing Zhengxi is not a trivial process, or at least it wasn't at that time. The conclusion was that it's almost impossible to write an algorithmic detection for the virus. This was expected thought... At this time I should probably talk a little bit about the code emulator from RAV. The code emulator in our product is called RACE - RAV's Advanced Code Emulator. It was designed and developed by Adrian Marinescu (nicknamed 'Mady') from the RAV team. I'll let him give you a short overview of our engine:
"In the summer of '96, I joined the RAV team. Costin asked me to specifically take care of the new code emulator technology to be used in RAV. Therefore, I started developing a generic code emulator: the C.E. should theoretically have been be able to load program files and emulate them. At that time, the number of polymorphic viruses was rising high, so the code emulator was designed mostly to take care of this problem - by emulating the polymorphic decryption loop, we should reach the plain virus code which can theoretically be detected using conventional search string techniques or checksums.
Therefore, I started developing the code emulator required in order to detect some of the viruses that were in the wild at that time, like the OneHalf and Hare families, and why not, the Romanian polymorph Dumb.4722. No problems were encountered with Hare or Dumb, but OneHalf meant the first big problem: what paging model should I use to catch such viruses, that split their code within the file. After resolving this problem, found out that the technique I've used also worked with Commander_Bomber, another well-known "splitter". This was the biggest problem solved by that time. Before the release of RAV5, Costin came to me with a pocket of infected samples with the Zhengxi.7313 virus. Since I already knew this virus by it's reputation, I quickly scanned the samples using the dedicated scanner I wrote for the various tests I had to do with the emulator. Of course, no files were found infected, (and I wonder if any code emulator was able to detect it using only heuristics, e.g. before optimizing it for Zhengxi, and before adding a definition for it) but some files just made the emulator "hang" for a while, and after that exit with no infection found. When I wrote the code emulator used in RAV, I assumed that no virus's decryptor will be bigger than one million of instructions, using this as as "get out" flag. Some SMEG mutations had about 0,7 millions, and I expected more to come. After I saw ZME, the initial break conditions needed some minor adjustments...
I encountered two big problems while emulating the polymorphic decryptors generated by Zhengxi engine (ZME):
- the interrupt handling routines must take care of the real addresses in memory, as ZME can generate calls to routines using CALL FAR/JMP FAR instructions. Generated interrupts are: 8h, 1Ch, 28h, 2Bh, 11h, 12h and a large number of function from Int21: 18h, 1Dh, 1Eh, 20h, 51h, 62h, 30h, 19h, 2Ah, 2Ch, 36h, 4Dh, 0Dh, 23h, 0Bh, 5Ch, 3Dh, 41h, 4Eh, 3Ah. Some of this function require input registers to be set (dx) and some of them set the carry flag, and the result will be used. Calling this interrupts/functions is done by the virus in 4 distinct ways:
1. pushf
2. pushf
3. push return_offset
4. int no
(As a sidenote, the return_offset is not necessary the IP of the instruction after the far jump. Also, all the code is padded with junk instructions, not to mention that when an address is required, the segment and offset are variable. (not the standard, huge- pointers-like approach as used in DOS C compilers)
Also, a tricky thing is the call of pseudo-INT 30h, the address of INT 21H. Looking over some other mutation engines, older or newer, ZME has some unique features: the variety of the calling methods, the use of some function's carry flag on return.
- the number of instructions quite big, about 20 millions of CPU instructions for each mutation (2 decryptors, the second one containing more junk than the first one). The decryptor is large in size, but the large number of instructions is due to the calls in the decryptor (there are several levels of calls).
Defeating the engine
assumes that the code emulator is able to handle the routine calls. By
resolving this problem, you'll obtain a code emulator able to decrypt the
virus, but due to the number of emulated instructions this will take some
seconds. There are some options in making the detection faster. The most
interesting is the "calls disabling" technique. Assuming that no calls
are emulated, the virus will decrypt correctly, so we have to check if
a ZME generated code is encountered, and just disable the calls. This improvement
makes the time spent in decrypting the virus about 1/10 of the initial
time. Other options are to disable some time-spending (for example the
interrupts) instructions for the emulator, and replace them with some "faster
instructions" (like nop's). Making the optimizations rise another problem:
what if a virus is catalogued by the emulator as being a ZME based, and
is not. This implies that the decision of disabling the calls should be
backed up by some other conditions.
So, for almost 8 months, Mady constantly improved the code emulator in order to correctly decrypt Zhengxi. It is probably not known that in order to decrypt it, one should emulate around 2*10^7 instructions, if no emulation tricks are used. This number of was a little bit to high for us - when allowing RAV to emulate 2^6 instruction the speed decreased so much that the product become highly unusable... But let's see what is Zhengxi doing with 2*10^7 instructions. The main problem of the Zhengxi decryptor were, as Mady pointed before, the CALLs.
The virus performs lots of calls to do-nothing routines. If we disabled the CALL emulation RAV was able to detect around 60% of the Zhengxi samples, but the speed was unfortunately not very good. Also, if one would chose to avoid emulating the Intel CALL instruction, special checks should be performed in order to only do it if Zhengxi-like structures are encountered.
Statistic Analysis
I previously used probabilistic approaches a long time ago, for viruses like MtE.Groove or Evolution - at that time I had no code emulator in RAV so I researched some experimental ways of detecting polymorphic decryptors. Today, such methods are considered primitive - I really don't think that anyone is still using them in real AV programs. However, I also tried such approaches on Zhengxi's encrypted body. This proved to be futile - the polymorphic engine is generating incredibly complex encryption loops, avoiding any linearity that could result in poorly encrypted samples. At this point I thought that detecting Zhengxi without a full emulation is impossible. Until CERF '98... CERF is a Romanian computer fair, hold each summer. This is a nice place to meet the other AV researchers and check out on the competition. The idea came to me suddenly, while I was talking with one of the experts from another Romanian AV program on a totally unrelated problem. 'What if I tried the statistical analysis on the FLOW OF INSTRUCTIONS generated by the code emulator while tracing Zhengxi?' I analyzed this idea for a few days before giving it a try. So I quickly compiled a slightly changed version of RACE, a version that was dumping the instruction flow for each scanned sample. Due to the inner workings of RACE, only the first 700000 instructions were dumped to special debug files. It seemed to me that I was right - after around 100 various instructions the first decryptor was taking control. The decryptor's medium size was around 800 instructions. So I quickly wrote a C++ program to compute a statistical distribution of the instructions in the Zhengxi decryptor. The program's output was very interesting, as the decryptor seemed to be designed only with a reduced set of instructions, which is not the case of the first 100 or so instructions from the Zhengxi sample.
The following map is a view of which instructions are used in the decryptors of 2 randomly chosen samples. I marked the instructions found in the decryptor with '01', while '00' means that the instruction opcode at that offset was not found in the decryptor.
Fig. 2 uses the first 1024 instructions while Fig 1. uses the next 700000 instructions coming after the first 1024.
Fig. 1: (instructions from 1024 to 700000)
00000000: 00 00 01 01-01 00 00 00-00 00 01 01-00 00 00 00 *** **
Fig. 2: (first 1024 instructions)
00000000: 00 00 01 01-01 01 01 01-00 00 01 01-00 01 00 00 ****** ** *
You can easily see that the Zhengxi decryptor is mainly based on 7xh and 8xh opcodes, some instructions from 00 to 30h, and various instructions from B0h to F0h. The first 1k of emulated instructions also includes some low opcodes along with opcodes in the 90h-A0h range.
At this time I considered writing a very basic form of neural network. I had no experience with Neural Networks before, so I had to read some documentation on this matter. The freely available sources from SimTel proved to be an excellent guide, and I recommend them to anyone interested on this subject. The network was trained on 300 Zhengxi instruction flows, each 700k in size. The NN trainer was set to produce an 1k rule-set. I could have only used 256 bytes for this matter, but 1k is a little bit more secure. Also, extending the net configuration data to 4k might provide a better precision, but from the preliminary tests with 1k was over my expectations. Unlike most neural nets, mine only used 32 bits integers, with no floating point operations at all. Today, it is known that the FPU's in some processors might provide better speed than fixed point arithmetic, for example for the generation of fractal sets. However, using no floating point instructions allowed me to write a highly optimized routine, which should also work fast on 386 class computers with no FPU available. Using no floating point instructions also provides an excellent response time, as the routine can easily make over 400000 decisions in one second on a 200Mhz Pentium class processor. Since the casual scan rate of AV programs based on code emulators is around 30-40 files/second on the above mentioned configuration, the NN is really not slowing the scanner at all. In order to achieve 100% detection on Zhengxi samples the code emulator required some modifications in order to correctly execute the first heap of instructions before the first Zhengxi detection routine. Today, the code emulator has also a perfect 100% detection rate if we set it to execute enough instructions - the NN decision could theoretically have been used to extend the number of emulated instructions or to disable the CALL execution in the code emulator. Those are however optimizations for detection between Zhengxi variants. For example, by only using the NN to detect Zhengxi we reach over 20 f/s, which is probably 100 times faster than disabling the calls and extending the number of emulated instructions to a value large enough to allow the virus to completely decrypt itself.
Cleaning The Zhengxi Virus
In order to clean Zhengxi, one should probably use the same method as the one used by the virus to disinfect files while doing it's stealth operations. The random values used to generate the encryption code can be used to generate the decryption functions which should be used to decrypt the original host header. Of course, software emulation of the code generated by the engine is a simpler solution, but extremely slow...
Vulnerabilities Of The Method
Since the above mentioned method is only able to detect the ZME (Zhengxi Mutation Engine) generated decryptors, and that only on a probabilistic basis, one might expect false positives to show up. So far, we got no such problems, and I don't expect such a very unlikely situation to appear. If we actually find a file containing a Zhengxi-like decryptor loop, we could anytime extend the net with a second instance which also parses the first 1000 instructions, and we could also extend the RS (rule-set n.a.) to 4k. However, the current approach is stable enough, and until such a false positive situation appears, we will keep it unchanged.
There is also the other aspect of the Zhengxi virus that is not presented here, and I mean the infection of files via inserting. This special method used by Zhengxi on high level programs, containing standard procedures using the BP register to access the formal arguments, cannot be detected using the method presented in this paper, because the code emulation process must be started on the Zhengxi polymorphic decryption loop. Since locating the entrypoint in the files infected via inserting method is not a trivial thing, we cannot use the approach presented here directly... There are however ways to detect such files too, but I'll keep them for another paper.
Copyright (c) Costin Raiu