Bottom of the Class
Costin Raiu, Kaspersky Labs,
<craiu@pcnet.ro>
Prologue
"Unpack the source, and look for a scanstring", they said. When
the first Office97 macro virus was reported, a couple of years ago,
antivirus researchers all around the world rushed to their labs and took
their hacking tools out of the box once again, in a quest to uncover
the mysteries of yet another virus-friendly Office platform. Those of us
who were able to ask the all-mighty Microsoft for some hints regarding
the new file format got a rather informative and powerful answer, which
I used to open this article.
Well, for at least two different reasons, not everyone
followed that suggestion. I, for example, was among the unlucky crowd
and didn't get any hint from Microsoft. That's why I first implemented
something which nowadays might sound silly. After looking at several
samples, I noticed that the compressed source looked exactly the same in
all of them, so I picked a scanstring from the compressed source and used
it to detect the respective virus. However, after some days of success,
the method proved to be sooo wrong: I got a sample of the same virus in
which, unfortunately, the compressed source looked different. Now I know
this was caused by different parameters in the ATTRIBUTE statements of
the compressed macro source, but back then it only meant more trouble for
me. So, after even more days and with a little bit of help, I implemented
a decompressor, unpacked the source, removed the "ATTRIBUTE" lines, and
then computed a CRC on the macro. This worked like a charm, until another
problem appeared. To find the offset of the compressed macro source inside
the module, I went to offset 0xD0 in the macro module, read the value
stored there, jumped to that offset, and unpacked the data found at that
location. The new problem appeared when I got a sample in which the offset
to the compressed macro source was not stored at offset 0xD0 in the macro
module! Resuming the hacking work, I went to the stream named "dir" inside
the macro storage and uncompressed it using the routine I had previously
used to decompress the macro itself, since the compression algorithm was
the same. By parsing the contents of the "dir" stream, I finally obtained
a reliable way of getting the offset to the compressed source.
At this point, I thought I had a pretty good detection engine,
able to parse all the samples I had, including all the new ones I was
receiving, so I thought this would be the end of the story. However, as
I should have expected, just when everything seemed fine, yet another
problem appeared. Incredible as it may sound, inside Office97 files the
code actually executed by the Office Visual Basic interpreter was not the
compressed source, but something else, which looked very much like the
opcodes of Excel95 macros: the so-called "pcode". So one may very well
wipe the compressed macro source, and the virus (or the macro itself, if
it is not viral) would still work. Moreover, the compressed source will
be regenerated dynamically from the "pcode". So once again, my
good-looking engine proved to be incomplete: it was not detecting the
virus, but a "shadow" of it, which may very well be missing from the
module. Blessing the wisdom of the wizards from Redmond, I went back to
the hacking tools once again, gave a couple of silent thanks to a guy
named "SEN" from Russia, and, with some extra help from some friends,
wrote a parser for the Office macro modules and then a "pcode" parser.
Eventually I managed to obtain some kind of detection for macro viruses
using what seemed to be the real form of the macro virus, I mean the
"pcode".
All was fine until some time ago, when an angry customer sent me
a sample of a Laroux variant which had both the "pcode" and the source
wiped out, but was still able to drop the "pldt.xls" template in the
Excel startup directory. So I learned that in some cases the virus is
executed neither from the "pcode" nor from the compressed source; what is
actually executed (if present) is something called the "execodes" form of
the virus, which is stored in a couple of streams with names of the form
"__SRP_x". Again frustrated with my macro engine code, and again blessing
the wisdom of the wizards from Redmond, I coded a small routine to detect
that particular case, which had been caused by an antivirus product I will
not name here, and hoped that someday I would be able to implement a
proper detection routine for macro viruses...
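A scanner can at least notice when only the execodes are left, as in the Laroux sample above. This sketch makes three assumptions. First, stream paths arrive as lists of names, the shape returned by, for example, the olefile package's OleFileIO.listdir(). Second, the "x" in "__SRP_x" is a decimal number. Third, the caller has already determined whether usable source and "pcode" are present. Both function names are hypothetical.

```python
import re

# Assumption: execode streams are named "__SRP_" followed by a decimal number.
SRP_NAME = re.compile(r"__SRP_\d+", re.IGNORECASE)


def execode_streams(stream_paths):
    """Return "a/b/c"-style paths of streams that look like execode streams.

    stream_paths is a list of path-component lists, e.g. [["VBA", "__SRP_0"]].
    """
    return ["/".join(p) for p in stream_paths if p and SRP_NAME.fullmatch(p[-1])]


def has_only_execodes(stream_paths, source_present, pcode_present):
    """Hypothetical check for the case where execodes survive while both
    the source and the "pcode" have been wiped."""
    return bool(execode_streams(stream_paths)) and not (source_present or pcode_present)
```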
So, what exactly is executed after all? The answer, as a friend of
mine would say, is not so simple, and is quite tricky. For example, given
a sample of an Office97 macro virus which contains the execodes, when it is
loaded in Office97 the execodes will be executed. If the same sample is
loaded in Office2000, the code that gets executed will originate from the
compressed source. However, if we have a sample without execodes but with
valid "pcode" and source, and the sample and the platform are directly
compatible, that is, both are Office97 or both are Office2000, the "pcode"
will get executed. Confused? You bet!
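The rules above can be condensed into a small decision function. It is a simplified model of the behaviour described in this article, not Office's actual loading logic. The article only describes execodes from Office97 samples. Treating execodes as usable whenever the creating and loading versions match is an assumption generalized from that case.

```python
from enum import Enum


class Form(Enum):
    EXECODES = "execodes"
    PCODE = "pcode"
    SOURCE = "compressed source"


def executed_form(created_in: str, loaded_in: str,
                  has_execodes: bool, pcode_valid: bool) -> Form:
    """Simplified model of which stored form of a VBA macro gets run."""
    same_version = created_in == loaded_in
    if has_execodes and same_version:
        return Form.EXECODES  # e.g. an Office97 sample loaded in Office97
    if pcode_valid and same_version:
        return Form.PCODE
    # Different version, or nothing else usable: rebuilt from the source.
    return Form.SOURCE
```

The function has one consequence for detection. A scanner that looks at a single form can miss the form that actually runs on a given Office version.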
The main problem resulting from this paradox of having three totally
different forms of a VBA macro is, of course, related to the detection of
macro viruses. For example, some antivirus products have implemented
detection of macro viruses using the compressed source. This method has
the advantage of compatibility, meaning that the same code can be used for
both Office97 and Office2000 without any changes. So far so good. But what
if the source is trashed in a macro? Unless it is loaded in a different
Office version than the one used to create it, the macro will work
perfectly, and Office will not even report an error. So other antivirus
programs have implemented detection using the "pcode", which caused some
slight compatibility problems, such as converting the "pcode" from
Office2000 to '97 and so on. As for execodes, I'm currently aware of only
a few products which are able to detect viruses this way, and all of them
do so only for a very restricted set of viruses...
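Any one of the three forms may be trashed. A detector that checks every form present in a sample is therefore harder to evade than one relying on a single form. Here is a minimal sketch, assuming a hypothetical signature table keyed by (form, CRC-32). It also assumes the "pcode" has already been normalized across Office versions.

```python
import zlib
from typing import Dict, Optional, Tuple


def identify(signatures: Dict[Tuple[str, int], str],
             source: Optional[bytes] = None,
             pcode: Optional[bytes] = None,
             execodes: Optional[bytes] = None) -> Optional[str]:
    """Return a virus name if any form present in the sample matches."""
    for form, data in (("execodes", execodes), ("pcode", pcode), ("source", source)):
        if data is None:
            continue  # this form is absent from the sample
        name = signatures.get((form, zlib.crc32(data)))
        if name is not None:
            return name
    return None
```

Checking every available form costs a few extra CRC computations. In exchange, wiping one or two forms is no longer enough to hide a known virus.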
The Class.EZ virus
Some Monday morning I received a new macro virus. Why are Mondays
always associated with problems? Well, this virus certainly looked like
trouble.
The problem revealed itself when, after replication, I ran my CRC
extraction tool and the tool reported an error while parsing the "pcode"
in the sample. Interestingly enough, the source-based CRC extraction part
completed successfully. At this point I had no suspicions, and I cursed
once again all those secret and undocumented opcodes which were probably
tricking my parser. However, more trouble came when I ran a specific tool
called F-VBACRC, which I use to report and classify new viruses as part of
my participation in an international macro virus discussion forum. Out of
the 10 different plug-in modules used by the tool, only 3 provided any
output. With some dark thoughts, I took the virus source, which I had
previously analyzed briefly, and proceeded to some real research.
The description
W97M/Class.EZ is a class infector, quite similar to thousands of
other macro viruses I've seen before. The tricky part, however, is an
executable stored inside the virus, named "kloop.exe", which is dropped
during replication and subsequently run on the sample currently being
infected by the virus. While the virus itself is not very complicated,
this executable took considerable effort to figure out. Analyzing the
samples infected with this virus, the first thing I noticed was that the
VBA project version was 0x89 in one of the samples, 0xa8 in another, and
0xe1 in the last one. Usually, for Office97 macros, this is 0x5e, but
apparently something had tampered with the version number during the
replication of the virus. Rather odd... I also noticed that all the
samples had an invalid "pcode" line table, which looked as if it had been
wiped with zeroes.
Since the executable inside the virus seemed the only reasonable
explanation, I blew the dust off my old IDA installation and started
analyzing the file. The Win32 executable "kloop.exe" first attempts to
open the file provided as a command line parameter, then initializes its
library's random number generator. Next, it determines the size of the
input file, allocates a chunk of memory, and reads the file into it. Then
comes the ugly part. The "patching" component of the virus searches for
two particular scan strings, both 4 bytes long, sets a random value for
the VBA project version of the document, and also wipes 24 bytes from the
start of the "pcode" line table. Usually this is enough to make the
"pcode" invalid and force Office to load the source instead of the
"pcode". Quite ugly, I should say. On the other hand, the method used to
patch these data structures is rather brutal: no parsing of the OLE2 file
is performed, and the method is likely to cause a lot of problems, even
damaging documents during the operation. Finally, the executable writes
the modified data back to the document and exits.
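From the scanner's side, the result of this patching is easy to test for. The sketch flags a module when two conditions hold: the VBA project version is outside an expected set, and the first 24 bytes of the line table are all zero. Two parts of it are assumptions. One is the field position: a 16-bit little-endian version right after the 0x61CC signature at the start of the _VBA_PROJECT stream, as later documented in [MS-OVBA]. The other is the single-entry allowlist built from the Office97 value 0x5e. Locating the line table is left to the caller, and looks_kloop_patched is a hypothetical name.

```python
import struct

# Assumption: only the Office97 value quoted in the article; a real scanner
# would list the version of every Office release it supports.
EXPECTED_VERSIONS = {0x5E}


def vba_project_version(vba_project_stream: bytes) -> int:
    """Read the 16-bit version that follows the 0x61CC signature (assumed layout)."""
    signature, version = struct.unpack_from("<HH", vba_project_stream, 0)
    if signature != 0x61CC:
        raise ValueError("not a _VBA_PROJECT stream")
    return version


def looks_kloop_patched(vba_project_stream: bytes, line_table_head: bytes) -> bool:
    """Hypothetical heuristic: odd project version plus a zeroed line table start.

    line_table_head must be the bytes at the start of the "pcode" line table;
    locating it is left to the caller.
    """
    odd_version = vba_project_version(vba_project_stream) not in EXPECTED_VERSIONS
    wiped = len(line_table_head) >= 24 and not any(line_table_head[:24])
    return odd_version and wiped
```

The heuristic requires both conditions so that it does not fire on a document that merely comes from another Office release. Even so, a too-narrow version allowlist would still flag clean files from releases it does not list.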
Beyond this point, there is little more to say about the virus. It
doesn't even delete "c:\kloop.exe", and doesn't bother to hide its tracks
by wiping "c:\kloop.dat", the image of the macro which is used during
replication. If it weren't for the Win32 executable inside the file, this
would have been a rather uninteresting and ordinary virus.
The solution
Fortunately, many AV products nowadays are also able to scan the
source when detecting a macro virus. Also, since the virus doesn't wipe
the entire line table, the remaining few entries can be used to extract
some "pcode", which can later be used to detect the virus. So, from the
detection point of view, the virus will not pose a big problem for most
products. However, I still wonder how the author was able to figure out
all the information required to write the virus, and whether he really
did this to cause problems for the scanners which happen to use the
"pcode" to detect macro viruses. I mean, this kind of information is
extremely hard to obtain, and I cannot imagine a virus author figuring out
all those tricky formats all by himself. On the other hand, the patching
method is very silly: if the author knew so many things about OLE2 and VBA
macros, why didn't he use OLE functions to patch the respective streams
instead of brute-force scanning the OLE2 file for signatures? Maybe he or
she disassembled some particular macro engine and tried to implement some
changes in the OLE2 file to avoid detection? Lots of questions, very few
answers...
(c) 2000, Virus Bulletin Ltd.