Such a weird processor - messing with x86 opcodes... and a little bit of PE [Portable Executable]

So welcome. ...And especially let me know if I speak too quickly. Um, so -- who I am -- oh, yes so

I will talk about opcodes and a little bit about the PE [portable executable] file format and their oddities. So, I've been

a reverse engineer for some years, for some time. I created a project called Corkami.

Also in the past I worked on the MAME arcade emulator, and professionally I am a malware analyst, but

this is only on the behalf of my hobbies, this is my own experiments and research at home.

So, let me introduce Corkami. Corkami is just the name of the project I created for my RCE [reverse code engineering] work.

I tried to keep it just to the technical stuff, no ads, no login required.

Really direct to the good stuff.

I try to update it and make it useful, so I also created cheat sheets and the kind of easy documents

that I would use for work on a daily basis,

but it's only a hobby; I do that once the kids are asleep

and late at night, so it probably doesn't look as professional

and as good as I would like it to be.

So right now, Corkami, the form of Corkami, is wiki pages and cheat sheets

and I focus on creating as many relevant proofs of concept as possible [Hi Bob!]

so the binaries are hand-written, usually I don't use a compiler, I create the PE (structure) myself

so that it's only focusing on the exact interesting point

and you don't have a lot of noise even -- you don't probably

need IDA to actually understand what's going on

because I try to focus only on what's important.

The binaries are all directly available to download so you can

really test your debugger, your tools, your knowledge

and just get them directly from that.

So far, I've focused on the PDF, assembly and the PE..

...file format. A few other things too, but those are the most

covered subjects on my website. And I share that with a

very permissive license, BSD, so you can reuse them commercially

or whatever. Even the images are made in open-source formats.

So the story behind this presentation is that some time ago

I was young and innocent and I thought that CPUs, being

electronic - whatever - they had to be perfectly logical and no problems

and then I was tricked by malware. And basically

IDA wasn't able to work on it, so I decided to go back

to the basics and study assembly and PE files from scratch.

I created in the meantime documents on Corkami

and now I'm presenting you more or less the final results.

or the good programs results. If I wasn't -- if I was just a

guy who learned assembly I probably wouldn't be at Hashdays

to talk about it, if I didn't get a few achievements from

various tools. So basically I made all the disassemblers that I tried fail

and I also created a few crashes - in IDA. I insist that all

the authors were notified and most of the bugs are already fixed, but

basically it was like this in 6.1 -- you get a direct crash -- but

now it's fixed in 6.2, and everything.

And Hiew [Hacker's view] - that's the latest version - but the newest and released,

- well, the newest beta - fixed that and so on.

So the agenda for the presentation is that I first try with

an easy introduction, but I assume that most of you already know or are familiar with disassembly, right?

Yes. And another question: are you all familiar with

or have you already come across undocumented opcodes in your disassembly ... or never?

Like, you trust IDA and that's all.

Like, is it a common thing to hit an undocumented opcode in IDA?

Raise your arms -- okay, not so much.

Okay. So then after the introduction (that will go quickly),

I will mention a few tricks, then introduce CoST, the program that I created.

And I will also talk a little bit more about the PE file format.

So as you all have assembly knowledge I will go quickly on that.

So basically, you compile a binary, there is assembly, there is

some relevance, some common points between the [source] code and the assembled [generated] code.

Then of course there is a relation between the opcode and the [assembly] code, you all know that.

What is important is that the assembly is generated by the compiler, but actually,

from the assembly, the only thing kept in the binary is the opcodes themselves, which are understood

directly by the CPU, which means the CPU just knows

what to do with the bytes, it doesn't care if you or the

tool you're using know what it will do, because it just does it.

And the problem is that what we read is not usually the opcodes for most people but actually the disassembly

and if the disassembler doesn't give you any result, well,

we're stuck, we're blind, we don't know what execution will do.

And the other problem is that, because opcode lengths vary, if one instruction is unknown you

don't know where the next instruction starts, because you

don't know how long the unknown one is.
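
The point about opcode length can be sketched with a toy linear-sweep decoder. This is only an illustration: the table below holds a handful of real one-byte opcodes with their 32-bit-mode lengths, and the function name and sample bytes are made up for the example. As soon as the decoder meets a byte it cannot size, it has no way to know where the next instruction starts.

```python
# Toy linear-sweep disassembler (illustration only, 32-bit mode lengths).
KNOWN = {
    0x90: ("nop", 1),
    0x6A: ("push imm8", 2),
    0xB8: ("mov eax, imm32", 5),
    0xEB: ("jmp rel8", 2),
    0xC3: ("ret", 1),
}


def linear_sweep(code):
    """Return (offset, text) pairs; stop at the first byte we cannot size."""
    out, pos = [], 0
    while pos < len(code):
        entry = KNOWN.get(code[pos])
        if entry is None:
            # Unknown opcode: its length is unknown, so decoding cannot resume.
            out.append((pos, "??"))
            break
        name, length = entry
        out.append((pos, name))
        pos += length
    return out


# Hypothetical byte stream: nop, the unknown D6 byte, mov eax, 1, ret
sample = bytes([0x90, 0xD6, 0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3])
print(linear_sweep(sample))
```

A real disassembler has a far larger table, but the failure mode is the same: everything after the unknown byte is guesswork.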

So, here I just create one undocumented opcode in a simple program.

So basically we just '_emit' -- [it's] a keyword in -- that's Visual Studio 2010 ultimate --

you will get a byte that is unidentified at disassembly

so you get question marks, so basically this program

even though it costs several thousand dollars is not able

to -- it doesn't know what will happen.

So usually if you do that... Oh, yeah, if you check the Intel documentation

there is nothing to see at the D6 opcode, there is nothing to see there.

Microsoft doesn't say anything, Intel doesn't say anything,

so usually if you try that you could expect bad results.

So, not documented, directly: usually it is a crash or not the expected result.

But here, in this case, this specific case, no problem.

We don't know what it was, if we follow Intel or Microsoft documentation, we don't know what happened.

But if we -- the CPU just does its stuff. So what happened is that actually

D6 is a very simple opcode, that doesn't do much, but somehow it's not documented by Intel

[but] it's documented by AMD, and most of the opcodes are actually documented by AMD

but not Intel. I don't know why, if anyone has any idea why...

It's quite a trivial opcode, but it's not -- Intel still says there's nothing there. Okay.
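
For reference, the D6 byte is widely known under the name SALC ("set AL on carry"): it fills AL with 0xFF when the carry flag is set and with 0x00 otherwise. The talk does not name the opcode, so the name and the small model below are an editorial sketch, not the speaker's code.

```python
def salc(eax, carry_flag):
    """Model of opcode D6 (commonly called SALC).

    AL becomes 0xFF if the carry flag is set, 0x00 otherwise;
    the upper 24 bits of EAX are left untouched.
    """
    al = 0xFF if carry_flag else 0x00
    return (eax & 0xFFFFFF00) | al


print(hex(salc(0x12345678, True)))
print(hex(salc(0x12345678, False)))
```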

So the common uses for those undocumented opcodes are malware

and packers, just to prevent automated analysis or easy reverse-engineering.

What's funny is that, Intel, if you follow the documentation you will have many holes, but Intel's own disassembler,

Xed, which is free to use, though not open source, just handles

all these opcodes correctly, while Microsoft, and Visual Studio, and WinDBG, they blindly follow the documentation.

So you will get question marks even though Intel knows perfectly what it does.

So it's like "[...] do as I disassemble and don't read my documentation."

So - of course - you could argue that WinDBG is only made to debug what the compiler,

Microsoft compiler created, but then it kind of rules out WinDBG as a malware debugging tool,

because you just inserted D6, it's trivial, and WinDBG is just not able to tell you what the instructions

are. So it's not very useful for malware analysis -- for a malware analysis debugger

So, another problem that happens is that of course each of the

undocumented things, facts, are available, maybe one

you will have in a trojan, one in a packer, and everything, but it's not so easy

to find a good, exhaustive, clean test set to actually

gather all these undocumented facts.

So, for example, someone - a colleague - mentions an undocumented

opcode or behaviour, and then you say "oh yeah, it's

in MebRoot [MBR infector], or you skip this part of the file or whatever",

and then, first, it's malware, so you cannot

really spread that, and then there is a lot of noise -- the malware payload or something before and

after -- so it's not so easy to analyse. So that's why I focused on creating a small and clean test

set in which each file insists on just one particular instruction or fact.

So, now let's start, at last, the real stuff, and a few of the undocumented opcodes.

But before I actually started [studying], [I was] wondering what the actual possibilities of the CPUs were. I didn't even know

what the possibilities were, which opcodes are still supported or not by the CPU.

And I think it's a bit like English, everybody, or most people in the world, would be able to read and

understand these words, and if you['ve] see[n] some disassembly [before] then well you are used to seeing these opcodes,

they are made by all the compilers and they are so common that if they are not here then we are a bit

ill-at-ease, and if it's something different then we probably would be surprised.

So this is standard English, but the Intel CPUs were made in the 70s, so it'd be the same as if you take

Shakespearean English, so you could say that it's still English, but mmm... You know, I don't know what that means actually...

or maybe I forgot, I quickly forgot at least, and it's a bit the same

for those opcodes which are still supported by all the CPUs that we have -- all the Intel CPUs -- but

we probably don't know what they actually do, and that's a problem.

One of the proofs of concept that I made uses only these old opcodes, and these

old opcodes are actually doing something, so if someone is familiar with reading that, maybe I should

ask "how old are you?", because myself I am used to the PUSH/JUMP/CALLs, but when it's about this,

mmm... what is exactly being done. And it's still working on an i7, and it's still usable by malware,

packers or anything, and yet some of them are -- totally unused now and they are still fully working on

modern CPUs.

And of course, it's a bit like English, it's an evolving language, and a bit like maybe the oldest generations

of people -- of humans wouldn't be used to the buzzwords - the latest buzzwords.

Some opcodes are only present in the most recent CPUs, so you have direct opcodes for

CRC32 or AES decryption, string matching, and then some complex operation, in just one opcode.

So this, this is possible, this exists in modern CPUs. Not all of them, of course.

One thing that I like is the MOVBE -- move big endian -- opcode, because move big endian is the rejected

offspring, it's only implemented in the Atom CPU, which means this netbook has -- supports this opcode

and the i7 64-bit doesn't have this opcode, even though it will have CRC32 or maybe AES [op]code, so...

so much for complete backward compatibility.

There is no physical CPU as far as I know that can emulate -- execute CRC32 and MOVBE.

And of course, MOVBE is quite meaningless in itself, because you already have an opcode [BSWAP] for

endianness swapping. So I don't know, this small computer has an opcode that most PCs don't.

Okay. Why? I don't know. If you know...

[Audience member:] "Is this opcode documented in the CPU feature set?"

Yeah.

Yeah, it's totally -- this MOVBE -- it's totally documented, it's official.

[Audience member:] "But, no; is it like a CPU flag just for this instruction or is it implicit by 'this

is an Atom CPU'?"

Uh... Yeah, I don't know. I check the value by CPUID but I don't know if it's relevant to the... but

I think it's by itself. ...but the CPUID result is so big that I don't remember it all.
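
On the audience question: in Intel's CPUID documentation, MOVBE has its own feature bit (leaf 1, ECX bit 22), next to SSE4.2 (bit 20, which covers CRC32), POPCNT (bit 23) and AES-NI (bit 25). The sketch below only decodes an ECX value that is passed in; it does not execute CPUID, and the two sample values are hypothetical, not dumps from real machines.

```python
# CPUID leaf 1, ECX feature bits relevant to the opcodes mentioned in the talk.
FEATURE_BITS_ECX = {
    "SSE4.2 (includes CRC32)": 20,
    "MOVBE": 22,
    "POPCNT": 23,
    "AES-NI": 25,
}


def decode_ecx(ecx):
    """Map each feature name to True/False for a given CPUID.1:ECX value."""
    return {name: bool((ecx >> bit) & 1) for name, bit in FEATURE_BITS_ECX.items()}


# Hypothetical ECX values, chosen only to mirror the situation in the talk.
atom_like = 1 << 22
desktop_like = (1 << 20) | (1 << 23) | (1 << 25)

print(decode_ecx(atom_like))
print(decode_ecx(desktop_like))
```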

Uh, another thing, a bit specific to Windows in my case, because I focus on malware, is that before you

actually execute any opcode, I was focusing on what the register values are when you start a program, and I found

out that the default register values when you start a program and you haven't executed, theoretically, any opcode

- theoretically - actually give you some information that is actively used in malware.

So for example, at the entry point, EAX tells you whether it's an older generation of Windows (XP or before)

or Vista or later.

This is not so used by malwares, I don't recall seeing it, but GS, if GS is null, then it's a 32-bit

system, and if it's not it's a 64-bit system.

I will actually use that later in one of the tricks.

And also, the relations between the registers -- there are many registers on the Intel CPUs -- are sometimes not

very clear. I was surprised that when you do an FPU operation, it changes the FPU status, the

FPU registers themselves, but also the MMX registers, and somehow all the documentations I saw on the

internet are always mapping ST0 and MM0 in front of each other which makes sense, but actually if you

modify -- if you just do a single FPU operation, it will actually modify not MM0, but MM7.

So if you do an FPU operation like "load PI" [FLDPI] and then you check the value of MM7, that could be used

as a trick or it's just like the way it is.

And the ST0/MM0 mapping is in all the documentation, Wikipedia and so on, that I could find about the overlapping of the registers.
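
One way to see why MM7 changes: the x87 stack is a ring of eight physical registers indexed through a TOP field, ST(i) is physical register (TOP + i) mod 8, and MMi aliases physical register i directly. A load decrements TOP before writing. The model below is a simplified sketch that assumes TOP starts at 0 (the state after FINIT) and stores a plain tag instead of an 80-bit value; the class and method names are made up for the example.

```python
class X87Model:
    """Tiny model of the x87 register ring and its MMX aliases."""

    def __init__(self):
        self.regs = [None] * 8  # physical registers R0..R7
        self.top = 0            # assumed initial TOP (state after FINIT)

    def fld(self, value):
        """Push a value: TOP is decremented first, then R[TOP] is written."""
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def st(self, i):
        """ST(i) is relative to TOP."""
        return self.regs[(self.top + i) % 8]

    def mm(self, i):
        """MMi aliases physical register i, regardless of TOP."""
        return self.regs[i]


fpu = X87Model()
fpu.fld("pi")  # stands in for FLDPI
print(fpu.top, fpu.st(0), fpu.mm(7), fpu.mm(0))
```

So after a single load from the initial state, ST0 is physical register 7, which is why the value shows up in MM7 and not in MM0.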

Another thing is that this was used as an anti-emulation trick in XP, that FPU also changes CR0

so you have quite an unexpected anti-emulation trick by just using FPU operation.

So here it is; basically 'store machine status word' [SMSW] is an older 286 CPU opcode -- mnemonic, that was

created in the 286 era, so before protected mode was fully developed, and so it allows you

to read the value of CR0, even from user mode, while the 'MOV CR0' is actually a privileged opcode.

For some reason, the higher word of the register is undefined officially by the documentation, so Intel

just says "the lower word is correct but you cannot rely on the upper one". So for

some reason, I don't know why they say that, because it's actually the value - the higher bits - of CR0.

And under XP, when you do FPU operations, the value of CR0 will be modified, and eventually reverts

by itself. So you just do an SMSW and note the result, then

you do a FPU operation, then the result should be different, and then eventually the result will revert

to the original value. So it's quite a tricky and unexpected anti-emulator.

You have a similar trick on 32-bit Windows, where GS is not stored in the context, so it means that on

thread-switch the value of GS is lost, which means if you just wait for something, GS will eventually

reset to 0. So if you set GS and you are stepping manually, this is slow and this creates a thread-switch,

so instantly GS is lost. And also, like the previous trick, if you just wait for GS not to be...

if you just loop until GS is not 0, this on a real system, will eventually exit from the loop.

But the first time, it blew my mind, I was really wondering what can happen there, there's no other thread

and of course in my proof of concept, it directly starts like this. What happens? What should happen now,

but on a real system? Eventually, it's reset to 0.

Another thing is that of course it's reset to 0, but not in 0 time, so if you wait for one GS reset

and then loop again until the next one, that second wait spans two resets... thread switches, which means it should

take a minimum of time, so you can use that for timing -- anti-emulation timing tricks.

Of course, I was also thinking that NOP is perfect, because NOP is NOP, it does nothing.

But originally NOP is 'exchange eax with eax' [xchg eax, eax], or 'ax with ax'. The NOP [encoded as] 0x90 is always doing nothing,

but you have another encoding [87 c0] to do an 'exchange EAX, EAX', which again

doesn't do anything in 32b, but, like all the other 32b operations

in 64b mode, it actually resets the higher DWORD

so you have an XCHG EAX [,EAX] that does something,

even though at first it looks like it would do nothing

but thankfully in this case the 90 NOP is still doing nothing

and this is probably now common in malwares and stuff
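
The difference between the two encodings can be modelled in a few lines. This is a sketch of the register effect only, with made-up function names: the 90 byte never touches RAX, while 87 C0 is a genuine 32-bit write to EAX and, in 64-bit mode, any 32-bit register write zero-extends into the full 64-bit register.

```python
MASK32 = 0xFFFFFFFF


def nop_90(rax):
    """Opcode 90: a true NOP in every mode, RAX is left as it is."""
    return rax


def xchg_eax_eax_87c0(rax, mode64):
    """Opcode 87 C0: harmless in 32-bit mode, clears the upper dword in 64-bit mode."""
    return rax & MASK32 if mode64 else rax


value = 0x1122334455667788
print(hex(nop_90(value)))
print(hex(xchg_eax_eax_87c0(value, mode64=True)))
```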

HINT NOP was the multi-byte nop

that actually gives a hint about what will be executed next, by the CPU

whatever the address here [in memory referenced HINT NOP]

it wouldn't trigger an exception

but as you can see, it's really a multi-byte opcode -- it can be a very long nop

that's weird to say

another thing is, once again it's partially undocumented by Intel

the full range of HINT NOP encoding is bigger on AMD documentation

and another thing is that, because it's a multi-byte opcode

if you - at the end of a page - insert those bytes

then it will look for the operands

then it could trigger an exception,

so it's a nop that could trigger an exception if at the end of the page

so, thank you Intel -- or whatever, I don't know, I'm not sure

MOV, once again, I thought...

MOV being MOV, should be perfectly logical

sadly not... first... all this is documented, but it's tricky

because -- there were even bugs for that in all the disassemblers I tried, I think

well, except Xed, maybe

you cannot do a MOV to or from CR0 with a memory operand

so the documentation says that the Mod/RM is ignored

it doesn't mean it's illegal, it's just ignored

so if you do this, which could lead to a crash

it's actually interpreted as that

and as far as I can remember, you'd fail all the disassemblers with that

until recently [ ;) ]
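
A decoder that follows this rule could look like the sketch below. It handles only the two control-register forms (0F 20 for MOV r32, CRn and 0F 22 for MOV CRn, r32) and deliberately never looks at the top two bits of the Mod/RM byte. The function name is made up for the example.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_cr(code):
    """Decode 0F 20 /r and 0F 22 /r, ignoring the mod bits of Mod/RM."""
    if len(code) < 3 or code[0] != 0x0F or code[1] not in (0x20, 0x22):
        raise ValueError("not a MOV to/from control register")
    modrm = code[2]
    cr = (modrm >> 3) & 7      # reg field selects the control register
    gpr = REG32[modrm & 7]     # r/m field selects the general register
    # The mod field (modrm >> 6) is not consulted at all: a value that would
    # normally mean "memory operand" is treated exactly like register mode.
    if code[1] == 0x20:
        return f"mov {gpr}, cr{cr}"
    return f"mov cr{cr}, {gpr}"


print(decode_mov_cr(bytes([0x0F, 0x20, 0xC0])))  # usual encoding, mod = 11
print(decode_mov_cr(bytes([0x0F, 0x20, 0x00])))  # mod = 00, same instruction
```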

MOVSXD is a 64b opcode, is sign-extending, so theoretically

it should work from a smaller register to a bigger register

but if you use no REX prefix, which is discouraged

you can actually make it work like a standard MOV,

and the other way around,

MOV from a selector to a 32b register actually works

so many disassemblers were disassembling that as MOV AX, CS

because that would make both operands the same size,

but actually the upper word of the target register

is 'undefined' but actually there is no funny thing here,

there's no random value, it's zeroes

so basically, it makes it equivalent to MOV EAX, CS

BSWAP is one of my favorites

because I think it's like an administration

it's supposed to just swap the endianness of the registers

but because of -- external reasons

it's never really doing the work you expect

so, only in 64b, it's actually correctly swapping the endianness

as you would expect

on EAX [32b], in 64b [mode], like all the 32b opcodes,

it will actually clear the higher dword -- ok!

and, on a word [16b] register, it's actually 'undefined' again

but it's commonly used in malwares and packers

because it just resets [the register]

so it's like a XOR AX, AX

so, with this unexplainable result, I understand

that Intel probably doesn't want to explain -- just say it's 'undefined'

because they would be too ashamed to explain

why we get this funny result

BSWAP AX is also wrongly disassembled by WinDbg and so on

it will be disassembled as BSWAP EAX

and actually, you clear the register
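
The three BSWAP cases can be summarised in a small model. The 64-bit and 32-bit behaviours are the documented ones; the 16-bit case is officially undefined, so the last function only mirrors what the talk reports (AX ends up cleared) and is not something the documentation guarantees. Function names are made up for the example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def bswap32(value):
    """Byte-swap the low 32 bits."""
    return int.from_bytes((value & MASK32).to_bytes(4, "little"), "big")


def bswap64(value):
    """BSWAP RAX: full 64-bit byte swap."""
    return int.from_bytes((value & MASK64).to_bytes(8, "little"), "big")


def bswap_eax_in_64bit_mode(rax):
    """BSWAP EAX in 64-bit mode: swap the low dword, upper dword is cleared."""
    return bswap32(rax)


def bswap_ax_as_reported(eax):
    """BSWAP AX (66 prefix): officially undefined; modelled here as clearing AX."""
    return eax & 0xFFFF0000


print(hex(bswap64(0x1122334455667788)))
print(hex(bswap_eax_in_64bit_mode(0x1122334455667788)))
print(hex(bswap_ax_as_reported(0x12345678)))
```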

can everybody understand this code?

anybody sees the potential trap?

so, it pushes the address of  on the stack,

then RETN takes the address from the stack,

and, basically, you just jump to an immediate value,

execution ordering ?

yeah, the execution starts here

???

no -- ok, it's not the point here

and of course, if you -- this is OllyDbg 1, it's fixed in OllyDbg 2

but OllyDbg1 is even trying to be nice,

telling you -- this is an automatic comment -- that RET

is used as a jump to

and, as you can see, not exactly the same [happens]

so, what happened ?

no one sees ?

so, basically, here, you have a 66 prefix on RETN

which actually makes RETN return to IP, and not EIP

so, actually, you don't jump to 401008, but to 00001008

and in this proof of concept, I mapped the NULL page

and I created -- added some code at this address
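
The arithmetic behind the trap is short enough to sketch. The PUSH stores a full dword on the stack, but with the 66 prefix the RETN pops only a word, so the upper half of the pushed address never reaches the instruction pointer. The function name below is made up for the example.

```python
import struct


def retn_target(pushed_dword, operand_size_prefix):
    """Where RETN lands after PUSH pushed_dword, with or without a 66 prefix."""
    stack_bytes = struct.pack("<I", pushed_dword)  # what PUSH left on the stack
    if operand_size_prefix:
        (ip,) = struct.unpack_from("<H", stack_bytes, 0)  # 16-bit pop into IP
        return ip
    (eip,) = struct.unpack_from("<I", stack_bytes, 0)     # normal 32-bit pop
    return eip


print(hex(retn_target(0x401008, operand_size_prefix=False)))
print(hex(retn_target(0x401008, operand_size_prefix=True)))
```

The second result is the address inside the NULL page that the proof of concept maps beforehand.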

so, this is actually not a return to this []

but the problem is that, officially, this is also called a 'return'

it's not [different from the standard one] -- the disassemblers have now added their own ways of disassembling it

like 'small retn', ret.16, or something like this

but actually officially, it's the same mnemonic

so, the latest Hiew, I think, and that's OllyDbg 1

maybe the latest OllyDbg 2 fixed that

but you can still be tricked just by that

the 66 prefix - the jump to IP - also works on CALLs, RETs, LOOPs, [and JMPs]

so all the flow control opcodes

so, I won't enumerate all the tricks,

because otherwise you'll die of boredom probably

if you want more, then I created a page on Corkami [x86.corkami.com],

and I already made some graphs and cheat sheets

to have an easy [table] -- list of opcodes

and, that's quite enough theory for now...

So, I don't like just -- reading stuff and not having something to feed my debugger

so I created CoST

which stands for Corkami Standard Test

CoST is a single binary, there is no option,

you just run it, and it will just execute a lot of different tests

and then, I also made it a hardened PE,

so it may also help you to test the PE side of your tools

or your knowledge

but, because in hardened PE mode it's actually quite difficult to debug,

I also made an easy PE mode so that

you can study only the assembly, and not have too much trouble

debugging it

so, CoST contains a lot of tests

classic stuff -- very trivial stuff

then, a few more complex stuff, like JMP to IP, IRET...

undocumented opcodes

CPU specific, like MOVBE, POPCNT, CRC32

also some detections of OS and VM by using common opcodes

like, the 'red pill trick'... yeah, just SLDT execution, and you get a value, and you compare...

but it's 'the blue pill', or whatever...

and also some OS bugs because sometimes, Windows XP

was doing the wrong job trying to tell you which was

the exception that just happened, and it would be a way

to make the difference between an actual OS and an emulator that would try to be logical

CoST is written in assembly, so, there's no extra

it's not compiled, it's not generated, but

to make it self-documented, I created internal exports

so that each section of the file is easy to browse [to],

so that you will know -- if you quickly want to jump to the 64b part

then it's easier via the exports

and also I wanted it to print messages in the most convenient way

so, if you keep printing messages, then it will make the assembly

wider, I mean longer to scroll, so I used

Vectored Exception Handling, and a fake opcode

so that you have the comments of what's gonna happen,

appearing directly in the code

so it's kind of self-documented, without a debug symbols file

and, you saw, it doesn't have much output

but actually it has a lot of debug output

like 100 -- I forgot -- messages. it's even saying '[trick] I'm gonna do this'

and then, 'i'm gonna do that...', so

trying to make it helpful yet a bit hard to disassemble

can anyone understand what this code is doing ?

this is one of my favourites

we can't see the opcodes

no, there's no [opcode] trick this time

so, basically you push some arguments on the stack

you jump to here

basically, with the return far [RETF]... I pushed 'push_eip' on the stack

with a 0x33 word

so basically I will RETurn Far to this

basically I will return back to this EIP in selector 0x33

if this is in a 64b OS, and this is a 32b process

you will return back to execution here, in 64b mode

because selector 0x33 is the selector for 64b mode

which you can access from a 32b process

so basically this code will be executed first in the current selector

as you see, and then it's executed back on selector 0x33,

which means in 64b mode

so you have the same EIP, you have the same opcodes

but the disassembly will be different,

and I chose some opcodes that produce mnemonics

specific to each side, 32b or 64b sides

so, it's already quite a b*tch to disassemble

because, same EIP, so unless you're careful about the selector,

well, it's a problem
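
The stack mechanics of that far return can be sketched as follows. The selector "33" in the talk is hexadecimal (0x33, the 64-bit code segment under WoW64, with 0x23 being the usual 32-bit one). The address 0x401000 is a hypothetical value for the example, and the one-byte table at the end only illustrates how the same byte reads differently in each mode.

```python
SELECTOR_MODE = {0x23: "32-bit", 0x33: "64-bit"}

# The same byte decodes differently depending on the mode of the code segment.
BYTE_0x48 = {"32-bit": "dec eax", "64-bit": "REX.W prefix"}


def retf(stack):
    """32-bit RETF: pop the new EIP first, then the new CS selector."""
    eip = stack.pop()
    cs = stack.pop() & 0xFFFF
    return cs, eip


stack = []
stack.append(0x33)       # push the selector first
stack.append(0x401000)   # then the (hypothetical) target address
cs, eip = retf(stack)

mode = SELECTOR_MODE[cs]
print(hex(cs), hex(eip), mode, BYTE_0x48[mode])
```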

[Errata: you can debug this kind of code, check my berlinsides presentation (screencast on slide 58)]

http://bsx2.corkami.com , slide 58 [screencast]

if you run over it, you return to the original selector,

which is why there is the PUSH CS here

and you go back with the original selector

execution will go through quickly

but you cannot step through that code [WRONG, you can with WinDbg+wow64exts]

so, killing the disassemblers, and the debuggers

and yet, simple

so, here is the result that you get when you run CoST

with the latest -- well the latest public version of Hiew

I think it's gonna be fixed

so, this is a HINT NOP that's not documented by Intel

and it's a bit forgotten by most disassemblers

so, WinDbg and Hiew are giving you

undocumented, well -- question marks, or the Hiew style of question marks

then, since -- that was originally what I planned to present at Hashdays

but then, I decided to bring a few tricks in CoST itself, on the PE side of things

so, this is the header, so it has MZ, and then some text

so you can 'type cost.exe'

and it has some text - I made it type-able

and the NT headers - the 'PE' header, the one starting with PE

is actually starting at the bottom of the file -- the bottom of the file is here

so it's a footer
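
What makes the "footer" layout possible is that the position of the NT headers is simply a dword stored at offset 0x3C of the DOS header (the `e_lfanew` field), so it can point anywhere in the file. The sketch below builds a toy buffer with that shape and locates the signature; it is not a loadable PE, and the text and sizes are invented for the example.

```python
import struct

E_LFANEW_OFFSET = 0x3C  # position of the e_lfanew field in the DOS header


def nt_headers_offset(data):
    """Follow e_lfanew and check that it points at a 'PE\\0\\0' signature."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ signature")
    (offset,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    if data[offset:offset + 4] != b"PE\0\0":
        raise ValueError("e_lfanew does not point at a PE signature")
    return offset


def build_toy_file():
    """Toy layout: MZ + readable text, e_lfanew, filler, PE signature last."""
    text = b" hello from the DOS header"
    header = (b"MZ" + text).ljust(E_LFANEW_OFFSET, b" ")
    body = b"\x00" * 0x100
    pe_offset = E_LFANEW_OFFSET + 4 + len(body)
    return header + struct.pack("<I", pe_offset) + body + b"PE\0\0"


data = build_toy_file()
print(hex(nt_headers_offset(data)), hex(len(data)))
```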

and I made it so the values are quite critical

so, they are not the ones you would expect

so this is the result that you would get when you were

loading CoST under IDA 6.1

so, well, some values were random and everything

but, if you have -- with CoST, you can test and set the value of a register

then compare it

but you cannot test all the possibilities of PE files

with a single file, because you have to choose

so, for example, CoST has no section, weird alignments and everything

but you cannot make all the possible cases [in a single file]

so, I went on and I created another page on Corkami

with, as usual, the proof of concepts, some graphs about the PE files and everything

I don't consider it finished but I consider it good enough to break

a bit everything

now, I already created more than 100 PoCs, which try

0 section, big alignments, huge alignments, and I have some funny results...

so, here is the 'virtual section table vs Hiew'

so, when you're in low alignments, you can have no section,

or the section table can be empty

so basically, I made the SizeOfOptionalHeader point in virtual memory space

which means the section table is out of the PE file [full of 00, in virtual space]

and Hiew doesn't like this. As a consequence, it doesn't even think it's a PE file

while it's fully working, but this trick only works under XP

because Windows 7 is a bit more picky on the unused section table values
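
The reason SizeOfOptionalHeader controls where the section table lives is that the table has no pointer of its own: it starts right after the optional header, at `e_lfanew + 4 + 20 + SizeOfOptionalHeader` (signature, then the 20-byte file header). The sketch below just does that arithmetic; the offsets and file size are hypothetical values, not taken from the actual proof of concept.

```python
PE_SIGNATURE_SIZE = 4
FILE_HEADER_SIZE = 20


def section_table_offset(e_lfanew, size_of_optional_header):
    """Offset of the first section header, as derived from the headers."""
    return e_lfanew + PE_SIGNATURE_SIZE + FILE_HEADER_SIZE + size_of_optional_header


def section_table_inside_file(e_lfanew, size_of_optional_header, file_size):
    """False when the table starts beyond the bytes actually present on disk."""
    return section_table_offset(e_lfanew, size_of_optional_header) < file_size


# Hypothetical values: a usual 0xE0 optional header, then an oversized one.
print(hex(section_table_offset(0x40, 0xE0)))
print(section_table_inside_file(0x40, 0xE0, 0x400))
print(section_table_inside_file(0x40, 0xFFF0, 0x400))
```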

so when you get some ASCII art in the Data Directories

you can probably guess that there is something going on

if you have better ASCII art suggestion, I'm all ears

so, basically, this is the 'Dual PE header' that was presented by

ReversingLabs at Black Hat

so, are you familiar with that ?

so, basically, you extend the SizeOfHeaders so that

the NT headers will be actually mapped at the bottom of the file

so that when it's far enough to reach section [not file] alignment

and when you load that, in memory

the first section will actually be mapped over it

the first part of the OPTIONAL_HEADER is the one used on disk

so, this is what is used to check if the file will load

but the Data Directories are read from the values in memory

so, first, the OPTIONAL_HEADER is parsed, mapped in memory

then the section is folding itself over the bottom part of the header

and then the true Data directories that were originally

in the start of the section will be taken into account

so all this is garbage and visible on disk, it follows the SizeOfOptionalHeader

but actually in memory, this is not what gets parsed

another weird thing is that the export names can just be

absolutely anything, until a null character

which means, non ASCII, whatever

and another funny thing is that

Hiew displays them in line

so you can just add your own ads,

because those are just export names, and one of the export

[name] is actually more than 16 Kb

so that it's good enough to create a buffer overflow

if your tool is not careful about that

and it's also possible to have a NULL export [name], just a NULL character

and you can import a NULL API

no problem
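
For tool authors, the practical consequence is that an export name is just "bytes until the first NUL", with no length field and no character-set guarantee. The sketch below reads names that way and shows an optional cap; the helper name, the cap of 256 and the sample data are arbitrary choices for the example, not values from any real tool.

```python
def read_export_name(data, offset, limit=None):
    """Read bytes from offset up to the first NUL; optionally cap the length."""
    end = data.find(b"\x00", offset)
    if end == -1:
        end = len(data)  # unterminated name: stop at the end of the buffer
    name = data[offset:end]
    if limit is not None:
        name = name[:limit]
    return name


# Toy name table: an empty name, a non-ASCII name, and a very long name.
long_name = b"A" * 20000
table = b"\x00" + b"\xff\xfe\x01name\x00" + long_name + b"\x00"

print(read_export_name(table, 0))
print(read_export_name(table, 1))
print(len(read_export_name(table, 9)), len(read_export_name(table, 9, limit=256)))
```

A fixed-size buffer on the tool side is exactly what the oversized name is meant to catch.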

I also just tried to see the different possibilities

created a few files that had the maximum number of sections

the limit is 96 under XP, and 64K under Vista and [Windows] 7

which means, well

OllyDbg 2 - the latest OllyDbg - gives you a funny message

but it still loads the file.

OllyDbg 1 crashes directly on this file

err...still some time ?

and one last thing, not very visual, but I noticed

that the AddressOfIndex of the TLS is overwritten on loading

and imports - the terminator of imports doesn't need to be five null dwords:

it's enough for the name [of the DLL] to be 0 for the import descriptor

to be considered a terminator

so, basically, if you make AddressOfIndex point to the name of an import descriptor

you could get that overwritten, and then the imports will be truncated

will be considered truncated

and actually, the behavior is different under XP or Windows 7

so, under XP, it's overwritten after imports loading,

so the whole imports table is not truncated,

while under Windows 7, it's happening before the imports are loaded,

which means you have the same PE, but different loading behaviour

under different versions of windows

and the file works on both versions of windows
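
The truncation effect can be reproduced on a toy import table. Each import descriptor is five dwords (OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk), and the walk below stops as soon as the Name field is 0, as described in the talk. Writing a zero dword over one descriptor's Name stands in for the loader's write to AddressOfIndex; all RVAs are hypothetical.

```python
import struct

# OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk
DESCRIPTOR = struct.Struct("<5I")
NAME_FIELD_OFFSET = 12


def count_imports(table):
    """Count descriptors until one with a null Name field is met."""
    count = 0
    for offset in range(0, len(table) - DESCRIPTOR.size + 1, DESCRIPTOR.size):
        name_rva = DESCRIPTOR.unpack_from(table, offset)[3]
        if name_rva == 0:
            break
        count += 1
    return count


# Three descriptors with made-up RVAs, followed by an all-zero terminator.
table = bytearray()
for name_rva in (0x2000, 0x2010, 0x2020):
    table += DESCRIPTOR.pack(0, 0, 0, name_rva, 0x3000)
table += DESCRIPTOR.pack(0, 0, 0, 0, 0)

before = count_imports(table)

# Simulate the TLS index write landing on the second descriptor's Name field.
struct.pack_into("<I", table, DESCRIPTOR.size + NAME_FIELD_OFFSET, 0)
after = count_imports(table)

print(before, after)
```

Whether a loader sees the "before" or the "after" table depends on when the overwrite happens relative to import resolution, which is the XP versus Windows 7 difference described above.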

oh wait, before that... maybe I still have some time ?

15 minutes left ? ok

I'll do the demo

This is just to prove...

sorry?

This is the kind of PE file that I typically create

I only defined the [required] elements that it needs to work

and this is actually a driver

so, even though I used some undocumented opcodes

It's a working driver and it doesn't have the usual

[compiler] stuff you have in a driver

just to say that this is the kind of PoC, clear to see

you don't have external stuff that bothers you, that clutters your view

or your debugging

so, this one is just to see the possible values of CR0

via the SMSW, theoretically undefined on DWORD

but it actually gives you the same value

[like] the standard MOV EAX, CR0

and here is MOV EAX, CR0 with the wrong Mod/RM

which, in the latest Hiew, is actually not disassembled at all

let's hope it doesn't crash...

so, as you can see, you get exactly the same value

whether you're using the normal CR0, the 'invalid' one, and the 'undefined'

the upper part is supposed to be undefined

usually when it's undefined, it's zeroes, in Intel language

but here it just works fine

and my machine didn't even crash

which means the driver is fine

so you can study small drivers

the first PoC that I presented here

was the one with old disassembly

anyone still knows what the value is?

so basically, some opcodes are here for garbage

just to prove that they are actually [supported], they are just used as junk

but registers are actually modified [in the others]

and these opcodes from the 70's, or something -- the early 80's

are still perfectly working on a modern CPU or even an i7

one of the PoC I created is the one that actually tests the values

-- the initial values [of each register] -- so that you can see

what would be the possible values whether it's on XP or Windows 7

each time [TLS, EntryPoint, DllMain], I just save all the values of the registers

and then I compare them to possible values

so I test them one after each other

actually, on TLS, you have much more control of the values

because the values you will get in the TLS -- on loading the TLS

are the RVA [of the TLS data directory], the callbacks, the size of the TLS

you get that in -- I forgot exactly, but it's in the source...

running this will help you to mimic an OS better in your emulator

if that's what you're interested [in]

SMSW is actually the one comparing -- so, using SMSW,

then comparing the value, then checking whether the register changed

[after an FPU operation] and then when it reverts normally

a funny fact that I would like an explanation [for],

if you know it

is that actually, this behaviour is different if you run the file normally

or if you run it with a redirection

if you pipe the output, you get a 'fail' result

if you run the file normally, it just works

so, I would like -- here, I will just run it, and then I will run it to a file, and just TYPE the result

normal execution: OK

redirection: FAIL

if you guys have any explanation for that, I'm all ears

did you try redirecting to something else ? like, a COM

oh, I didn't try

so, you would pipe to another device, and ...

but then, how do you get it back ?

printer, or ...

yeah, I don't have a COM device or...

yeah, I don't know

but it was a big surprise, because I had a test bench

and then, 'FAIL'. .. uh ?

run, OK... so, I have no idea why...

the GS trick...

quite simple

and I also have some output

I modified GS then it's reset

then it waits for the result

then I'm doing 2 resets and checking the time in between

so that, it shouldn't happen too quickly

NOPs, so...

I'm testing the undocumented NOPs

testing the NOPs that are on an invalid page

so, standard NOP

32b nop
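
[sketch, not from the talk: for comparison with the undocumented forms, these are the documented multi-byte NOP encodings recommended in the Intel manual (NOP r/m, 0F 1F /0). The 3-byte form is 'nop dword [eax]'. The manual says the instruction issues no memory operation, which is consistent with a NOP on an invalid page not faulting. The helper name is an assumption.]

```python
# Documented multi-byte NOP encodings, keyed by their length in bytes.
INTEL_NOPS = {
    1: "90",
    2: "66 90",
    3: "0F 1F 00",
    4: "0F 1F 40 00",
    5: "0F 1F 44 00 00",
    6: "66 0F 1F 44 00 00",
    7: "0F 1F 80 00 00 00 00",
    8: "0F 1F 84 00 00 00 00 00",
    9: "66 0F 1F 84 00 00 00 00 00",
}

def nop_padding(length: int) -> bytes:
    """Build 'length' bytes of padding from the longest documented NOPs."""
    out = bytearray()
    while length > 0:
        chunk = min(length, 9)
        out += bytes.fromhex(INTEL_NOPS[chunk])
        length -= chunk
    return bytes(out)

print(nop_padding(3).hex(" "))
```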

so, all my 64b tests are still done in a 32b process so that you can run them on a normal OS

then it detects via GS if 64b [mode] is available

and in this case, you would get a different result

so, if you run it on 64b, which I don't have here, you would get

the actual tests on 64b

and the results printed out.

but still, it's not possible to debug that easily [wrong]

but at least, there's no trick over there, so it's easy to bring back to a 64b process

[to step over 64b code and return to the 32b process]

PUSH/RET

you print the output, and then...

Olly nicely tells you that you will jump to 401008

but actually -- here the display is actually correct

and the TLS already created a null page

which prints 'FAIL'

so, as expected, but there is no standard way to disassemble that correctly
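
[sketch, not from the talk: this assumes the PUSH/RET trick relies on an operand-size prefix (66) in front of RET, which this part of the transcript does not spell out. With that prefix in 32-bit code, RET pops only 16 bits and EIP is masked to 16 bits, so the real target differs from the pushed value. The function name is hypothetical.]

```python
def ret_target(pushed: int, operand_size_prefix: bool) -> int:
    """EIP after PUSH imm32 / RET in 32-bit code.

    With a 66 prefix, RET pops a 16-bit value and zero-extends it into EIP.
    """
    if operand_size_prefix:
        return pushed & 0xFFFF
    return pushed & 0xFFFFFFFF

print(hex(ret_target(0x401008, False)))  # the target a debugger hint would show
print(hex(ret_target(0x401008, True)))   # the target under the assumed prefix
```

[under that assumption the code lands in low memory, which would fit with the TLS having allocated a page there beforehand.]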

I can't execute the working 64k sections.

and actually I'm executing all the code [the complete virtual space of all 64k sections]

the sections are quite big

and I'm modifying EAX so that all the 00 00 are executed

and just to do a printf in the end.

it actually takes a few seconds to execute on an i7

so it's actually quite funny to see... you launch it... even when the cache is loaded,

and the OS is ready to be fast... you launch it... and printf comes a few seconds later
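
[sketch, not from the talk: zero-filled memory decodes as a run of two-byte 'add byte [eax], al' instructions, which write through EAX. That is presumably why EAX has to be set up before the empty sections are executed. The tiny decoder below handles only opcode 00 with the simple Mod/RM forms; its name is an assumption.]

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_add_rm8_r8(code: bytes) -> str:
    """Decode opcode 00 /r (ADD r/m8, r8) for register and plain [reg] operands."""
    if len(code) < 2 or code[0] != 0x00:
        raise ValueError("not an ADD r/m8, r8 encoding")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3:
        return f"add {REG8[rm]}, {REG8[reg]}"
    if mod != 0 or rm in (4, 5):
        raise ValueError("SIB or displacement forms are not handled in this sketch")
    return f"add byte [{REG32[rm]}], {REG8[reg]}"

print(decode_add_rm8_r8(b"\x00\x00"))
```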

virtual sections is the one that Hiew doesn't think is a PE at all -- this is the latest Hiew

well, it's been patched anyway

well, I can't browse PE now that it doesn't think it's a PE file...

but basically, it thinks that the OPTIONAL_HEADER points to the end of the file -- beyond the end of

the file
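
[sketch, not from the talk: the section table starts right after the optional header, at e_lfanew + 4 + 20 + SizeOfOptionalHeader, so a large SizeOfOptionalHeader can place it beyond the end of the file. Whether the PoC does exactly this is an assumption; the function names are hypothetical.]

```python
import struct

def section_table_offset(data: bytes) -> int:
    """File offset where the section table is expected, per the PE headers."""
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    # IMAGE_FILE_HEADER is 20 bytes; SizeOfOptionalHeader is at offset 16 in it.
    size_of_optional_header = struct.unpack_from("<H", data, e_lfanew + 4 + 16)[0]
    return e_lfanew + 4 + 20 + size_of_optional_header

def table_beyond_file(data: bytes) -> bool:
    """True when the section table would start at or past the end of the file."""
    return section_table_offset(data) >= len(data)
```

[a tool that refuses such a file is stricter than the loader, which is the kind of gap the PoC probes.]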

the folded header...

a few error messages...

because of the wrong data directories

and the actual DD are at the start of...

...the section

this would be the imports and the actual real DD

and last, the one with the TLS AddressOfIndex that is pointing...

...inside the imports, at the AddressOfName

so it will overwrite the loading [overwrite the pointer during loading]

and when you just load it, it just says 'it's XP' because

my imports were loaded this way, and not the other way.

and if you run that file [under W7], it will give you another result
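
[sketch, not from the talk: a toy model of why the order of loader steps matters. It assumes the TLS index written through AddressOfIndex is zero and that it lands on the Name field of an import descriptor, where a zero acts as the terminator. Which Windows version uses which order is not modelled, and the DLL names are illustrative.]

```python
def load(descriptor_names: list[str], tls_slot: int, tls_first: bool) -> list[str]:
    """Toy loader: an empty string stands for a zero Name field (the terminator)."""
    names = list(descriptor_names)

    def write_tls_index() -> None:
        names[tls_slot] = ""          # the zero index overwrites the Name field

    def resolve_imports() -> list[str]:
        loaded = []
        for name in names:
            if not name:              # terminator reached
                break
            loaded.append(name)
        return loaded

    if tls_first:
        write_tls_index()
        return resolve_imports()
    loaded = resolve_imports()
    write_tls_index()
    return loaded

imports = ["kernel32.dll", "msvcrt.dll", ""]
print(load(imports, 1, tls_first=True))
print(load(imports, 1, tls_first=False))
```

[the two orders see different import lists, so a program can tell them apart by checking which imports were resolved.]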

and then, the exports...

where some of the exports are actually very long

you can see that actually, here I'm taking over the disassembly

so I'm repeating the same fake opcodes and address

so you fool the disassembler that way

I think it's just a visual effect, there are no big problems

but it's a known problem that was fixed recently in IDA

that if you put an export in the middle of the instruction

the fake export will actually take over the disassembly,

and that would ruin the disassembly
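
[sketch, not from the talk: a toy linear disassembler that knows only three opcodes, to show how a start address in the middle of an instruction changes the listing. The byte sequence is made up for illustration; a fake export pointing at offset 1 would make a tool start decoding there.]

```python
# opcode -> (length in bytes, mnemonic); deliberately tiny
OPCODES = {0xB8: (5, "mov eax, imm32"), 0x90: (1, "nop"), 0xC3: (1, "ret")}

def disassemble(code: bytes, start: int) -> list[tuple[int, str]]:
    """Decode linearly from 'start'; unknown or truncated bytes become 'db'."""
    listing = []
    pos = start
    while pos < len(code):
        length, name = OPCODES.get(code[pos], (1, "db"))
        if pos + length > len(code):
            length, name = 1, "db"
        listing.append((pos, name))
        pos += length
    return listing

code = bytes.fromhex("B8 90 90 90 C3 C3")
print(disassemble(code, 0))  # the real entry point
print(disassemble(code, 1))  # a fake export inside the MOV
```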

there's actually a PoC for that in Corkami, of course

so, that's all for the demos

so, I wanted to know more about x86 and PE

which are far from perfectly documented

and are still not perfectly documented,

but at least, I've been covering some parts of it,

there are still some gray areas,

but at least, every day, I'm just learning a bit more,

and publishing my results and sharing them openly,

like WinDbg, if you follow only the official documentation,

you will only get bad results, with malware and packers out there,

if you - yourself - are interested, or you develop a tool, an emulator, an engine, whatever...

well you know you can just visit Corkami, read the pages,

download the PoCs, which are [freely] available,

and if you find any bugs - which might happen,

then send me a postcard, or a red-cross T-shirt

Thanks to Peter Ferrie, and all my reviewers, and people who contributed...

do you have any questions ?

did you run them through AVs - antivirus scanners? you would find a sh*tload of 0days

no, and I wouldn't be good at actually turning them into exploits or anything, so...

already breaking all the disassemblers and stuff was good enough for me

I found a crash in Intel XED, which was good enough

any other question? everybody survived the presentation?

it's a great talk, man

thank you!

THANK YOU! [for watching]