Such a weird processor: messing with x86 opcodes... and a little bit of PE [Portable Executable]

So, welcome. ...And please let me know if I speak too quickly. Um, so, who I am... oh yes, so I will talk about opcodes and a little bit about the PE [Portable Executable] file format and their oddities. I've been a reverse engineer for some time. I created a project called Corkami. In the past I also worked on the MAME arcade emulator, and professionally I am a malware analyst, but this talk is only about my hobbies: these are my own experiments and research, done at home. So, let me introduce Corkami. Corkami is just the name of the reverse-engineering (RCE) project I created. I try to keep it purely technical: no ads, no login required, straight to the good stuff. I try to keep it updated and useful, so I also created cheat sheets and the kind of easy documents that I would use at work on a daily basis. But it's only a hobby; I do it once the kids are asleep, late at night, so it probably doesn't look as professional or as polished as I would like. Right now, Corkami takes the form of wiki pages and cheat sheets, and I focus on creating as many relevant proofs of concept as possible [Hi Bob!]. The binaries are hand-written: usually I don't use a compiler, I create the PE structure myself, so that each one focuses only on the exact interesting point and you don't get a lot of noise. You probably don't even need IDA to understand what's going on, because I try to keep only what's important. The binaries are all directly available for download, so you can really test your debugger, your tools and your knowledge with them. So far I've focused on PDF, assembly and the PE file format. There are a few other things, but those are the most covered subjects on my website. And I share everything under a very permissive license, BSD, so you can reuse it commercially or however you like.
Even the images are made in open-source formats. So, the story behind this presentation: some time ago I was young and innocent, and I thought that CPUs, being electronic, had to be perfectly logical, with no surprises. Then I was tricked by malware. Basically IDA couldn't handle it, so I decided to go back to the basics and study assembly and PE files from scratch. In the meantime I created documents on Corkami, and now I'm presenting, more or less, the final results, or at least the good ones. If I were just a guy who learned assembly, I probably wouldn't be at Hashdays talking about it; what got me here are a few achievements against various tools. Basically, I failed every disassembler I tried, and I also caused a few crashes in IDA. I should stress that all the authors were notified and most of the bugs are already fixed: this one was a direct crash in IDA 6.1, and it's fixed in 6.2. And Hiew [Hacker's View]: this is the latest released version, but the newest beta fixed it, and so on.

So, the agenda: I'll start with an easy introduction, but I assume most of you are already familiar with disassembly, right? Yes. Another question: have you ever run into undocumented disassembly, or never? Like, you trust IDA and that's all. Is it common for you to see undocumented disassembly in IDA? Raise your arms... okay, not so many. Okay. So after the introduction, which will go quickly, I will show a few tricks, then introduce CoST, the program I created, and I will also talk a bit more about the PE file format. As you all have some assembly knowledge, I will go quickly over the basics. Basically, you compile a binary and you get assembly; there is some correspondence, some common points, between the [source] code and the assembled [generated] code.
Then of course there is a relation between the opcodes and the [assembly] code, you all know that. What's important is that the compiler generates assembly, but what is actually kept in the binary is only the opcodes themselves, which the CPU understands directly. The CPU just knows what to do with the bytes; it doesn't care whether you, or the tool you're using, know what will happen, it just does it. The problem is that most of us don't read opcodes, we read the disassembly, and if the disassembler doesn't give you any result, we're stuck, we're blind: we don't know what execution will do. The other problem is that x86 instructions have variable lengths, so if you can't disassemble one instruction, you don't even know where the next one starts.

So here I just put one undocumented opcode in a simple program. We use '_emit', a keyword in Visual Studio 2010 Ultimate, and you get a byte that is unidentified in the disassembly, so you get question marks. So this program, even though it costs several thousand dollars, doesn't know what will happen. Usually, if you do that... Oh yes, if you check the Intel documentation, there is nothing at opcode D6. Microsoft doesn't say anything, Intel doesn't say anything, so you would expect bad results: not documented usually means a crash, or at least not the expected result. But in this specific case, no problem. If we follow Intel's or Microsoft's documentation, we don't know what happened; the CPU just does its job. What happened is that D6 is actually a very simple opcode [SALC] that doesn't do much. For some reason it's not documented by Intel, but it is documented by AMD, and that's true for most of these opcodes: documented by AMD, but not by Intel.
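D6 is commonly known as SALC ("set AL from carry"). To make "doesn't do much" concrete, here is a minimal Python model of its effect, assuming registers are represented as plain integers; the helper names are illustrative and not taken from any tool.

```python
def salc(carry_flag: bool) -> int:
    """Value that D6 (SALC, 'set AL from carry') writes into AL.

    AL becomes 0xFF if CF is set and 0x00 otherwise; flags are not modified.
    """
    return 0xFF if carry_flag else 0x00


def write_al(eax: int, al: int) -> int:
    """Replace only the low byte of EAX, as any write to AL does."""
    return (eax & 0xFFFF_FF00) | (al & 0xFF)


# STC ; SALC  -> AL = FF, the upper 24 bits of EAX are untouched.
eax = write_al(0x1234_5600, salc(carry_flag=True))
print(hex(eax))  # 0x123456ff
# CLC ; SALC  -> AL = 00
print(hex(write_al(eax, salc(carry_flag=False))))  # 0x12345600
```

A disassembler that only knows Intel's opcode map stops at this single byte, even though its effect fits in one line.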
I don't know why Intel leaves it out; if anyone has an idea... It's quite a trivial opcode, yet Intel still says there's nothing there. Okay. The common users of these undocumented opcodes are malware and packers, to prevent automated analysis or easy reverse engineering. What's funny is that if you follow Intel's documentation you will have many holes, but Intel's own disassembler, XED, which is free to use (though not open source [at the time]), handles all these opcodes correctly, while Microsoft's tools, Visual Studio and WinDbg, follow the documentation blindly. So you get question marks even though Intel knows perfectly well what the opcode does. It's like "do as I disassemble, not as I document." Of course, you could argue that WinDbg is only meant to debug what the Microsoft compiler generates, but that kind of rules out WinDbg as a malware debugger: you insert a trivial D6, and WinDbg can't tell you what the instruction is. So it's not very useful as a malware analysis debugger.

Another problem is that these undocumented facts are scattered: maybe you'll find one in a trojan, another in a packer, and so on, but it's not easy to find a good, exhaustive, clean test set that gathers them all. For example, a colleague mentions an undocumented opcode or behaviour, and you say "oh yeah, it's in MebRoot [an MBR infector], skip this part of the file" or whatever. But first, it's malware, so you can't really share it, and then there's a lot of noise, the malware payload and so on before and after, so it's not easy to analyse.
That's why I focused on creating a small and clean test set, where each test insists on just one particular instruction or fact. So now let's start, at last, with the real stuff: a few of the undocumented opcodes. Before I started studying, I didn't even know what the CPU's possibilities were, or which opcodes were still supported. I think it's a bit like English: most people in the world would be able to read and understand these words. If you've seen some disassembly before, you are used to these opcodes; all compilers generate them, and they are so common that if they're missing we feel a bit ill at ease, and if we see something different we're probably surprised. So this is standard English. But Intel CPUs go back to the 70s, so it's as if you took Shakespearean English: you could say it's still English, but... hmm, I don't know what that means, or maybe I forgot, I forget quickly. It's a bit the same for these opcodes: they are still supported by all our Intel CPUs, but we probably don't know what they actually do, and that's a problem. One of the proofs of concept I made uses only these old opcodes, and they actually do something. If someone can read that fluently, maybe I should ask "how old are you?", because I'm used to PUSH, JMP and CALL, but with these... hmm, what exactly is being done? And it still works on an i7, it's still usable by malware, packers or anything else: some of these opcodes are totally unused nowadays, yet they still fully work on modern CPUs.
And of course, English is an evolving language, and maybe the oldest generations wouldn't be used to the latest buzzwords. Likewise, the most recent CPUs have opcodes for CRC32, AES decryption, string matching and other complex operations, each in a single opcode. So this exists in modern CPUs, though not in all of them, of course. One I like is MOVBE, "move big endian", because it's the rejected offspring: it's only implemented in the Atom CPU, which means this netbook supports it while the 64-bit i7 doesn't, even though the i7 has CRC32 and maybe the AES opcodes. So much for complete backward compatibility. As far as I know, there is no physical CPU that can execute both CRC32 and MOVBE [as of 2011; later CPUs such as Haswell support both]. And of course MOVBE is almost pointless by itself, because you already have an opcode for endianness swapping [BSWAP]. So this small computer has an opcode that most PCs don't. Why? I don't know. If you know... [Audience member:] "Is this opcode documented in the CPU feature set?" Yes, MOVBE is fully documented, it's official. [Audience member:] "But is there a CPU flag just for this instruction, or is it implied by 'this is an Atom CPU'?" Uh... I don't know. I check for it with CPUID, but I don't know if it's relevant... I think it has its own flag, but the CPUID result is so big that I don't remember it all.
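For what it's worth, MOVBE does have its own feature flag: CPUID leaf 1 reports it in ECX bit 22, next to SSE4.2 (which includes CRC32, bit 20), POPCNT (bit 23) and AES-NI (bit 25). The Python sketch below decodes those bits from a raw ECX value and contrasts a plain little-endian load with a MOVBE load. The ECX value used is hypothetical and purely illustrative; reading real CPUID output needs native code and is not shown.

```python
import struct

# CPUID leaf 1 (EAX=1) reports these features as bits of ECX.
CPUID1_ECX_FEATURES = {
    "SSE4.2 (includes CRC32)": 20,
    "MOVBE": 22,
    "POPCNT": 23,
    "AES-NI": 25,
}


def decode_cpuid1_ecx(ecx: int) -> dict:
    """Return {feature: supported?} from a raw CPUID leaf-1 ECX value."""
    return {name: bool((ecx >> bit) & 1) for name, bit in CPUID1_ECX_FEATURES.items()}


def mov_load32(memory: bytes, offset: int) -> int:
    """Plain MOV r32, m32 on x86: a little-endian load."""
    return struct.unpack_from("<I", memory, offset)[0]


def movbe_load32(memory: bytes, offset: int) -> int:
    """MOVBE r32, m32: the same load with the bytes swapped (big-endian)."""
    return struct.unpack_from(">I", memory, offset)[0]


# Hypothetical ECX value with only the MOVBE bit set among these four.
print(decode_cpuid1_ecx(1 << 22))
data = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(mov_load32(data, 0)))    # 0x78563412
print(hex(movbe_load32(data, 0)))  # 0x12345678
```

The second pair of prints shows why MOVBE is close to redundant: the same result is available from MOV followed by BSWAP.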
Another thing, a bit specific to Windows in my case because I focus on malware: before executing any opcode, I looked at the register values when a program starts. I found that the default register values at startup, when theoretically no opcode has been executed yet, give you information that is actively used by malware. For example, at the entry point, EAX tells you whether it's an older generation (XP or before) or Vista and later. That one isn't used much by malware, I don't recall seeing it, but GS is: if GS is null, it's a 32-bit system, and if it's not, it's a 64-bit system. I will use that later in one of the tricks. Also, the relations between the registers (there are many registers on Intel CPUs) are sometimes not very clear. I was surprised that an FPU operation changes the FPU status and the FPU registers themselves, but also the MMX registers. All the documentation I saw on the internet maps ST0 and MM0 onto each other, which makes sense, but if you do a single FPU operation, it actually modifies not MM0 but MM7. So if you do an FPU operation like "load pi" [FLDPI] and then check the value of MM7, that could be used as a trick, or it's just the way it is. And the same ST0/MM0 mapping appears in all the documentation I could find about the register overlap, Wikipedia and so on. Another thing, used as an anti-emulation trick on XP: FPU operations also change CR0, so you get quite an unexpected anti-emulation trick just by using an FPU operation.
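The MM7 observation follows from how the FPU stack is addressed: MMi aliases the physical register Ri, while ST(i) is relative to the 3-bit TOP field, and a load such as FLDPI decrements TOP before writing. The minimal Python sketch below assumes TOP starts at 0 (as after FINIT) and also decodes the low CR0 bits that SMSW exposes. The 0x3B value is illustrative, not a value captured on XP; which bit actually flips there is not verified here, although TS (bit 3) is the bit operating systems use for lazy FPU context switching. Helper names are hypothetical.

```python
def fld_push(top: int) -> int:
    """A load such as FLDPI decrements the 3-bit TOP field (mod 8) first."""
    return (top - 1) % 8


def st_to_physical(top: int, i: int) -> int:
    """ST(i) names physical register R[(TOP + i) mod 8]; MMi aliases Ri."""
    return (top + i) % 8


# Low CR0 bits, as returned in the low word by SMSW.
CR0_LOW_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE"}


def decode_msw(msw: int) -> list:
    """Names of the CR0 flags that are set in an SMSW result."""
    return [name for bit, name in CR0_LOW_BITS.items() if msw & (1 << bit)]


top = 0                         # assumed: empty FPU stack, as after FINIT
top = fld_push(top)             # FLDPI
print(st_to_physical(top, 0))   # 7: ST0 now lives in R7, which MM7 exposes
print(decode_msw(0x3B))         # ['PE', 'MP', 'TS', 'ET', 'NE']
```

So the ST0/MM0 pairing in most tables only holds while TOP is 0; after one push, ST0 and MM7 share storage.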
So here it is: "store machine status word" [SMSW] is an old opcode from the 286 era, created before protected mode was fully developed, and it lets you read the value of CR0 even from user mode, while MOV from CR0 is a privileged instruction. Officially, the upper word of the destination register is undefined: Intel just says the low word is correct but you can't rely on the rest. I don't know why they say that, because it actually contains the upper bits of CR0. And under XP, when you do FPU operations, the value of CR0 changes and eventually reverts by itself. So you execute SMSW and note the result, then you do an FPU operation and the result should be different, and then eventually it reverts to the original value. It's quite a tricky and unexpected anti-emulator.

There is a similar trick on 32-bit Windows, where GS is not saved in the thread context, so on a thread switch the value of GS is lost. That means if you just wait, GS will eventually be reset to 0. So if you set GS and then step manually, that's slow and triggers a thread switch, so GS is lost immediately. And, like the previous trick, if you just loop until GS is 0, on a real system you will eventually exit the loop. The first time, it blew my mind: I really wondered what could be happening, since there's no other thread, and in my proof of concept the code starts right like this. So what should happen? On a real system, GS is eventually reset to 0. Another thing: it's reset to 0, but not instantly. So if you wait for one GS reset, then set GS again and wait for the next one, the two resets have to be separated by a thread switch, which means it takes a minimum amount of time, so you can use that for anti-emulation timing tricks.

Of course, I also thought NOP was perfect, because NOP is NOP, it does nothing. But originally NOP is "exchange EAX with EAX" [XCHG EAX, EAX], or AX with AX. The NOP encoded as 0x90 always does nothing, but in 64-bit mode there is another encoding [87 C0] for XCHG EAX, EAX. It does nothing in 32-bit mode, but like every other 32-bit operation in 64-bit mode, it clears the upper dword. So you have an XCHG EAX, EAX that does something, even though at first it looks like it does nothing. Fortunately, the 90 NOP still does nothing, and this is probably now common in malware and so on.

HINT NOP is a multi-byte NOP that is meant to give the CPU a hint, and whatever address it references, it doesn't trigger an exception. But as you can see, it's really a multi-byte opcode, so it can be a very long NOP, which is a weird thing to say. Once again, it's partially undocumented by Intel: the full range of HINT NOP encodings is bigger in the AMD documentation. And because it's a multi-byte opcode, if you put those bytes at the end of a page, the CPU has to fetch the rest of the instruction from the next page, which can trigger an exception. So it's a NOP that can trigger an exception if it sits at the end of a page. So, thank you Intel, or whatever, I'm not sure.

MOV, once again: I thought that MOV, being MOV, should be perfectly logical. Sadly, it's not. First,
all of this is documented, but it's tricky, because there were bugs in every disassembler I tried, except maybe XED. You can't do MOV to or from CR0 with a memory operand, and the documentation says the Mod field of the ModR/M byte is ignored. That doesn't mean it's illegal, it's just ignored, so if you encode this, which looks like it could crash, it's actually interpreted as that [the register form]. As far as I remember, that failed every disassembler until recently [ ;) ]. MOVSXD is a 64-bit opcode that sign-extends, so in theory it goes from a smaller register to a bigger one, but if you use it without a REX.W prefix, which is discouraged, it actually works like a standard MOV. And the other way around, MOV from a segment selector to a 32-bit register actually works. Many disassemblers showed it as MOV AX, CS, because that makes both operands the same size, but the upper word of the target register is officially "undefined". Yet there's nothing funny here, no random value: it's zeroes, so it's equivalent to MOV EAX, CS. BSWAP is one of my favourites, because it's like an administration: it's supposed to just swap the endianness of a register, but for external reasons it never quite does the job you expect. On EAX it correctly swaps the endianness as you'd expect, and in 64-bit mode, like all 32-bit operations, it also clears the upper dword. OK!
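Both the 87 C0 form of XCHG EAX, EAX and BSWAP EAX follow the same 64-bit rule: a write to a 32-bit register zero-extends into the full 64-bit register, while the single-byte 90 is special-cased as a real NOP. A minimal Python model, with RAX as a plain integer and hypothetical helper names (only these two XCHG encodings are modelled):

```python
MASK32 = 0xFFFF_FFFF


def write_r32(value32: int) -> int:
    """In 64-bit mode, writing a 32-bit register zero-extends into the 64-bit one."""
    return value32 & MASK32


def xchg_eax_eax(rax: int, encoding: bytes) -> int:
    """Return RAX after XCHG EAX, EAX executed in 64-bit mode.

    90 (without REX) is special-cased as a true NOP and leaves RAX untouched;
    87 C0 is an ordinary 32-bit exchange, so the write clears the upper dword.
    """
    if encoding == b"\x90":
        return rax
    if encoding == b"\x87\xc0":
        return write_r32(rax & MASK32)
    raise ValueError("only the 90 and 87 C0 encodings are modelled")


def bswap_eax(rax: int) -> int:
    """Return RAX after BSWAP EAX in 64-bit mode: low dword swapped, upper dword cleared."""
    low = (rax & MASK32).to_bytes(4, "little")
    return write_r32(int.from_bytes(low, "big"))


rax = 0x1122_3344_5566_7788
print(hex(xchg_eax_eax(rax, b"\x90")))      # 0x1122334455667788
print(hex(xchg_eax_eax(rax, b"\x87\xc0")))  # 0x55667788
print(hex(bswap_eax(rax)))                  # 0x88776655
```

The 16-bit BSWAP case is deliberately left out: it is officially undefined, so any model of it would only encode observed behaviour.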
And on a word register, BSWAP is "undefined" again, but it's commonly used in malware and packers because in practice it just resets the word, so it's like XOR AX, AX. With such an unexplainable result, I understand that Intel prefers to just say "undefined"; they'd probably be too ashamed to explain why we get this funny result. BSWAP AX is also wrongly disassembled by WinDbg and others: it's shown as BSWAP EAX, while actually you clear the register.

Can everybody understand this code? Does anybody see the potential trap? It pushes an address on the stack, then RETN takes that address from the stack, so basically you just jump to an immediate value. Execution order? Yes, execution starts here... no, OK, that's not the point. And of course, this is OllyDbg 1 (it's fixed in OllyDbg 2), but OllyDbg 1 even tries to be nice, telling you with an automatic comment that RET is used as a jump to [that address]. And as you can see, that's not exactly what happens. So, what happened? Nobody sees it?
So basically, here you have a 66 prefix on the RETN, which makes it return to IP, not EIP. So you don't jump to 401008, but to 00001008, and in this proof of concept I mapped the NULL page and put some code at that address. So this is not actually a return to 401008. The problem is that officially this is also just called a "return"; it's not a different mnemonic. Disassemblers have added their own ways of showing it, like "small retn", "ret.16" or something like that, but officially it's the same mnemonic. So the latest Hiew, I think, and OllyDbg 1 (maybe the latest OllyDbg 2 fixed it) can still be tricked by just that. The 66 prefix, jumping to IP, also works on CALLs, RETs, LOOPs [and JMPs], so on all the flow-control opcodes. I won't enumerate all the tricks, because otherwise you'd probably die of boredom; if you want more, I created a page on Corkami [x86.corkami.com], and I already made some graphs and cheat sheets with an easy table of opcodes. That's quite enough theory for now.

I don't like just reading stuff without having something to feed my debugger, so I created CoST, which stands for Corkami Standard Test. CoST is a single binary with no options: you just run it, and it executes a lot of different tests. I also made it a hardened PE, so it may also help you test the PE side of your tools or your knowledge. But because a hardened PE is quite difficult to debug, I also made an easy PE mode, so you can study only the assembly without too much trouble debugging it. CoST contains a lot of tests: classic, very trivial stuff; then some more complex stuff, like JMP to IP, IRET...; undocumented opcodes; CPU-specific ones like MOVBE, POPCNT, CRC32; and also some OS and VM detections using common opcodes, like the "red pill trick"... yeah, just execute SLDT, get a value, and compare...
but it's 'the blue pill', or whatever... And also some OS bugs: sometimes Windows XP reports the wrong exception when telling you which exception just happened, and that's a way to tell an actual OS from an emulator that tries to be logical. CoST is written in assembly, so there's nothing extra: it's not compiled, not generated. To make it self-documenting, I created internal exports so that each part of the file is easy to browse to; if you quickly want to jump to the 64-bit part, it's easier via the exports. I also wanted it to print messages in the most convenient way: if you keep calling print routines, the assembly gets longer to scroll through, so I used Vectored Exception Handling and a fake opcode, so that the comment on what's going to happen appears directly in the code. So it's kind of self-documented, without a debug symbols file. And as you saw, it doesn't have much console output, but it has a lot of debug output, around 100 messages, I forget exactly. It even says "[trick] I'm going to do this", then "I'm going to do that...", trying to be helpful yet a bit hard to disassemble.

Can anyone understand what this code is doing? This is one of my favourites. We can't see the opcodes... no, there's no opcode trick this time. So basically, you push some arguments on the stack, and you jump here, basically with a far return [RETF]...
I pushed 'push_eip' on the stack along with a 33 word, so basically I RETurn Far to this: I return to this EIP with selector 33. If this is a 64-bit OS and a 32-bit process, execution continues here in 64-bit mode, because selector 33 is the 64-bit code selector, which you can reach from a 32-bit process. So this code is executed first in the current selector, as you see, and then executed again with selector 33, which means in 64-bit mode. You have the same EIP and the same opcodes, but the disassembly is different, and I chose opcodes whose mnemonics differ on each side, 32-bit or 64-bit. So it's already quite a b*tch to disassemble: same EIP, so unless you're careful about the selector, well, it's a problem. [Errata: you can debug this kind of code, check my berlinsides presentation (screencast on slide 58)] http://bsx2.corkami.com , slide 58 [screencast]. If you run over it, you return to the original selector, which is why there is a PUSH CS here, and you go back with the original selector; execution goes through quickly, but you cannot step through that code [WRONG, you can with WinDbg+wow64exts]. So it kills the disassemblers and the debuggers, and yet it's simple.

Here is the result you get when you run CoST through the latest public version of Hiew; I think it's going to be fixed. This is a HINT NOP that's not documented by Intel and is a bit forgotten by most disassemblers, so WinDbg and Hiew give you undocumented results: question marks, or Hiew's style of question marks. Originally, that's what I planned to present at Hashdays, but then I decided to put a few tricks into CoST itself, on the PE side of things. So this is the header: it has MZ and then some text, so you can 'type cost.exe' and see some text; I made it type-able. And the NT headers, the header starting with "PE", actually start at the bottom of the
file; the bottom of the file is here, so it's a footer. And I made the values quite critical, so they're not the ones you'd expect. This is the result you got when loading CoST in IDA 6.1: some values were random, and so on. With CoST you can test and set the value of a register and then compare it, but you can't test all the possibilities of PE files with a single file, because you have to choose: for example, CoST has no sections, weird alignments and so on, but you can't cover every possible case in one file. So I went on and created another page on Corkami with, as usual, proofs of concept and some graphs about PE files. I don't consider it finished, but I consider it good enough to break a bit of everything. I've already created more than 100 PoCs, trying zero sections, big alignments, huge alignments, and I've got some funny results.

So here is "virtual section table vs Hiew". With low alignments you can have no sections, or the section table can be empty, so I made SizeOfOptionalHeader big enough that the section table lies outside the PE file, in virtual memory space [full of 00s]. Hiew doesn't like this: as a consequence, it doesn't even consider it a PE file, although it works fine. But this trick only works under XP, because Windows 7 is a bit pickier about the unused section table values. When you see some ASCII art in the Data Directories, you can probably guess something is going on; if you have better ASCII art suggestions, I'm all ears. So basically, this is the "Dual PE header" that Reversing Labs presented at Black Hat. Are you familiar with it?
So basically, you extend SizeOfHeaders so that the NT headers are mapped at the bottom of the file, far enough to reach the section alignment [not the file alignment], and when you load the file, the first section is mapped over them in memory. The first part of the OPTIONAL_HEADER is the one used on disk, so that's what is checked to decide whether the file will load, but the Data Directories are read from the values in memory. So first the OPTIONAL_HEADER is parsed and mapped in memory, then the section folds over the bottom part of the header, and then the true Data Directories, which were originally at the start of the section, are taken into account. So all this garbage is visible on disk, following SizeOfOptionalHeader, but in memory it's not what gets parsed.

Another weird thing is that export names can be absolutely anything up to a null character: non-ASCII, whatever. And another funny thing is that Hiew displays them inline, so you can add your own ads, because those are just export names. One of the export names is more than 16 KB, which is enough to cause a buffer overflow if your tool isn't careful. It's also possible to have a NULL export name, just a null character, and you can import a NULL API, no problem. I also tried different possibilities: I created a few files with the maximum number of sections. The limit is 96 under XP and 64K under Vista and [Windows] 7. The latest OllyDbg, OllyDbg 2, gives you a funny message but still loads the file; OllyDbg 1 crashes directly on it. Err... still some time?
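The footer header, the virtual section table and the export-name tricks all come down to a few offsets a parser must not take for granted. The Python sketch below follows e_lfanew (the DWORD at 0x3C), computes where the section table starts (signature, 20-byte IMAGE_FILE_HEADER, then SizeOfOptionalHeader bytes) and reads a NUL-terminated export name with an explicit length cap. It is a minimal illustration under those assumptions, not how the Windows loader is implemented; the synthetic file, the helper names and the 256-byte cap are all hypothetical choices.

```python
import struct


def pe_header_offset(data: bytes) -> int:
    """Follow e_lfanew (DWORD at 0x3C) and check the 'PE\\0\\0' signature there.

    Nothing requires the NT headers to sit right after the DOS stub; they may
    just as well be a footer near the end of the file.
    """
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature at e_lfanew")
    return e_lfanew


def section_table_offset(data: bytes, e_lfanew: int) -> int:
    """Section table = 'PE\\0\\0' (4) + IMAGE_FILE_HEADER (20) + SizeOfOptionalHeader.

    SizeOfOptionalHeader is the WORD at offset 16 inside IMAGE_FILE_HEADER, so a
    large value can push the table past the end of the file.
    """
    (size_of_optional_header,) = struct.unpack_from("<H", data, e_lfanew + 4 + 16)
    return e_lfanew + 4 + 20 + size_of_optional_header


def read_export_name(image: bytes, offset: int, limit: int = 256) -> bytes:
    """Read a NUL-terminated export name without trusting its length.

    Names can hold any byte except NUL, can be empty, or be kilobytes long;
    the cap turns an over-long name into an error rather than an overflow.
    """
    end = image.find(b"\0", offset, offset + limit)
    if end == -1:
        raise ValueError(f"export name at {offset:#x} exceeds {limit} bytes")
    return image[offset:end]


# Hypothetical 0x200-byte file whose NT headers are a footer at 0x1E0.
buf = bytearray(0x200)
buf[0:2] = b"MZ"
struct.pack_into("<I", buf, 0x3C, 0x1E0)
buf[0x1E0:0x1E4] = b"PE\0\0"
struct.pack_into("<H", buf, 0x1E0 + 4 + 16, 0xE0)  # SizeOfOptionalHeader
data = bytes(buf)

nt = pe_header_offset(data)
table = section_table_offset(data, nt)
print(hex(nt), hex(table), table >= len(data))  # 0x1e0 0x2d8 True
print(read_export_name(b"\xffodd name\0next\0", 0))  # b'\xffodd name'
print(read_export_name(b"\0", 0))                    # b'' (a NULL export name)
```

A tool that assumes the section table lies inside the file, or copies export names into a fixed buffer, fails on exactly these inputs.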
And one last one, not very visual: I noticed that the TLS AddressOfIndex is overwritten during loading, and that the imports terminator doesn't need to be five null dwords: if the Name [of the DLL] is 0, the import descriptor is considered a terminator. So if you make AddressOfIndex point to the Name field of an import descriptor, it gets overwritten, and then the imports are considered truncated. And the behaviour differs between XP and Windows 7: under XP the overwrite happens after the imports are loaded, so the imports table is not truncated, while under Windows 7 it happens before the imports are loaded. So you have the same PE with different loading behaviour under different versions of Windows, and the file works on both. Oh wait, before that... maybe I still have some time? 15 minutes left? OK, I'll do the demo. This is just to prove... sorry? This is the kind of PE file I typically create: I only define the elements that are required for it to work, and this one is actually a driver. Even though I used some undocumented opcodes, it's a working driver, and it doesn't have the usual [compiler] stuff you find in a driver. It's just to show what this kind of PoC looks like: clear to read, with no external stuff cluttering your view or your debugging. This one shows the possible values of CR0 via SMSW, theoretically undefined on a DWORD, but it actually gives the same value as the standard MOV EAX, CR0. And here is MOV EAX, CR0 with the "wrong" ModR/M, which the latest Hiew doesn't disassemble at all. Let's hope it doesn't crash...
So as you can see, you get exactly the same value whether you use the normal MOV from CR0, the "invalid" one, or the "undefined" one. The upper part is supposed to be undefined; usually, in Intel language, undefined means zeroes, but here it just works fine. And my machine didn't even crash, which means the driver is fine, so you can study small drivers this way. The first PoC I presented here was the one with the old opcodes. Does anyone still know what the value is? So basically, some opcodes are there as garbage, used as junk just to prove that they are actually [supported], but in the others the registers are actually modified, and these opcodes from the 70s, or the early 80s, still work perfectly on a modern CPU, even an i7. One of the PoCs I created tests the initial values [of each register], so you can see the possible values, whether on XP or Windows 7. Each time [TLS, EntryPoint, DllMain], I save all the register values and then compare them to the possible values, one after the other. Actually, in TLS you have much more control over the values, because the values you get when the TLS is loaded are the RVA [of the TLS data directory], the callbacks, the size of the TLS... I forget exactly where, but it's in the source...
running this will help you to mimic an OS better in your emulator if that's what you're interested [in] SMSW is actually the one comparing -- so, using SMSW, then comparing the value, then checking whether the register changed [after an FPU operation] and then when it reverts normally a funny fact that I would like an explanation [for], if you know it is that actually, this behaviour is different if you run the file normally or if you run it with a redirection if you pipe the output, you get a 'fail' result if you run the file normally, it just works so, I would like -- here, I will just run it, and then I will run it to a file, and just TYPE the result normal execution: OK redirection: FAIL if you guys have any explanation for that, I'm all ears did you try redirecting to something else ? like, a COM oh, I didn't try so, you would pipe to another device, and ... but then, how do you get it back ? printer, or ... yeah, I don't have a COM device or... yeah, I don't know but it was a big surprise, because I had a test bench and then, 'FAIL'. .. uh ? run, OK... so, I have no idea why... the GS trick... quite simple and I also have some output I modified GS then it's reset then it's waited for result then I'm doing 2 resets and checking the time in between so that, it shouldn't happen too quickly NOPs, so... I'm testing the undocumented NOPs testing the NOP that are on invalid page so, standard NOP 32b nop so, all my 64b tests are still done in 32b process so that you can run them on normal OS then it detects via GS if 64b [mode] is available and in this case, you would get a different result so, if you run it on 64b, which I don't have here, you would get the actual tests on 64b and the results printed out. but still, it's not possible to debug that easily [wrong] but at least, there's no trick over there, so it's easy to bring back to a 64b process [to step over 64b code and return to the 32b process] PUSH/RET you print the output, and then... 
Olly nicely tells you that you will jump to 401008, but actually... here the display is correct, and the TLS has already created a null page, which prints 'FAIL', as expected. But there is no standard way to disassemble that correctly.

Now, the 64k sections: here I'm actually executing all the code [the complete virtual space of all 64k sections]. The sections are quite big, and I set EAX so that all the 00 00 bytes can be executed (00 00 decodes as ADD BYTE PTR [EAX], AL, so EAX must point to writable memory), just to do a printf at the end. It actually takes a few seconds to execute on an i7, so it's quite funny to see: you launch it, even when the cache is loaded and the OS is ready to be fast, and the printf comes a few seconds later.

Virtual sections is the one that Hiew doesn't recognize as a PE at all. This is the latest Hiew; well, it's been patched anyway. I can't browse the PE now, since it doesn't think it's a PE file, but basically it thinks that the OPTIONAL_HEADER points to the end of the file, beyond the end of the file.

The folded header: a few error messages, because of the wrong data directories, and the actual data directories are at the start of the section. This would be the imports, and the actual, real data directories.

And last, the one with the TLS AddressOfIndex pointing inside the imports, at the AddressOfName, so it overwrites the pointer during loading. When you just load it, it says 'it's XP', because my imports were loaded one way and not the other. If you run that file [under W7], it gives you a different result. And then, the exports...
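Hiew's confusion over where the OPTIONAL_HEADER leads is easier to follow with the header arithmetic written out. The Python sketch below (hypothetical helper name, minimal checks) computes where the optional header and the section table start from e_lfanew and SizeOfOptionalHeader, and reports whether the section table physically fits in the file. The idea that a tool may reject a file when these offsets run past the end of the file is an assumption about how such tools fail, not a description of Hiew's internals.

```python
import struct

SECTION_HEADER_SIZE = 40  # sizeof(IMAGE_SECTION_HEADER)
FILE_HEADER_SIZE = 20     # sizeof(IMAGE_FILE_HEADER)


def pe_header_offsets(data: bytes) -> dict:
    """Compute file offsets of the optional header and section table."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    file_header = e_lfanew + 4
    # IMAGE_FILE_HEADER: NumberOfSections at +2, SizeOfOptionalHeader at +16.
    (number_of_sections,) = struct.unpack_from("<H", data, file_header + 2)
    (size_of_optional_header,) = struct.unpack_from("<H", data, file_header + 16)
    optional_header = file_header + FILE_HEADER_SIZE
    # The section table starts right after the optional header as *declared*,
    # so a large SizeOfOptionalHeader can push it past the end of the file.
    section_table = optional_header + size_of_optional_header
    table_end = section_table + SECTION_HEADER_SIZE * number_of_sections
    return {
        "optional_header": optional_header,
        "section_table": section_table,
        "number_of_sections": number_of_sections,
        "section_table_in_file": table_end <= len(data),
    }
```

Running it on a truncated header (no optional header bytes present, SizeOfOptionalHeader = 0xE0) reports the section table at 0x138, outside a 0x58-byte file.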
...where some of the exports are actually very long. You can see that here I'm actually taking over the disassembly: I'm repeating the same fake opcodes and addresses, so you fool the disassembler that way. I think it's just a visual effect, no big problem. But it's a known problem, fixed recently in IDA, that if you put an export in the middle of an instruction, the fake export takes over the disassembly, and that ruins the disassembly. There's a PoC for that in Corkami, of course.

So, that's all for the demos.

So, I wanted to know more about x86 and PE, which are far from perfectly documented, and still aren't. But at least I've been covering some parts of them. There are still some gray areas, but every day I learn a bit more, and publish my results and share them openly. Like WinDbg: if you follow only the official documentation, you will only get bad results with the malware and packers out there.

If you yourself are interested, or you develop a tool, an emulator, an engine, whatever, you can just visit Corkami, read the pages, and download the PoCs, which are [freely] available. And if you find any bugs, which might happen, send me a postcard, or a red-cross T-shirt.

Thanks to Peter Ferrie, all my reviewers, and the people who contributed...

Do you have any questions?

Did you run them through AVs, antivirus scanners? You would find a sh*tload of 0days.
No. And I wouldn't be good at actually turning them into exploits or anything, so... Already breaking all the disassemblers and stuff was good enough for me. I found a crash in Intel XED, which was good enough.

Any other questions? Everybody survived the presentation?
It's a great talk, man.
Thank you!

THANK YOU! [for watching]
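Back to the export trick from the last demo: for tool authors, it suggests a simple sanity check before letting export entries drive the disassembly, namely flagging exports whose RVA lands strictly inside an instruction that was already decoded. The Python sketch below assumes you already have sorted (start RVA, length) pairs from your own decoder; the function name is hypothetical, and this is not how IDA fixed it.

```python
from bisect import bisect_right


def mid_instruction_exports(instructions, exports):
    """Return export RVAs that fall strictly inside a decoded instruction.

    instructions: list of (start_rva, length) pairs, sorted by start_rva.
    exports: iterable of export RVAs.
    """
    starts = [start for start, _ in instructions]
    flagged = []
    for rva in exports:
        # Index of the last instruction starting at or before this RVA.
        i = bisect_right(starts, rva) - 1
        if i >= 0:
            start, length = instructions[i]
            if start < rva < start + length:
                flagged.append(rva)
    return flagged
```

With instructions at 0x1000 (5 bytes), 0x1005 (1 byte) and 0x1006 (2 bytes), exports at 0x1002 and 0x1007 are flagged, while an export at 0x1000 (an instruction boundary) is not.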