Here are some side-by-side comparisons of disassembly and decompiler output for ARM. Please maximize the window to see both columns simultaneously.
The following examples are displayed on this page:
Let's start with a very simple function. It accepts a pointer to a structure and zeroes out its first three fields. While the function logic is obvious from a glance at the decompiler output, the assembly listing is noisy and takes some study.
The decompiler saves you time and lets you concentrate on more exciting aspects of reverse engineering.
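For readers without the listing at hand, the kind of function described above might look roughly like the following sketch. The structure layout, the field names and the function name are all hypothetical; they are not taken from the original sample.

```c
/* Hypothetical structure: only the first three fields are touched. */
struct record {
    int first;
    int second;
    int third;
    int untouched;   /* stays as it was */
};

/* Zero out the first three fields of the structure pointed to by r. */
void clear_header(struct record *r)
{
    r->first = 0;
    r->second = 0;
    r->third = 0;
}
```

Calling clear_header on a structure initialized to {1, 2, 3, 4} leaves only the last field non-zero.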
Sorry for the long code snippet; ARM code tends to be longer than x86 code. This makes our comparison even more impressive: look at how concise the decompiler output is!
The ARM processor has conditional instructions that can shorten the code but demand close attention from the reader. The case above is very simple; just note that there is a pair of instructions: MOVNE and LDREQSH. Only one of them will be executed on any given pass. This is what a simple if-then-else looks like in ARM.
The pseudocode shows it much better and does not require any explanation.
A quiz question: did you notice that MOVNE loads zero into R0? (Because I didn't :)
Also note that in the disassembly listing we see var_8 but the location really used is var_A, which corresponds to v4.
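The MOVNE/LDREQSH pair discussed above can be read as a plain if-then-else. The sketch below is only an illustration: the compare that sets the flags, the operand names and the function name are assumptions, not the code from the listing.

```c
#include <stdint.h>

/*
 * Hypothetical instruction sequence being modelled:
 *   CMP     key, #0       ; sets the condition flags
 *   MOVNE   R0, #0        ; executed only if key != 0
 *   LDREQSH R0, [ptr]     ; executed only if key == 0 (signed halfword load)
 */
int32_t load_or_zero(int32_t key, const int16_t *ptr)
{
    if (key != 0)
        return 0;      /* the MOVNE branch */
    return *ptr;       /* the LDREQSH branch: 16-bit value, sign-extended */
}
```

Exactly one of the two return statements runs per call, which mirrors the fact that only one of the two conditional instructions takes effect.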
Look, the decompiler output is longer! This is a rare case when the pseudocode is longer than the disassembly listing, but it is for a good cause: to keep it readable. There are so many conditional instructions here that it is very easy to misunderstand the dependencies. For example, did you notice that the first MOVEQ may use the condition codes set by CMP? The subtle detail is that CMPNE may be skipped, in which case the condition codes set by CMP reach the MOVEQs.
The decompiler represented it perfectly well. I renamed some variables and set their types, but this was an easy task.
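The CMP / CMPNE / MOVEQ dependency described above can be modelled by tracking the Z flag by hand. This is a sketch with hypothetical operands (a, x, b, y); it only shows why such a chain behaves like a logical OR of two equality tests.

```c
#include <stdbool.h>

/*
 * Hypothetical sequence being modelled:
 *   CMP   a, x
 *   CMPNE b, y      ; skipped when the first compare found equality
 *   MOVEQ r, #1     ; sees Z from whichever compare executed last
 */
bool chain_flags(int a, int x, int b, int y)
{
    bool z = (a == x);   /* CMP sets Z */
    if (!z)              /* CMPNE runs only when Z is clear */
        z = (b == y);    /* and then overwrites Z */
    return z;            /* MOVEQ takes effect when Z is set */
}

/* The same logic as a decompiler would typically present it. */
bool chain_c(int a, int x, int b, int y)
{
    return a == x || b == y;
}
```

Both functions return the same result for every input; when a equals x, the second compare never runs and the flags from the first one are what MOVEQ tests.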
Conditional instructions are just part of the story. ARM is also famous for having a plethora of data movement instructions. They come with a set of possible suffixes that subtly change the meaning of the instruction. Take STMCSIA, for example. It is an STM instruction, but then you have to remember that CS means "carry set" and IA means "increment after".
In short, the disassembly listing is like Chinese. The pseudocode is longer but requires much less time to understand.
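To unpack the suffixes mentioned above, here is a small C model of what an STMCSIA with base writeback does. The function name and parameters are hypothetical, and writeback of the base register is an assumption made so that the "increment after" part is visible in the return value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of: STMCSIA base!, {regs...}
 *   CS: do nothing unless the carry flag is set
 *   IA: store each register at the current address, then advance
 * Returns the base pointer after the instruction (assumes writeback).
 */
uint32_t *stm_cs_ia(bool carry, uint32_t *base,
                    const uint32_t *regs, size_t count)
{
    if (!carry)
        return base;              /* condition failed: no stores at all */
    for (size_t i = 0; i < count; i++)
        *base++ = regs[i];        /* store first, increment after */
    return base;
}
```

With the carry clear, memory and the base pointer are left alone; with the carry set, the registers land at consecutive ascending addresses.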
Sorry for another long code snippet. I just wanted to show you that the decompiler recognizes compiler helper functions (like __divdi3) and handles 64-bit arithmetic quite well.
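As a reminder of where such helpers come from: on a 32-bit target without a native 64-bit divide, the compiler typically turns an ordinary 64-bit division into a call to a runtime routine. The function below is a hypothetical example of such source code; which helper is actually called (__divdi3 or another one) depends on the toolchain and ABI.

```c
#include <stdint.h>

/*
 * A plain 64-bit signed division in C. On a 32-bit ARM build the
 * compiler usually emits a call to a runtime helper for the '/'
 * operator; the decompiler's job is to fold that call back into
 * a simple division like this one.
 */
int64_t per_item(int64_t total, int64_t count)
{
    return total / count;
}
```

The caller is assumed to pass a non-zero count; the sketch does not guard against division by zero.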
Since ARM instructions cannot encode large immediate constants, such constants are sometimes loaded with two instructions. There are many 0xFA (250 decimal) constants in the disassembly listing, but all of them are shifted left by 2 before use, so the value actually used is 1000. The decompiler saves you from these petty details.
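The arithmetic behind the 0xFA constants is easy to check. The sketch below models a two-instruction load (a small immediate followed by a left shift); the function name is hypothetical and the exact instruction pair in the listing may differ.

```c
#include <stdint.h>

/*
 * Model of loading a constant in two steps:
 *   1. move a small immediate into a register
 *   2. shift it left by a fixed amount
 */
uint32_t build_constant(uint32_t imm8, unsigned shift)
{
    return imm8 << shift;
}
```

For the case in the text, build_constant(0xFA, 2) evaluates to 250 * 4 = 1000, which is the value the pseudocode shows directly.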
As an aside: the decompiler handles ARM mode as well as Thumb mode instructions. It does not care about the instruction encoding because that is already handled by IDA.
In some cases the disassembly listing can be misleading, especially with PIC (position-independent code). While the address of a constant string is loaded into R12, the code does not care about the string itself. This is just how variable addresses are calculated in PIC code (it is .got-someoffset). Such calculations are very frequent in shared objects and unfortunately IDA cannot handle all of them. But the decompiler did a great job of tracing R12.