Comparisons of ARM disassembly and decompilation

Here are some side-by-side comparisons of disassembly and decompiler output for ARM. Please maximize the window to see both columns simultaneously.

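This page covers the following examples: a simple case, 64-bit arithmetic, conditional instructions (two examples), complex instructions, and compiler helper functions.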

Simple case

Let's start with a very simple function. It accepts a pointer to a structure and zeroes out its first three fields. While the function's logic is obvious from a glance at the decompiler output, the assembly listing is noisy and takes some study to follow.

The decompiler saves you time and allows you to concentrate on more exciting aspects of reverse engineering.

; struct_result *__fastcall sub_210DC(struct_result *result)                 
                                         
 var_10          = -0x10                                                      
 var_4           = -4                                                         
                                                                              
                 MOV     R12, SP                                              
                 STMFD   SP!, {R0}                                            
                 STMFD   SP!, {R12,LR}                                        
                 SUB     SP, SP, #4                                           
                 LDR     R2, [SP,#0x10+var_4]
                 MOV     R3, #0
                 STR     R3, [R2]
                 LDR     R3, [SP,#0x10+var_4]
                 ADD     R2, R3, #4
                 MOV     R3, #0
                 STR     R3, [R2]
                 LDR     R3, [SP,#0x10+var_4]
                 ADD     R2, R3, #8
                 MOV     R3, #0
                 STR     R3, [R2]
                 LDR     R3, [SP,#0x10+var_4]
                 STR     R3, [SP,#0x10+var_10]
                 LDR     R0, [SP,#0x10+var_10]
                 ADD     SP, SP, #4
                 LDMFD   SP, {SP,LR}
                 BX      LR
 ; End of function sub_210DC
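
The pseudocode column is not reproduced in this text, so here is a rough hand-written equivalent of the listing above. It is a sketch rather than actual decompiler output: the field names and the `int` field type are assumptions, since the listing only shows three 4-byte stores at offsets 0, 4 and 8.

```c
/* Hand-written sketch of sub_210DC; the field names are hypothetical. */
struct struct_result {
    int field_0;   /* offset 0 */
    int field_4;   /* offset 4 */
    int field_8;   /* offset 8 */
};

/* Zeroes the first three fields and returns the pointer it was given. */
struct struct_result *sub_210DC(struct struct_result *result)
{
    result->field_0 = 0;   /* STR R3, [R2] with R2 = result     */
    result->field_4 = 0;   /* STR R3, [R2] with R2 = result + 4 */
    result->field_8 = 0;   /* STR R3, [R2] with R2 = result + 8 */
    return result;         /* the pointer is copied back to R0  */
}
```

Four statements against more than twenty instructions: most of the assembly is spent reloading the same pointer from the stack.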

64-bit arithmetic

Sorry for the long code snippet; ARM code tends to be longer than the equivalent x86 code. This makes our comparison even more impressive: look at how concise the decompiler output is!

 ; bool __cdecl uh_gt_uc()                                                    
                 EXPORT _uh_gt_uc__YA_NXZ                                     
 _uh_gt_uc__YA_NXZ                       ; DATA XREF: .pdata:$T7452o          
                                                                              
 var_2C          = -0x2C                                                      
 var_28          = -0x28                                                      
 var_24          = -0x24                                                      
 var_20          = -0x20                                                      
 var_1C          = -0x1C                                                      
 var_18          = -0x18                                                      
 var_14          = -0x14                                                      
 var_10          = -0x10                                                      
 var_C           = -0xC                                                       
 var_8           = -8                                                         
 var_4           = -4                                                         
                                                                              
                 STR     LR, [SP,#var_4]! ; $M7441                            
                                         ; $LN8@uh_gt_uc                      
                 SUB     SP, SP, #0x28                                        

 $M7449
                 BL      uh
                 STR     R1, [SP,#0x2C+var_24]
                 STR     R0, [SP,#0x2C+var_28]
                 BL      uc
                 STRB    R0, [SP,#0x2C+var_20]
                 LDRB    R3, [SP,#0x2C+var_20]
                 STR     R3, [SP,#0x2C+var_1C]
                 LDR     R1, [SP,#0x2C+var_1C]
                 LDR     R3, [SP,#0x2C+var_1C]
                 MOV     R2, R3,ASR#31
                 LDR     R3, [SP,#0x2C+var_28]
                 STR     R3, [SP,#0x2C+var_18]
                 LDR     R3, [SP,#0x2C+var_24]
                 STR     R3, [SP,#0x2C+var_14]
                 LDR     R3, [SP,#0x2C+var_18]
                 STR     R3, [SP,#0x2C+var_10]
                 STR     R1, [SP,#0x2C+var_C]
                 LDR     R3, [SP,#0x2C+var_14]
                 CMP     R3, R2
                 BCC     $LN3_8

 loc_6AC
                 BHI     $LN5_0

 loc_6B0
                 LDR     R2, [SP,#0x2C+var_10]
                 LDR     R3, [SP,#0x2C+var_C]
                 CMP     R2, R3
                 BLS     $LN3_8

 $LN5_0
                 MOV     R3, #1
                 STR     R3, [SP,#0x2C+var_8]
                 B       $LN4_8
 ; ---------------------------------------------------------------------------

 $LN3_8
                                         ; uh_gt_uc(void)+68j
                 MOV     R3, #0
                 STR     R3, [SP,#0x2C+var_8]

 $LN4_8
                 LDR     R3, [SP,#0x2C+var_8]
                 AND     R3, R3, #0xFF
                 STRB    R3, [SP,#0x2C+var_2C]
                 LDRB    R0, [SP,#0x2C+var_2C]
                 ADD     SP, SP, #0x28
                 LDR     PC, [SP+4+var_4],#4
 ; End of function uh_gt_uc(void)
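
To make the listing above easier to follow, here is a small sketch of the comparison it performs. It assumes, judging by the names and by the STRB/LDRB pair, that `uh` returns an unsigned 64-bit value and `uc` an unsigned char; the helper names `gt_u64_by_words` and `gt_u64` are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Word-by-word unsigned comparison, mirroring the CMP/BCC/BHI/CMP/BLS
   sequence: the high halves decide first, then the low halves. */
bool gt_u64_by_words(uint32_t a_hi, uint32_t a_lo,
                     uint32_t b_hi, uint32_t b_lo)
{
    if (a_hi < b_hi) return false;   /* BCC: a is definitely smaller */
    if (a_hi > b_hi) return true;    /* BHI: a is definitely larger  */
    return a_lo > b_lo;              /* BLS jumps to the "false" arm */
}

/* The same test written with native 64-bit types. */
bool gt_u64(uint64_t a, uint8_t b)
{
    return a > b;   /* b is widened to 64 bits before the comparison */
}
```

The second function is roughly the level at which a decompiler can present the whole listing: a single comparison of two call results.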

Conditional instructions

The ARM processor has conditional instructions that can shorten the code but demand close attention from the reader. The case below is very simple; just note that there is a pair of instructions: MOVNE and LDREQSH. Only one of them is executed on any given run. This is how a simple if-then-else looks in ARM.

The pseudocode shows it much better and does not require any explanation.

A quiz question: did you notice that MOVNE loads zero into R0? (Because I didn't :)

Also note that in the disassembly listing we see var_8 but the location really used is var_A, which corresponds to v4.

; int __cdecl ReadShort(void *, unsigned __int32 offset, int whence)         
 ReadShort                                                                    
                                                                              
 whence          = -0x18                                                      
 var_A           = -0xA                                                       
 var_8           = -8                                                         
                                                                              
                 STMFD   SP!, {R4,LR}                                         
                 SUB     SP, SP, #0x10   ; whence                             
                 MOV     R4, #0
                 ADD     R3, SP, #0x18+var_8
                 STRH    R4, [R3,#-2]!
                 STR     R2, [SP,#0x18+whence] ; whence
                 MOV     R2, R3          ; buffer
                 MOV     R3, #2          ; len
                 BL      ReadData
                 CMP     R0, R4
                 MOVNE   R0, R4
                 LDREQSH R0, [SP,#0x18+var_A]
                 ADD     SP, SP, #0x10
                 LDMFD   SP!, {R4,PC}
 ; End of function ReadShort
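
As a hand-written sketch of the if-then-else hidden in the MOVNE/LDREQSH pair, the function can be read roughly as follows. The `ReadData` below is a hypothetical stand-in that copies from a memory buffer and returns 0 on success; the real `ReadData` is not shown in the listing, and only its argument order (source, offset, buffer, length, whence) is taken from the register setup.

```c
#include <string.h>

/* Hypothetical stub: copies len bytes from a memory image and returns 0
   on success, nonzero on failure. Not the real ReadData. */
int ReadData(void *src, unsigned int offset, void *buffer,
             unsigned int len, int whence)
{
    (void)whence;                    /* ignored by this stub */
    if (src == NULL)
        return -1;
    memcpy(buffer, (const char *)src + offset, len);
    return 0;
}

int ReadShort(void *src, unsigned int offset, int whence)
{
    short value = 0;                 /* STRH R4, [R3,#-2]! */

    if (ReadData(src, offset, &value, 2, whence) != 0)
        return 0;                    /* MOVNE R0, R4 (R4 holds zero)  */
    return value;                    /* LDREQSH: sign-extending load */
}
```

The two `return` statements correspond one-to-one to the two conditional instructions, which is why only one of them ever takes effect.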

Conditional instructions - 2

Look, the decompiler output is longer! This is a rare case where the pseudocode is longer than the disassembly listing, but it is for a good cause: to keep it readable. There are so many conditional instructions here that it is very easy to misunderstand the dependencies. For example, did you notice that the first MOVEQ may use the condition codes set by CMP? The subtle detail is that CMPNE may be skipped, in which case the condition codes set by CMP reach the MOVEQs.

The decompiler represented it perfectly well. I renamed some variables and set their types, but this was an easy task.

; signed int __fastcall get_next_byte(entry_t *entry)
 get_next_byte                           ; DATA XREF: sub_3BC+30o
                                         ;
                 LDR     R2, [R0,#4]
                 CMP     R2, #0
                 LDRNE   R3, [R0]
                 LDRNEB  R1, [R3],#1
                 CMPNE   R1, #0
                 MOVEQ   R1, #1
                 STREQ   R1, [R0,#0xC]
                 MOVEQ   R0, 0xFFFFFFFF
                 MOVEQ   PC, LR
                 SUB     R2, R2, #1
                 STR     R2, [R0,#4]
                 STR     R3, [R0]
                 MOV     R0, R1
                 RET
 ; End of function get_next_byte
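
Here is a hand-written sketch of the control flow in the listing above, not the actual decompiler output. The field names are assumptions; the listing only reveals accesses at offsets 0, 4 and 0xC, and the struct below matches those offsets only on a 32-bit target.

```c
/* Hypothetical layout of entry_t, inferred from the offsets used. */
typedef struct entry_t {
    const unsigned char *ptr;   /* offset 0: next byte to read         */
    unsigned int remaining;     /* offset 4: bytes left                */
    int unused;                 /* offset 8: not touched by this code  */
    int finished;               /* offset 0xC: set when input runs out */
} entry_t;

int get_next_byte(entry_t *entry)
{
    if (entry->remaining != 0) {          /* CMP R2, #0            */
        const unsigned char *p = entry->ptr;
        int c = *p++;                     /* LDRNEB R1, [R3],#1    */
        if (c != 0) {                     /* CMPNE R1, #0          */
            entry->remaining--;
            entry->ptr = p;               /* pointer stored only here */
            return c;
        }
    }
    /* Reached when either comparison ends with the "equal" flag set. */
    entry->finished = 1;
    return -1;
}
```

Both ways of reaching the final two statements are explicit here, whereas in the assembly they depend on which of CMP and CMPNE last set the flags.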

Complex instructions

Conditional instructions are just part of the story. ARM is also famous for having a plethora of data movement instructions. They come with a set of possible suffixes that subtly change the meaning of the instruction. Take STMCSIA, for example. It is a STM instruction, but then you have to remember that CS means "carry set" and IA means "increment after".

In short, the disassembly listing reads like a foreign language. The pseudocode is longer but requires much less time to understand.

; void __fastcall sub_2A38(list_t *ptr, unsigned int a2)
 sub_2A38                                ; CODE XREF: sub_5C8+48p
                                         ; sub_648+5Cp ...
                 MOV     R2, #0
                 STMFD   SP!, {LR}                                            
                 MOV     R3, R2
                 MOV     R12, R2
                 MOV     LR, R2
                 SUBS    R1, R1, #0x20

 loc_2A50                                ; CODE XREF: sub_2A38+24j
                 STMCSIA R0!, {R2,R3,R12,LR}
                 STMCSIA R0!, {R2,R3,R12,LR}
                 SUBCSS  R1, R1, #0x20
                 BCS     loc_2A50
                 MOVS    R1, R1,LSL#28
                 STMCSIA R0!, {R2,R3,R12,LR}
                 STMMIIA R0!, {R2,R3}
                 LDMFD   SP!, {LR}
                 MOVS    R1, R1,LSL#2
                 STRCS   R2, [R0],#4
                 MOVEQ   PC, LR
                 STRMIH  R2, [R0],#2
                 TST     R1, #0x40000000
                 STRNEB  R2, [R0],#1
                 RET
 ; End of function sub_2A38
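
Reading the flags in the listing above, the function looks like a zero-fill routine: 32 bytes per loop iteration, then a tail of 16, 8, 4, 2 and 1 bytes selected by the low bits of the size. The sketch below follows that interpretation. The name `zero_fill` is made up, and `memset` stands in for the multi-register stores, so take it as an illustration of the logic rather than the decompiler's output.

```c
#include <stddef.h>
#include <string.h>

/* Zero-fills n bytes at ptr in the same order as the listing:
   32-byte blocks in a loop, then a 16/8/4/2/1-byte tail chosen by
   the low five bits of n. */
void zero_fill(void *ptr, size_t n)
{
    unsigned char *p = ptr;

    while (n >= 32) {                 /* two STMCSIA of 16 bytes each */
        memset(p, 0, 32);
        p += 32;
        n -= 32;
    }
    for (size_t chunk = 16; chunk != 0; chunk >>= 1) {
        if (n & chunk) {              /* one flag test per tail size */
            memset(p, 0, chunk);
            p += chunk;
        }
    }
}
```

In the assembly, each tail test is a single shifted bit landing in the carry, negative or zero flag, which is exactly the kind of detail that is tedious to track by hand.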

Compiler helper functions

Sorry for another long code snippet. I just wanted to show that the decompiler recognizes compiler helper functions (like __divdi3) and handles 64-bit arithmetic quite well.

EXPORT op_two64                                              
     op_two64                                ; CODE XREF: refer_all+31Cp          
                                             ; main+78p                           
                                                                                  
     anonymous_1     = -0x28                                                      
     var_20          = -0x20                                                      
     anonymous_0     = -0x18                                                      
     var_10          = -0x10                                                      
     arg_0           =  4                                                         
                                                                                  
 000                 MOV     R12, SP                                              
 000                 STMFD   SP!, {R4,R11,R12,LR,PC}                              
 014                 SUB     R11, R12, #4                                         
 014                 SUB     SP, SP, #0x18                                        
 02C                 SUB     R4, R11, #-var_10
 02C                 STMDB   R4, {R0,R1}
 02C                 MOV     R1, 0xFFFFFFF0
 02C                 SUB     R12, R11, #-var_10
 02C                 ADD     R1, R12, R1
 02C                 STMIA   R1, {R2,R3}
 02C                 LDR     R3, [R11,#arg_0]
 02C                 CMP     R3, #1
 02C                 BNE     loc_9C44
 02C                 MOV     R3, 0xFFFFFFF0
 02C                 SUB     R0, R11, #-var_10
 02C                 ADD     R3, R0, R3
 02C                 SUB     R4, R11, #-var_10
 02C                 LDMDB   R4, {R1,R2}
 02C                 LDMIA   R3, {R3,R4}
 02C                 ADDS    R3, R3, R1
 02C                 ADC     R4, R4, R2
 02C                 SUB     R12, R11, #-var_20
 02C                 STMDB   R12, {R3,R4}
 02C                 B       loc_9D04
     ; ---------------------------------------------------------------------------

     loc_9C44                                ; CODE XREF: op_two64+30j
 02C                 LDR     R3, [R11,#arg_0]
 02C                 CMP     R3, #2
 02C                 BNE     loc_9C7C
 02C                 MOV     R3, 0xFFFFFFF0
 02C                 SUB     R0, R11, #-var_10
 02C                 ADD     R3, R0, R3
 02C                 SUB     R4, R11, #-var_10
 02C                 LDMDB   R4, {R1,R2}
 02C                 LDMIA   R3, {R3,R4}
 02C                 SUBS    R3, R1, R3
 02C                 SBC     R4, R2, R4
 02C                 SUB     R12, R11, #-var_20
 02C                 STMDB   R12, {R3,R4}
 02C                 B       loc_9D04
     ; ---------------------------------------------------------------------------

     loc_9C7C                                ; CODE XREF: op_two64+68j
 02C                 LDR     R3, [R11,#arg_0]
 02C                 CMP     R3, #3
 02C                 BNE     loc_9CB8
 02C                 MOV     R3, 0xFFFFFFF0
 02C                 SUB     R0, R11, #-var_10
 02C                 ADD     R3, R0, R3
 02C                 SUB     R2, R11, #-var_10
 02C                 LDMDB   R2, {R0,R1}
 02C                 LDMIA   R3, {R2,R3}
 02C                 BL      __muldi3
 02C                 MOV     R4, R1
 02C                 MOV     R3, R0
 02C                 SUB     R12, R11, #-var_20
 02C                 STMDB   R12, {R3,R4}
 02C                 B       loc_9D04
     ; ---------------------------------------------------------------------------

     loc_9CB8                                ; CODE XREF: op_two64+A0j
 02C                 LDR     R3, [R11,#arg_0]
 02C                 CMP     R3, #4
 02C                 BNE     loc_9CF4
 02C                 MOV     R3, 0xFFFFFFF0
 02C                 SUB     R0, R11, #-var_10
 02C                 ADD     R3, R0, R3
 02C                 SUB     R2, R11, #-var_10
 02C                 LDMDB   R2, {R0,R1}
 02C                 LDMIA   R3, {R2,R3}
 02C                 BL      __divdi3
 02C                 MOV     R4, R1
 02C                 MOV     R3, R0
 02C                 SUB     R12, R11, #-var_20
 02C                 STMDB   R12, {R3,R4}
 02C                 B       loc_9D04
     ; ---------------------------------------------------------------------------

     loc_9CF4                                ; CODE XREF: op_two64+DCj
 02C                 MOV     R3, 0xFFFFFFFF
 02C                 MOV     R2, 0xFFFFFFFF
 02C                 SUB     R4, R11, #-var_20
 02C                 STMDB   R4, {R2,R3}

     loc_9D04                                ; CODE XREF: op_two64+5Cj
                                             ; op_two64+94j ...
 02C                 SUB     R12, R11, #-var_20
 02C                 LDMDB   R12, {R0,R1}
 02C                 SUB     SP, R11, #0x10
 014                 LDMFD   SP, {R4,R11,SP,PC}
     ; End of function op_two64
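
For comparison, here is a hand-written sketch of what the listing above computes. The parameter names are assumptions; the operand order for subtraction and division follows the register setup in the listing (first argument in R0:R1, second in R2:R3, selector on the stack).

```c
/* Hand-written sketch of op_two64; parameter names are hypothetical. */
long long op_two64(long long a, long long b, int op)
{
    switch (op) {
    case 1:  return a + b;   /* ADDS + ADC                            */
    case 2:  return a - b;   /* SUBS + SBC                            */
    case 3:  return a * b;   /* BL __muldi3 in the 32-bit ARM listing */
    case 4:  return a / b;   /* BL __divdi3; b must be nonzero        */
    default: return -1;      /* both result words set to 0xFFFFFFFF   */
    }
}
```

On a 32-bit target, the compiler turns the `*` and `/` back into calls to the helper functions, so the source never mentions `__muldi3` or `__divdi3` at all.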