The decompiler requires the latest version of IDA. While it may work with older versions (we try to ensure compatibility with a couple of previous versions), the best results are obtained with the latest version: first, IDA analyses files better; second, the decompiler can use additional available functionality.
The decompiler runs on MS Windows, Linux, and Mac OS X. It can decompile programs for other operating systems, provided they have been built using GCC/Clang/Visual Studio/Borland compilers.
32-bit decompilers require the 32-bit version of IDA to run.
64-bit decompilers require the 64-bit version of IDA to run.
IDA loads appropriate decompilers depending on the input file. If it cannot find any decompiler for the current input file, no decompilers will be loaded at all.
The GUI version of IDA is required for interactive operation. For the text-mode version, only batch operation is supported.
Here are some side-by-side comparisons of disassembly and decompiler output for ARM. Please maximize the window to see both columns simultaneously.
The following examples are displayed on this page:
Let's start with a very simple function. It accepts a pointer to a structure and zeroes out its first three fields. While the function logic is obvious by just looking at the decompiler output, the assembly listing has too much noise and requires studying it.
The decompiler saves your time and allows you to concentrate on more exciting aspects of reverse engineering.
Sorry for the long code snippet; ARM code tends to be longer than x86 code. This makes our comparison even more impressive: look at how concise the decompiler output is!
The ARM processor has conditional instructions that can shorten the code but require close attention from the reader. The case above is very simple; just note that there is a pair of instructions: `MOVNE` and `LDREQSH`. Only one of them will be executed at a time. This is how a simple `if-then-else` looks in ARM. The pseudocode shows it much better and does not require any explanations.
A quiz question: did you notice that `MOVNE` loads zero into `R0`? (Because I didn't :)
Also note that in the disassembly listing we see `var_8`, but the location really used is `var_A`, which corresponds to `v4`.
Look, the decompiler output is longer! This is a rare case when the pseudocode is longer than the disassembly listing, but it is for a good cause: to keep it readable. There are so many conditional instructions here that it is very easy to misunderstand the dependencies. For example, did you notice that the first `MOVEQ` may use the condition codes set by `CMP`? The subtle detail is that `CMPNE` may be skipped, so the condition codes set by `CMP` may reach the `MOVEQ`s.
The decompiler represented it perfectly well. I renamed some variables and set their types, but this was an easy task.
Conditional instructions are just part of the story. ARM is also famous for having a plethora of data movement instructions. They come with a set of possible suffixes that subtly change the meaning of the instruction. Take `STMCSIA`, for example. It is an `STM` instruction, but then you have to remember that `CS` means "carry set" and `IA` means "increment after".
In short, the disassembly listing is like Chinese. The pseudocode is longer but requires much less time to understand.
Sorry for another long code snippet. We just wanted to show you that the decompiler can handle compiler helper functions (like `__divdi3`) and handles 64-bit arithmetic quite well.
Since ARM instructions cannot have big immediate constants, sometimes they are loaded with two instructions. There are many `0xFA` (250 decimal) constants in the disassembly listing, but all of them are shifted to the left by 2 before use. The decompiler saves you from these petty details.
As an aside: the decompiler can handle ARM-mode as well as Thumb-mode instructions. It simply does not care about the instruction encoding because that is already handled by IDA.
In some cases the disassembly listing can be misleading, especially with PIC (position-independent code). While the address of a constant string is loaded into `R12`, the code does not care about it. It is just how variable addresses are calculated in PIC code (it is `.got - someoffset`). Such calculations are very frequent in shared objects and unfortunately IDA cannot handle all of them. But the decompiler did a great job of tracing `R12`.
Here are some side-by-side comparisons of decompilations for v7.3 and v7.4. Please maximize the window to see both columns simultaneously.
The following examples are displayed on this page:
The text produced by v7.3 is not quite correct because the array at `[ebp-128]` was not recognized. Overall, determining arrays is a tough task, but we can now handle simple cases automatically.
On the left there is a mysterious call to `_extendsfdf2`. In fact, this is a compiler helper function that just converts a single-precision floating-point value into a double-precision value. However, we do not want to see this call as-is. It is much better to translate it into code that looks more like C. Besides, there is special treatment for printf-like functions.
In some cases we can easily prove that one variable can be mapped into another. The new version automatically creates a variable mapping in such cases. This makes the output shorter and easier to read. Needless to say, the user can revert the mapping if necessary.
The new version automatically applies symbolic constants when necessary. Less manual work.
This is not the longest C++ function name one may encounter, but just compare the left and right sides. In fact, the right side could easily fit into one line; we just kept it multi-line to be consistent. By the way, all names in IDA benefit from this simplification, not only the ones displayed by the decompiler. And it is configurable!
The battle is long but we do not give up. More 64-bit patterns are recognized now.
Yet another example of 64-bit arithmetic. The code on the left is correct but not useful at all. It can and should be converted into the simple equivalent text on the right.
Currently we support only GetProcAddress, but we are sure that we will expand this feature in the future.
Here are some side-by-side comparisons of disassembly and decompiler output for PowerPC. Please maximize the window to see both columns simultaneously.
The following examples are displayed on this page:
This simple function calculates the sum of the squares of the first N natural numbers. While the function logic is obvious by just looking at the decompiler output, the assembly listing has too much noise and requires studying it. The decompiler saves your time and allows you to concentrate on more exciting aspects of reverse engineering.
The PowerPC processor has a number of instructions which can be used to avoid branches (for example cntlzw). The decompiler restores the conditional logic and makes code easier to understand.
64-bit comparison usually involves several compare and branch instructions which do not improve the code readability.
A system call is always mysterious, but the decompiler helps you with its name and arguments.
Compilers sometimes use helpers; the decompiler knows the meaning of many of them and uses it to simplify the code.
The PowerPC processor contains a number of complex floating-point instructions which perform several operations at once. It is not easy to recover an expression from the assembler code, but it is not a problem for the decompiler.
Compilers can decompose a multiplication/division instruction into a sequence of cheaper instructions (additions, shifts, etc.). This example demonstrates how the decompiler recognizes them and coalesces them back into the original operation.
This example demonstrates that the decompiler can handle VLE code without problems.
The pseudocode is not static: the decompiler is interactive in the same way IDA is. You can change variable types and names, modify function prototypes, add comments, and more. The example above presents the result after such modifications.
Surely the result is not ideal, and there is a lot of room for improvement, but we hope that you got the idea.
And you can compare the result with the original: http://lxr.free-electrons.com/source/fs/fat/namei_msdos.c#L224
Hex-Rays' support for exceptions in Microsoft Visual C++/x64 incorporates the C++ exception metadata for functions into their decompilation, and presents the results to the user via built-in constructs in the decompilation (`try`, `catch`, `__wind`, `__unwind`). When the results cannot be presented entirely with these constructs, they will be presented via helper calls in the decompilation.
The documentation describes:
# TRY, CATCH, AND THROW
The C++ language provides the `try` scoped construct in which the developer expects that an exception might occur. `try` blocks must be followed by one or more scoped `catch` constructs for catching exceptions that may occur within. `catch` blocks may use `...` to catch any exception. Alternatively, `catch` blocks may name the type of an exception, such as `std::bad_alloc`. `catch` blocks with named types may or may not also catch the exception object itself. For example, `catch(std::bad_alloc *v10)` and `catch(std::bad_alloc *)` are both valid. The former can access the exception object through variable `v10`, whereas the latter cannot access the exception object.
C++ provides the `throw` keyword for throwing an exception, as in `std::bad_alloc ba; throw ba;`. This is represented in the output as (for example) `throw v10;`. C++ also allows code to rethrow the current exception via `throw;`. This is represented in the output as `throw;`.
# WIND AND UNWIND
Exception metadata in C++ binaries is split into two categories: `try` and `catch` blocks, as discussed above, and so-called `wind` and `unwind` blocks. C++ does not have `wind` and `unwind` keywords, but the compiler creates these blocks implicitly. In most binaries, they outnumber `try` and `catch` blocks by about 20 to 1.
Consider the following code, which may or may not throw an `int` as an exception at three places:
If an exception is thrown at point -1, the function exits early without executing any of its remaining code. As no objects have been created on the stack, nothing needs to be cleaned up before the function returns.
If an exception is thrown at point 0, the function exits early as before. However, since `string s0` has been created on the stack, it needs to be destroyed before exiting the function. Similarly, if an exception is thrown at point 1, both `string s1` and `string s0` must be destroyed.
These destructor calls would normally happen at the end of their enclosing scope, i.e. the bottom of the function, where the compiler inserts implicitly-generated destructor calls. However, since the function does not have any `try` blocks, none of the function's remaining code will execute after the exception is thrown. Therefore, the destructor calls at the bottom will not execute. If there were no other mechanism for destructing `s0` and/or `s1`, the result would be memory leaks or other state management issues involving those objects. Therefore, the C++ exception management runtime provides another mechanism to invoke their destructors: `wind` blocks and their corresponding `unwind` handlers.
`wind` blocks are effectively `try` blocks that are inserted invisibly by the compiler. They begin immediately after constructing some object, and end immediately before destructing that object. Their `unwind` blocks play the role of `catch` handlers, calling the destructor upon the object when exceptional control flow would otherwise cause the destructor call to be skipped.
Microsoft Visual C++ effectively transforms the previous example as follows:
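The transformed listing is also missing in this extract; the following pseudocode is a hypothetical sketch of the transformation in the plugin's `__wind`/`__unwind` notation (state numbers follow the discussion below; this is not actual decompiler output):

```
void example()                     // hypothetical pseudocode sketch
{
  maybe_throw(-1);                 // state -1: nothing to unwind
  std::string s0 = "first";        // s0 constructed: enter state 0
  __wind                           // state 0, parent -1
  {
    maybe_throw(0);
    std::string s1 = "second";     // s1 constructed: enter state 1
    __wind                         // state 1, parent 0
    {
      maybe_throw(1);
    }
    __unwind                       // destroy s1, then re-throw
    {
      s1.~basic_string();
    }
  }
  __unwind                         // destroy s0, then re-throw
  {
    s0.~basic_string();
  }
}
```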
`unwind` blocks always re-throw the current exception, unlike `catch` handlers, which may or may not re-throw it. Re-throwing the exception ensures that prior `wind` blocks will have a chance to execute. So, for example, if an exception is thrown at point 1, after the `unwind` handler destroys `string s1`, re-throwing the exception causes the unwind handler for point 0 to execute, thereby allowing it to destroy `string s0` before re-throwing the exception out of the function.
# STATE NUMBERS AND INSTRUCTION STATES
As we have discussed, the primary components of Microsoft Visual C++ x64 exception metadata are `try` blocks, `catch` handlers, `wind` blocks, and `unwind` handlers. Generally speaking, these elements can be nested within one another. For example, in C++ code, it is legal for one `try` block to contain another, and a `catch` handler may contain `try` blocks of its own. The same is true for `wind` and `unwind` constructs: `wind` blocks may contain other `wind` blocks (as in the previous example) or `try` blocks, and `try` and `catch` blocks may contain `wind` blocks.
Exceptions must be processed in a particular sequence: namely, the most nested handlers must be consulted first. For example, if a `try` block contains another `try` block, any exceptions occurring within the latter region must be processed by the innermost `catch` handlers first. Only if none of the inner `catch` handlers can handle the exception should the outer `try` block's catch handlers be consulted. Similarly, as in the previous example, `unwind` handlers must destruct their corresponding objects before passing control to any previous exception handlers (such as `string s1`'s `unwind` handler passing control to `string s0`'s `unwind` handler).
Microsoft's solution to ensure that exceptions are processed in the proper sequence is simple. It assigns a "state number" to each exception-handling construct. Each exception state has a "parent" state number whose handler will be consulted if the current state's handler is unable to handle the exception. In the previous example, what we called "point 0" is assigned the state number 0, while "point 1" is assigned the state number 1. State 1 has a parent of 0. (State 0's parent is a dummy value, -1, that signifies that it has no parent.) Since `unwind` handlers always re-throw exceptions, if state 1's `unwind` handler is ever invoked, the exception handling machinery will always invoke state 0's `unwind` handler afterwards. Because state 0 has no parent, the exception machinery will re-throw the exception out of the current function. This same machinery ensures that the catch handlers for inner `try` blocks are consulted before outer `try` blocks.
There is only one more piece to the puzzle: given that an exception could occur anywhere, how does the exception machinery know which exception handler should be consulted first? I.e., for every address within a function with C++ exception metadata, what is the current exception state? Microsoft C++/x64 binaries provide this information in the `IPtoStateMap` metadata table, which is an array of address ranges and their corresponding state numbers.
# GUI OPERATION
This support is fully automated and requires no user interaction. However, the user can customize the display of C++ exception metadata elements for the global database, as well as for individual functions.
# GLOBAL SETTINGS
Under the `Edit->Other->C++ exception display settings` menu item, the user can edit the default settings to control which exception constructs are shown in the listing. These are saved persistently in the database (i.e., the user's choices are remembered after saving, closing, and re-opening), and can also be adjusted on a per-function basis (described later).
The settings on the dialog are as follows:
* Default output mode. When the plugin is able to represent C++ exception constructs via nice constructs like `try`, `catch`, `__wind`, and `__unwind` in the listings, these are called "structured" exception states. The plugin is not always able to represent exception metadata nicely, and may instead be forced to represent the metadata via helper calls in the listing (these are called "unstructured" states). As these can be messy and distracting, users may prefer not to see them by default. Alternatively, the user may prefer to see no exception metadata whatsoever, not even the structured states. This setting allows the user to specify which types of metadata will be shown in the listing.
* Show wind states. We discussed wind states and unwind handlers in the background material. Although these states can be very useful when reverse engineering C++ binaries (particularly when analyzing constructors), displaying them increases the amount of code in the listing, and sometimes the information they provide is more redundant than useful. Therefore, this checkbox allows the user to control whether they are shown by default.
* Inform user of hidden states. The two settings just discussed can cause unstructured and/or wind states to be omitted from the default output. If this checkbox is enabled, the plugin will inform the user of these omissions via messages at the top of the listing, such as this message indicating that one unstructured wind state was omitted:

  ```
  // Hidden C++ exception states: #wind_helpers=1
  ```
There are three more elements on the settings dialog; most users should never have to use them. However, for completeness, we will describe them now.
* Warning behavior. When internal warnings occur, they will either be printed to the output window at the bottom or shown as a pop-up warning message box, depending on this setting.
* Reset per-function settings. The next section will discuss how the display settings described above can be customized on a per-function basis. This button allows the user to erase all such saved settings, so that all functions will use the global display settings the next time they are decompiled.
* Rebuild C++ metadata caches. Before the plugin can show C++ exception metadata in the output, it must pre-process the metadata across the whole binary. Doing so crucially relies upon the ability to recognize the `__CxxFrameHandler3` and `__CxxFrameHandler4` unwind handler functions when they are referenced by the binary's unwind metadata. If the plugin fails to recognize one of these functions, it will be unable to display C++ exception metadata for any function that uses the unrecognized unwind handler(s).
If the user suspects that a failure like this has taken place -- say, because they expect to see a `try`/`catch` in the output and it is missing, and they have confirmed that the output was not simply hidden due to the display settings above -- then this button may help them to diagnose and repair the issue. Pressing this button flushes the existing caches from the database and rebuilds them. It also prints output to tell the user which unwind handlers were recognized and which ones were not. The user can use these messages to confirm whether the function's corresponding unwind handler was unrecognized. If it was not, the user can rename the unwind handler function to something that contains one of the two aforementioned names, and then rebuild the caches again.
Note that users should generally not need to use this button, as the plugin tries several methods to recognize the unwind handlers (such as FLIRT signatures, recognizing import names, and looking at the destination of "thunk" functions with a single `jmp` to a destination function). If the user sees any C++ exception metadata in the output, this almost always means that the recognition worked correctly. This button should only be used by experienced users as a last resort. Users are advised to save their database before pressing this button, and only proceed with the changes if renaming unwind handlers and rebuilding the cache addresses missing metadata in the output.
# CONFIGURATION
The default options for the settings just described are controlled via the `%IDADIR%/cfg/eh34.cfg` configuration file. Editing this file will change the defaults for newly-created databases (but not affect existing databases).
# PER-FUNCTION SETTINGS
As just discussed, the user can control which C++ exception metadata is displayed in the output via the global menu item. Users can also customize these settings on a per-function basis (say, by enabling display of wind states for selected functions only), and they will be saved persistently in the database.
When a function has C++ exception metadata, one or more items will appear on Hex-Rays' right click menu. The most general one is "C++ exception settings...". Selecting this menu item will bring up a dialog that is similar to the global settings menu item with the following settings:
* Use global settings. If the user previously changed the settings for the function, but wishes the function to be shown via the global settings in the future, they can select this item and press "OK". This will delete the saved settings for the function, causing future decompilations to use the global settings.
* This function's output mode. This functions identically to "Default output mode" from the global settings dialog, but only affects the current function.
* Show wind states. Again, identical to the global settings dialog item.
There is a button at the bottom, "Edit global settings", which is simply a shortcut to the same global settings dialog from the `Edit->Other->C++ exception display settings` menu item.
The listing will automatically refresh if the user changes any settings.
Additionally, there are four other menu items that may or may not appear, depending upon the metadata present and whether the settings caused any metadata to be hidden. These menu items are shortcuts to editing the corresponding fields in the per-function settings dialog just discussed. They are:
* Show unstructured C++ states. If the global or per-function default output setting was set to "Structured only", and the function has unstructured states, this menu item will appear. Clicking it will enable display of unstructured states for the function and refresh the decompilation.
* Hide unstructured C++ states. Similar to the above.
* Show wind states. If the global or per-function "Show wind states" setting was disabled, and the function has wind states, this menu item will appear. Clicking it will enable display of wind states for the function and refresh the decompilation.
* Hide wind states. Similar to the above.
# KEYBOARD SHORTCUTS
The user can change (add, remove, or edit) the keyboard shortcuts for the per-function settings right-click menu items from the `Edit->Options->Shortcuts` dialog. The names of the corresponding actions are:
* "C++ exception settings": `eh34:func_settings`
* "Show unstructured C++ states": `eh34:enable_unstructured`
* "Hide unstructured C++ states": `eh34:disable_unstructured`
* "Show wind states": `eh34:enable_wind`
* "Hide wind states": `eh34:disable_wind`
* The global settings dialog: `eh34:config_menu`
# HELPER CALLS
Hex-Rays' Microsoft C++ x64 exception support tries to hide details about exception state numbers as much as possible. However, compiler optimizations can cause binaries to diverge from the original source code. For example, inlined functions can produce `goto` statements in the decompilation despite there being none in the source. Optimizations can also cause C++ exception metadata to differ from the original code. As a result, it is not always possible to represent `try`, `catch`, `wind`, and `unwind` constructs as scoped regions that hide the low-level details.
In these cases, Hex-Rays' Microsoft C++ x64 exception support will produce helper calls with informative names to indicate when exception states are entered and exited, and to ensure that the user can see the bodies of `catch` and `unwind` handlers in the output. The user can hover their mouse over those calls to see their descriptions. They are also catalogued below.
The following helper calls are used when exception states have multiple entrypoints, or multiple exits:
The following helper calls are used when exception states had single entry and exit points, but could not be represented via `try` or `__wind` keywords:
The following helper calls are used to display `catch` handlers for exception states that could not be represented via the `catch` keyword:
The following helper calls are normally removed from the output; if you do see them, they signify the boundary of a `catch` handler:
The following helper calls are used to display `unwind` handlers for exception states that could not be represented via the `__unwind` keyword:
The following helper calls are used to signify that an `unwind` handler has finished executing, and will transfer control to a parent exception state (or outside of the function):
The following helper call is used when the exception metadata did not specify a function pointer for an `unwind` handler, which causes program termination:
The following helper calls are used to signify that Hex-Rays was unable to display an exception handler in the decompilation:
Starting from MSVC 2017 Service Pack 3 (version 14.13), the compiler began applying optimizations to reduce the size of the C++ exception metadata. An official Microsoft blog entry entitled ["Making C++ Exception Handling Smaller on x64"](https://devblogs.microsoft.com/cppblog/making-cpp-exception-handling-smaller-x64/) describes these changes.
As a result of these changes, the C++ exception metadata in MSVC 14.13+ binaries is no longer fully precise. Exception states are frequently reported as beginning physically after where the source code would indicate. In order to produce usable output, Hex-Rays employs mathematical optimization algorithms to reconstruct more detailed C++ exception metadata configurations that can be displayed in a nicer format in the decompilation. These algorithms improve the listings by producing more structured regions and fewer helper calls in the output, but they introduce further imprecision as to the true starting and ending locations of exception regions when compared to the source code. They are an integral part of Hex-Rays C++/x64 Windows exception metadata support and cannot be disabled.
The takeaway is that, when processing MSVC 14.13+ binaries, Hex-Rays C++/x64 Windows exception support frequently produces `try` and `__unwind` blocks that begin and/or end earlier and/or later than what the source code would indicate, were it available. This has important consequences for vulnerability analysis.
For example, given accurate exception boundary information, the destructor for a local object would ordinarily be situated after the end of that object's `__wind` and `__unwind` blocks, as in:
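The example output is not reproduced in this extract; a hypothetical sketch of such output follows (the class name `Object` is a placeholder; `v14` is the variable discussed below):

```
__wind                          // covers v14's lifetime
{
  // ... code that may throw while v14 is alive ...
}
__unwind                        // exceptional path: destroy v14, re-throw
{
  Object::~Object(&v14);        // "Object" is a placeholder class name
}
Object::~Object(&v14);          // normal-path destructor, after the block
```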
Yet, due to the imprecise boundary information, Hex-Rays might display the destructor as being inside of the `__wind` block:
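Again the snippet is missing here; a hypothetical sketch of the misleading form, with the destructor displayed inside the `__wind` block (placeholder names):

```
__wind
{
  // ... code that may throw while v14 is alive ...
  Object::~Object(&v14);        // normal-path destructor, shown inside
}
__unwind
{
  Object::~Object(&v14);
}
```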
The latter output might indicate that `v14`'s destructor would be called twice if its destructor were to throw an exception. However, this indication is simply the result of imprecise exception region boundary information. In short, users should be wary of diagnosing software bugs or security issues based upon the positioning of statements near the boundaries of `try` and `__wind` blocks. The example above shows something that might appear to be a bug in the code -- a destructor being called twice -- but is in fact not one.
These considerations primarily apply when analyzing C++ binaries compiled with MSVC 14.13 or greater. They do not apply as much to binaries produced by MSVC 14.12 or earlier, when the compiler emitted fully precise information about exception regions.
Although Hex-Rays may improve its detection of exception region boundaries in the future, because modern binaries lack the ground truth of older binaries, the results will never be fully accurate. If the imprecision is unacceptable to you, we recommend permanently disabling C++ metadata display via the `eh34.cfg` file discussed previously.
# MISCELLANEOUS
Hex-Rays' support for exceptions in Microsoft Visual C++/x64 only works after auto-analysis has completed. Until then, users can explore the database and decompile functions as usual, but no C++ exception metadata will be shown. Users are advised to refresh any decompilation windows after auto-analysis has completed.
If users have enabled display of wind states, they may see empty `__wind` or `__unwind` constructs in the output. Usually, this does not indicate that an error occurred; it usually means that the region of code corresponding to the `wind` state was very small or contained dead code, and Hex-Rays' normal analysis and transformation made it empty.
Starting in IDA 9.0, IDA's auto-analysis preprocesses C++ exception metadata differently than in previous versions. In particular, on MSVC/x64 binaries, `__unwind` and `catch` handlers are created as standalone functions, not as chunks of their parent function as in earlier versions. This is required to display the exception metadata correctly in the decompilation. For databases created with older versions, the plugin will still show the outline of the exception metadata, but the bodies of the `__unwind` and `catch` handlers will be displayed via the helper calls `__eh34_unwind_handler_absent` and `__eh34_catch_handler_absent`, respectively. The plugin will also print a warning at the top of the decompilation such as `Absent C++ exception handlers: #catch=1 (pre-9.0 IDB)` in these situations. Re-creating the IDB with a newer version will solve those issues, although users might still encounter absent handlers in new databases (rarely, and under different circumstances).
Below you will find side-by-side comparisons of v7.2 and v7.3 decompilations. Please maximize the window to see both columns simultaneously.
The following examples are displayed on this page:
NOTE: these are just some selected examples that can be illustrated as side-by-side differences. There are many other improvements and new features that are not mentioned on this page. We just got tired of selecting them. Some of the improvements that did not make it to this page:
objc-related improvements
value range analysis can eliminate more useless code
better resolving of got-relative memory references
too big shift amounts are converted to lower values (e.g. 33->1)
more for-loops
better handling of fragmented variables
many other things...
When a constant looks nicer as a hexadecimal number, we print it as a hexadecimal number by default. Naturally, beauty is in the eye of the beholder, but the new behavior will produce more readable code, and less frequently you will feel compelled to change the number representation. By the way, this tiny change is just one of numerous improvements that we keep adding in each release. Most of them go literally unnoticed. It is just this time we decided to talk about them.
EfiBootRecord points to a structure that has RecordExtents[0] as its last member. Such structures are considered variable-size structures in C/C++. Now we handle them nicely.
We were already printing UTF-8 and other string types, but UTF-32 was not supported yet. Now we print it with the `U` prefix.
The difference between these outputs is subtle but pleasant. The new version managed to determine the variable types based on the printf format string. While the old version ended up with `int a2, int a3`, the new version correctly determined them as one `__int64 a2`.
A similar logic works for scanf-like functions. Please note that the old version was misdetecting the number of arguments. It was possible to correct the misdetected arguments using the Numpad-Minus hotkey but it is always better when there is less routine work on your shoulders, right?
While seasoned reversers know what is located at fs:0, it is still better to have it spelled out. Besides, the type of v15 is automatically detected as struct _EXCEPTION_REGISTRATION_RECORD *.
Again, the user can specify the union field that should be used in the output (the hotkey is Alt-Y) but there are situations when it can be automatically determined based on the access type and size. The above example illustrates this point. JFYI, the type of entry is:
While we cannot handle bitfields yet, their presence does not prevent using the other, regular fields of the structure.
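A hedged sketch of such a mixed structure (the layout is invented for illustration):

```c
/* A structure mixing bitfields with regular fields. The bitfields
   themselves are opaque to the decompiler, but accesses to the
   regular members still decompile cleanly. All names are made up. */
struct mixed {
    unsigned dirty    : 1;  /* bitfields: not modeled yet */
    unsigned readonly : 1;
    unsigned kind     : 6;
    int      size;          /* regular field: decompiles normally */
    char    *name;          /* regular field */
};
```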
I could not resist the temptation to include one more example of automatic union selection. How beautiful the code on the right is!
No comments needed, we hope. The new decompiler managed to fold constant expressions after replacing EABI helpers with corresponding operators.
Now it works better, especially in complex cases.
In this case too, the user could set the prototype of sub_1135FC as accepting a char * and this would be enough to reveal string references in the output, but the new decompiler can do it automatically.
The code on the left had a very awkward sequence to copy a structure. The code on the right eliminates it as unnecessary and useless.
Do you care about this improvement? Probably not, because the difference is tiny. However, in addition to being simpler, the code on the right eliminated a temporary variable, v5. A tiny improvement, but an improvement it is.
Another tiny improvement made the output considerably shorter. We like it!
This is a very special case: a division that uses the rcr instruction. Our microcode does not have an opcode for it, but we implemented the logic to handle some special cases, just so you do not waste your time trying to decipher the meaning of convoluted code (yes, rcr means code that is difficult to understand).
Well, we cannot say that we produce fewer gotos in all cases, but there is some improvement for sure. Second, note that the return type got improved too: now it is immediately visible that the function returns a boolean (0/1) value.
What a surprise, the code on the right is longer and more complex! Indeed it is, because the decompiler is now more careful with division instructions. They may potentially generate a zero division exception, and completely hiding them from the output may be misleading. If you prefer the old behaviour, turn off division preserving in the configuration file.
Do you notice the difference? If not, here is a hint: the order of arguments of sub_88 is different. The code on the right is more correct because the format specifiers match the variable types. For example, %f matches float a. At first sight the code on the left looks completely wrong, but (surprise!) it works correctly on x64 machines. This is because floating point and integer arguments are passed in different locations, so the relative order of floating/integer arguments in the call does not matter much. Nevertheless, the code on the right causes less confusion.
This is a never ending battle, but we advance!
Below you will find side-by-side comparisons of v7.1 and v7.2 decompilations. Please maximize the window to see both columns simultaneously.
The following examples are displayed on this page:
NOTE: these are just some selected examples that can be illustrated as side-by-side differences. There are many other improvements and new features that are not mentioned on this page.
In the past the Decompiler was able to recognize magic divisions in 32-bit code. We now support magic divisions in 64-bit code too.
More aggressive folding of if_one_else_zero constructs; the output is much shorter and easier to grasp.
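For readers who have not met the pattern, the if_one_else_zero shape looks roughly like this (a synthetic sketch, not actual decompiler output):

```c
/* The branchy form on the left of the comparison; the decompiler
   folds it into the single expression noted in the comment. */
int is_positive_branchy(int x)
{
    int r;
    if (x > 0)
        r = 1;
    else
        r = 0;
    return r;   /* folded by the decompiler to: return x > 0; */
}
```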
The decompiler tries to guess the type of the first argument of a constructor. This leads to improved listing.
The decompiler has a better algorithm to find the correct union field. This reduces the number of casts in the output.
We improved recognition of 'for' loops, they are shorter and much easier to understand.
Please note that the code on the left is completely illegible; the assembler code is probably easier to work with in this case. However, the code on the right is very neat. JFYI, below is the class hierarchy for this example:
Also please note that the source code had
but at the assembler level we have
Visual Studio plays such tricks.
Yes, the code on the left and on the right do the same. We prefer the right side, very much.
Minor stuff, one would say, and we'd completely agree. However, these minor details make reading the output a pleasure.
This is a rare addressing mode that is nevertheless used by compilers. Now we support it nicely.
The new decompiler managed to disentangle the obfuscation code and convert it into a nice strcpy()
The new version knows about ObjC blocks and can represent them correctly in the output. See Edit, Other, Objective-C
submenu in IDA, it contains the necessary actions to analyze the blocks.
We continue to improve recognition of 64-bit arithmetic. While it is impossible to handle all cases, we do not give up.
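As an illustration of what such recognition undoes, here is the classic add/adc shape of a 64-bit addition performed with 32-bit operations (a sketch; the exact instruction sequence varies by compiler and target):

```c
#include <stdint.h>

/* A 64-bit addition done the 32-bit way: add the low halves, carry
   into the high halves. The decompiler collapses this pattern into
   a single 64-bit '+'. */
uint64_t add64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi)
{
    uint32_t lo = a_lo + b_lo;
    uint32_t carry = lo < a_lo;         /* the carry flag of the add */
    uint32_t hi = a_hi + b_hi + carry;  /* the adc instruction       */
    return ((uint64_t)hi << 32) | lo;
}
```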
Yet another optimization rule that lifts common code from 'if' branches. We made it even more aggressive.
Sometimes compilers reuse the same stack slot for different purposes. Many of our users asked us to add a feature to handle this situation. The new decompiler addresses this issue by adding a command to force the creation of a new variable at the specified point. Currently we support only aliasable stack variables because this is the most common case.
In the sample above the slot of the p_data_format variable is reused. Initially it holds a pointer to an integer (data_format) and then a simple integer (errcode). Previous versions of the decompiler could not handle this situation nicely: the output would necessarily contain casts and be quite difficult to read. The two different uses of the slot would be represented by just one variable. You can see it in the left listing.
The new version produces clean code and displays two variables. Naturally, this happens after applying the force new variable command.
Well, these listings require no comments, the new version apparently wins!
Hotkey: Y
The SetType command sets the type of the current item. It can be applied to the following things:
Function
Local variable
Global item (function or data)
If the command is applied to the very first line of the output text, the decompiler will try to detect the current function argument. If the cursor is on an argument declaration, then the argument type will be modified. Otherwise, the current function type will be modified.
In all other cases the item under the cursor will be modified.
When modifying the prototype of the current function you may add or remove function arguments, change the return type, and change the calling convention. If you see that the decompiler wrongly created too many function arguments, you can remove them.
The item type must be specified as a C type declaration. All types defined in the loaded type libraries, all structures in the structure window, all enum definitions in the enum window can be used.
This is a very powerful command. It can change the output dramatically. Use it to remove cast operations from the output and to make it more readable. In some cases, you will need to define structure types in the structure window and only after that use them in the pseudocode window.
NOTE: since the arguments of indirect calls are collected before defining variables, specifying the type of the function pointer may not be enough. Please read this for more info.
Since variables and function types are essential, the decompiler uses colors to display them. By default, definite types (set by the user, for example) are displayed in blue while guessed types are displayed in gray. Please note that the guessed types may change if the circumstances change. For example, if the prototype of a called function is changed, the variable that holds its return value may change automatically, unless its type was set by the user.
This command does not rename the operated item, even if you specify the name in the declaration. Please use the rename command for that.
See also: interactive operation
Hotkey: /
This command edits the indented comment for the current line or the current variable. It can be applied to the local variable definition area (at the top of the output) and to the function statement area (at the bottom of the output).
If applied to the local variable definition area, this command edits the comment for the current local variable. Otherwise the comment for the current line will be edited.
Please note that due to the highly dynamic nature of the output, the decompiler uses a rather complex coordinate system to attach comments. Some output lines will not have a coordinate in this system. You cannot edit comments for these lines. We will try to overcome this limitation in the future but it might take some time and currently we do not have a clear idea how to improve the existing coordinate system.
Each time the output text changes the decompiler will rearrange the entered comments so they are displayed close to their original locations. However, if the output changes too much, the decompiler could fail to display some comments. Such comments are called "orphan comments". All orphan comments are printed at the very end of the output text.
You can cut and paste them to the correct locations or you can delete them with the "Delete orphan comments" command using the right-click menu.
The starting line position for indented comments can be configured by the user. Please check the COMMENT_INDENT parameter in the configuration file.
See also: Edit block comment | Interactive operation
A decompiler represents executable binary files in a readable form. More precisely, it transforms binary code into text that software developers can read and modify. The software security industry relies on this transformation to analyze and validate programs. The analysis is performed on the binary code because the source code (the text form of the software) is traditionally not available: it is considered a commercial secret.
Programs to transform binary code into text form have always existed. Simple one-to-one mapping of processor instruction codes into instruction mnemonics is performed by disassemblers. Many disassemblers are available on the market, both free and commercial. The most powerful disassembler is our own IDA Pro. It can handle binary code for a huge number of processors and has an open architecture that allows developers to write add-on analytic modules.
Decompilers are different from disassemblers in one very important aspect. While both generate human readable text, decompilers generate much higher level text which is more concise and much easier to read.
Compared to low level assembly language, high level language representation has several advantages:
It is concise.
It is structured.
It doesn't require developers to know the assembly language.
It recognizes and converts low level idioms into high level notions.
It is less confusing and therefore easier to understand.
It is less repetitive and less distracting.
It uses data flow analysis.
Let's consider these points in detail.
Usually the decompiler's output is five to ten times shorter than the disassembler's output. For example, a typical modern program contains from 400KB to 5MB of binary code. The disassembler's output for such a program will include around 5-100MB of text, which can take anything from several weeks to several months to analyze completely. Analysts cannot spend this much time on a single program for economic reasons.
The decompiler's output for a typical program will be from 400KB to 10MB. Although this is still a big volume to read and understand (about the size of a thick book), the time needed for analysis is divided by 10 or more.
The second big difference is that the decompiler output is structured. Instead of a linear flow of instructions where each line is similar to all the others, the text is indented to make the program logic explicit. Control flow constructs such as conditional statements, loops, and switches are marked with the appropriate keywords.
The decompiler's output is easier to understand than the disassembler's output because it is high level. To be able to use a disassembler, an analyst must know the target processor's assembly language. Mainstream programmers do not use assembly languages for everyday tasks, but virtually everyone uses high level languages today. Decompilers remove the gap between the typical programming languages and the output language. More analysts can use a decompiler than a disassembler.
Decompilers convert assembly level idioms into high-level abstractions. Some idioms can be quite long and time consuming to analyze. The following one-line code
x = y / 2;
can be transformed by the compiler into a series of 20-30 processor instructions. It takes at least 15-30 seconds for an experienced analyst to recognize the pattern and mentally replace it with the original line. If the code includes many such idioms, an analyst is forced to take notes and mark each pattern with its short representation. All this slows down the analysis tremendously. Decompilers remove this burden from the analysts.
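As an illustration of such an idiom, compilers typically replace division by a constant with a multiply by a precomputed "magic" reciprocal plus a shift. The sketch below shows the idiom for unsigned division by 10; it is not the exact code any particular compiler emits:

```c
#include <stdint.h>

/* Division by a constant via multiply-and-shift: the kind of
   instruction sequence an analyst must recognize by hand in a
   disassembly listing. */
uint32_t div10(uint32_t x)
{
    /* 0xCCCCCCCD == ceil(2^35 / 10); the product fits in 64 bits,
       so (x * magic) >> 35 equals x / 10 for every 32-bit x. */
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```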
The number of assembler instructions to analyze is huge. They look very similar to each other and their patterns are very repetitive. Reading disassembler output is nothing like reading a captivating story. In a compiler generated program 95% of the code will be really boring to read and analyze. It is extremely easy for an analyst to confuse two similar looking snippets of code and simply lose his way in the output. These two factors (the size and the boring nature of the text) lead to the following phenomenon: binary programs are never fully analyzed. Analysts try to locate suspicious parts by using heuristics and automation tools. Exceptions happen when the program is extremely small or an analyst devotes a disproportionately huge amount of time to the analysis. Decompilers alleviate both problems: their output is shorter and less repetitive. The output still contains some repetition, but it is manageable by a human being. Besides, this repetition can be addressed by automating the analysis.
Repetitive patterns in the binary code call for a solution. One obvious solution is to employ the computer to find patterns and somehow reduce them into something shorter and easier for human analysts to grasp. Some disassemblers (including IDA Pro) provide a means to automate analysis. However, the number of available analytical modules stays low, so repetitive code continues to be a problem. The main reason is that recognizing binary patterns is a surprisingly difficult task. Any "simple" action, including basic arithmetic operations such as addition and subtraction, can be represented in an endless number of ways in binary form. The compiler might use the addition operator for subtraction and vice versa. It can store constant numbers somewhere in its memory and load them when needed. It can use the fact that, after some operations, the register value can be proven to be a known constant, and just use the register without reinitializing it. The diversity of methods used explains the small number of available analytical modules.
The situation is different with a decompiler. Automation becomes much easier because the decompiler provides the analyst with high level notions. Many patterns are automatically recognized and replaced with abstract notions. The remaining patterns can be detected easily because of the formalisms the decompiler introduces. For example, the notions of function parameters and calling conventions are strictly formalized. Decompilers make it extremely easy to find the parameters of any function call, even if those parameters are initialized far away from the call instruction. With a disassembler, this is a daunting task, which requires handling each case individually.
Decompilers, in contrast with disassemblers, perform extensive data flow analysis on the input. This means that questions such as "Where is the variable initialized?" and "Is this variable used?" can be answered immediately, without doing any extensive search over the function. Analysts routinely pose and answer these questions, and having the answers immediately increases their productivity.
Below you will find side-by-side comparisons of disassembly and decompilation outputs. The following examples are displayed on this page:
Just note the difference in size! The disassembly output requires you not only to know that compilers generate such convoluted code for signed division and modulo operations, but also to spend your time recognizing the patterns. Needless to say, the decompiler makes things really simple.
Questions like
What are the possible return values of the function?
Does the function use any strings?
What does the function do?
can be answered almost instantaneously by looking at the decompiler output. Needless to say, it looks better because I renamed the local variables. In the disassembler, registers are renamed very rarely because renaming hides the register use and can lead to confusion.
IDA highlights the current identifier. This feature turns out to be much more useful with high level output. In this sample, I tried to trace how the retrieved function pointer is used by the function. In the disassembly output, many wrong eax occurrences are highlighted while the decompiler did exactly what I wanted.
Arithmetic is not rocket science, but it is always better if someone handles it for you. You have more important things to focus on.
The decompiler recognized a switch statement and nicely represented the window procedure. Without this little help the user would have to calculate the message numbers herself. Nothing particularly difficult, just time consuming and boring. What if she makes a mistake?...
This is an excerpt from a big function to illustrate short-circuit evaluation. Complex things happen in long functions and it is very handy to have the decompiler to represent things in a human way. Please note how the code that was scattered over the address space is concisely displayed in two if
statements.
The decompiler tries to recognize frequently inlined string functions such as strcmp, strchr, strlen, etc. In this code snippet, calls to the strlen function have been recognized.
Let's start with a very short and simple function:
We decompile it with View, Open subviews, Pseudocode (hotkey F5):
While the generated C code makes sense, it is not pretty. There are many cast operations cluttering the text. The reason is that no type recovery has been performed yet. Apparently, the a1 argument points to a structure, but the decompiler missed it. Let us add some type information to the database and see what happens. For that we will open the Structure window (Shift-F9) and add a new structure type:
After that, we switch back to the pseudocode window and specify the type of a1. We can do it by positioning the cursor on any occurrence of a1 and pressing Y:
When we press Enter, the decompilation output becomes much better:
But there is some room for improvement. We could rename the structure fields and specify their types. For example, field_6B1 seems to be used as a counter and field_6B5 is obviously a function pointer. We can do all this without switching windows now. Here is how we specify the type of the function pointer field:
The final result looks like this:
Please note that there are no cast operations in the text and overall it looks much better than the initial version.
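For reference, the structure built in this walkthrough might look roughly like the following C declaration. The field names follow IDA's default field_XXX convention used in the text (field_6B1 at offset 0x6B1); the gap contents, packing, and exact types are assumptions:

```c
#include <stddef.h>

/* Hypothetical reconstruction of the tutorial's structure.
   Packing is forced so the fields land at their IDA offsets. */
#pragma pack(push, 1)
struct ctx {
    unsigned char gap0[0x6B1];      /* unexplored bytes */
    int           field_6B1;        /* used as a counter */
    void        (*field_6B5)(void); /* function pointer (signature assumed) */
};
#pragma pack(pop)
```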
Hotkeys:
H - toggle between hexadecimal and decimal representations
R - switch to character constant representation
M - switch to enumeration (symbolic constant) representation
_ - invert sign
T - apply struct offset
This command allows the user to specify the desired form of a numeric constant. Please note that some constants have a fixed form and cannot be modified. This mainly includes constants generated by the decompiler on the fly.
The decompiler ties the number format information to the instruction that generated the constant. The instruction address and the operand number are used for that. If a constant, which was generated by a single instruction, is used in many different locations in the pseudocode, all these locations will be modified at once.
Using the 'invert sign' negates the constant and resets the enum/char flag if it was set.
When this command is applied the first time to a negative constant, the output will seemingly stay the same. However, the list of symbolic constants available to the M hotkey changes. For example, if the constant is '-2', then before inverting the sign the symbolic constants corresponding to '-2' are available. After inverting the sign the symbolic constants corresponding to '2' are available.
The T hotkey applies a structure offset to the number. For positive numbers, it usually converts the number into an offsetof() macro. For negative numbers, it usually converts the whole (var-num) expression into the macro. By the way, the decompiler tries to use other hints to detect this macro: it checks if the number corresponds to a structure offset in the disassembly listing. For example, an expression like
can be converted into
where structype * is the type of v1 and offsetof(structype, fieldname) == num. Please note that v2 must be declared as a pointer to the corresponding structure field, otherwise the conversion may fail.
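To make the offsetof() conversion concrete, here is a hypothetical structure and the value the macro resolves to (the structure and its fields are invented for illustration):

```c
#include <stddef.h>

/* An invented structure: a raw constant 8 in decompiled code that
   matches the offset of `payload` can be displayed as
   offsetof(struct packet, payload) instead. */
struct packet {
    int  kind;        /* offset 0 */
    int  length;      /* offset 4 */
    char payload[64]; /* offset 8 */
};

size_t payload_off(void) { return offsetof(struct packet, payload); }
```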
See also:
Hotkey: N
The rename command renames the current item. It can be applied to the following things:
Function
Local variable
Global item (function or data)
Structure field
Statement label
Normally the item under the cursor will be renamed. If the command is applied to the very first line of the output text and the decompiler cannot determine the item under the cursor, the current function will be renamed.
See also:
Hotkeys:
None: Split current expression
None: Unsplit current expression
This command splits the current expression into multiple expressions. It is available only for int16, int32, or int64 assignments or expressions which were combined by the decompiler (e.g. a 64-bit comparison on a 32-bit platform). Splitting an assignment breaks it into two assignments: one for the low part and one for the high part. Other expressions can be split into more than two expressions.
This command is useful if the decompiler erroneously combines multiple unrelated expressions into one. In some cases the types of the new variables should be explicitly specified to get a nice listing. For example:
can be split into two assignments:
by right clicking on the 64-bit assignment operation (the '=' sign) and selecting the 'Split' command.
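Conceptually, the Split command turns one combined 64-bit store into two 32-bit stores. A sketch with hypothetical variable names:

```c
#include <stdint.h>

/* What Split does conceptually: one 64-bit assignment becomes two
   32-bit assignments, one for each half. */
void split_store(uint64_t value, uint32_t *lo, uint32_t *hi)
{
    /* combined:   v64  = value;            */
    /* split into: v_lo = (uint32_t)value;  */
    /*             v_hi = value >> 32;      */
    *lo = (uint32_t)value;
    *hi = (uint32_t)(value >> 32);
}
```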
The split expression can be unsplit using the unsplit command. Unsplitting removes all effects of the previous Split commands.
See also:
Hotkeys:
Keypad -: Hide current statement
Keypad +: Unhide current statement
This command collapses the current statement into one line. It can be applied to multiline statements (if, while, for, do, switch, blocks).
The hidden item can be uncollapsed using the unhide command.
See also:
In some cases, especially for indirect calls, the decompiler cannot correctly detect call arguments. The 'Set call type' command sets the type of the function call at the current item without changing the prototype of the called function itself. So there is a difference between 'Set call type' and commands. Let us assume that there is a call
and that the decompiler erroneously detected one argument whereas four arguments actually are present. If the user sets the new call type as
then the call will be transformed into
and the type of off_5C6E4 will remain unchanged. Note that in this case the user can revert the call to the previous state using the command.
The command will have a different effect:
It sets the new type for off_5C6E4 that will cause changes to all places where off_5C6E4 is called, including the current call.
This command also can be used to specify the __noreturn attribute of a call.
NOTE: Behind the scenes the 'Set call type' command, like , copies the entered type to the operand of the call instruction. Actually it is a shortcut to Edit, Operand type, Set operand type in the disassembly view while staying on the call instruction.
See also:
The decompiler adds the following commands to the menus:
This command decompiles the current function. If the decompilation is successful, it opens a new window titled "Pseudocode" and places the generated C text in this window.
The following commands can be used in the pseudocode window:
If the current item is a local variable, additional items may appear in the context menu:
If the current item is a union field, an additional item may appear in the context menu:
If the current item is a parenthesis, bracket, or a curly brace, the following hotkey is available:
The user can also select text and copy it to the clipboard with the Ctrl-C combination.
If the current item is C statement keyword, an additional item may appear in the context menu:
Pressing Enter on a function name will decompile it. Pressing Esc will return to the previously decompiled function. If there is no previously decompiled function, the pseudocode window will be closed.
Ctrl-Enter or Ctrl-double click on a function name will open a new pseudocode window for it.
Pressing F5 while staying in a pseudocode window will refresh its contents. Please note that the decompiler never refreshes pseudocode by itself because it can take really long.
The user can use the mouse right click or keyboard hotkeys to access the commands. Please check the command descriptions for the details.
This command toggles between the disassembly view and pseudocode view. If there is no pseudocode window, a new window will be created.
Pressing Tab while staying in the pseudocode window will switch to the disassembly window. The Tab key can be used to toggle pseudocode and disassembly views.
This command decompiles the selected functions or the whole application. It will ask for the name of the output .c file.
If there is a selected area in the disassembly view, only the selected functions will be decompiled. Otherwise, the whole application will be decompiled.
When the whole application is decompiled, the following rules apply:
the order of decompilation is determined by the decompiler. It will start with the leaf functions and proceed in post-order along the call graph. This order ensures that when we decompile a function, we already have all information about the called functions. Obviously, for recursive functions some information will still be missing.
the library (light blue) functions will not be decompiled. By the way, this is a handy feature to exclude functions from the output.
A decompilation failure will not stop the analysis, but internal errors will. The decompiler generates #error directives for failed functions.
This command decompiles the current function and copies the pseudocode to the disassembly listing in the form of anterior comments. If the current function already has a pseudocode window, its contents are used instead of decompiling the function anew.
This command deletes all anterior comments created by the previous command. Its name is a slight misnomer because it does not verify the comment origin. In fact, all anterior comments within the current function are deleted.
This command marks/unmarks instructions to be skipped by the decompiler. It is useful if some prolog/epilog instructions were missed by IDA. If such instructions were not detected and marked, the decompilation may fail (most frequently the call analysis will fail).
The decompiler skips the prolog, epilog, and switch instructions. It relies on IDA to mark these instructions. Sometimes IDA fails to mark them, and this command can be used to correct the situation.
If the command is applied to marked instructions, it will unmark them.
By default, the skipped instructions are not visualized. To make them visible, edit the IDA.CFG file and uncomment the following lines:
This command deletes decompiler information.
It can delete information about global objects (functions, static data, structure/enum types) and/or information local to the current function.
Use this command if you inadvertently made some change that made decompilation impossible.
This command configures a function call that should replace the current instruction in the pseudocode output.
Special names can be used to access the operands of the current instruction: __OP1, __OP2, ... for the first, second, etc. operands. Each function argument with such a name will be replaced in the call by the value of the corresponding operand of the instruction. Also, if the function name itself has this format, a call to the location pointed to by the corresponding operand will be generated. Other arguments and the return value will be placed into locations derived from the function prototype according to the current compiler, calling convention, and argument and return types. You can use the IDA-specific __usercall calling convention to specify arbitrary locations independently of the platform and argument/return types (read the IDA help pages about user defined calling conventions for more info).
Examples
We could ask to replace the following instruction:
by specifying the following prototype:
which would lead to the following decompiler output:
where v1 is mapped to ax.
The following prototype:
applied to the second instruction in the following piece of code:
will generate the following pseudocode:
where v1, v2, and v3 are mapped to R0, R1, and R2, respectively.
This command packs and sends the current database to our server. The user can specify his/her email and add notes about the error. This is the preferred way of filing bug reports because it is virtually impossible to do anything without a database. The database will also contain the internal state of the decompiler, which is necessary to reproduce the bug.
The database is sent in compressed form to save bandwidth. An encrypted connection (SSL) is used for the transfer.
This command deletes all code and data from the current idb except the current function. It can be used to reduce the database size before sending a bug report. Please note that deleting information from the database may make the bug irreproducible, so please verify it after applying this command.
Hotkey: Ins
This command edits the block comment for the current line. The entered comment will be displayed before the current line.
Please note that due to the highly dynamic nature of the output, the decompiler uses a rather complex coordinate system to attach comments. Some output lines will not have a coordinate in this system. You cannot edit comments for these lines. Also, some lines have the same coordinate. In this case, the comment will be attached to the first line with that internal coordinate. We will try to overcome this limitation in the future but it might take some time and currently we do not have a clear idea how to improve the existing coordinate system.
Each time the output text changes the decompiler will rearrange the entered comments so they are displayed close to their original locations. However, if the output changes too much, the decompiler could fail to display some comments. Such comments are called "orphan comments". All orphan comments are printed at the very end of the output text.
If applied to the function declaration line, this command edits the function comment. This comment is shared with IDA: it is the same as the function comment in IDA.
You can cut and paste them to the correct locations or you can delete them with the "Delete orphan comments" command using the right-click menu.
See also:
Here are some side-by-side comparisons of disassembly and decompiler for MIPS. Please maximize the window to see both columns simultaneously.
The following examples are displayed on this page:
This is very simple code to decompile and the output is perfect. The only minor obstacle is the references through the global offset table, but both IDA and the decompiler handle them well. Please note the difference in the number of lines to read on the left and on the right.
Sorry for another long assembler listing. It shows that for MIPS, as for other platforms, the decompiler can recognize 64-bit operations and collapse them into very readable constructs.
We recognize magic divisions for MIPS the same way as for other processors. Note that this listing has a non-trivial delay slot.
The previous example was a piece of cake. This one is a tougher nut to crack: there is a jump to a delay slot. A decent decompiler must handle these cases too and produce correct output without misleading the user. This is what we do. (We spent quite a long time inventing and testing various scenarios with delay slots.)
We support both big-endian and little-endian code. Usually they look the same but there may be subtle differences in the assembler. The decompiler keeps track of the bits involved and produces human-readable code.
MicroMIPS, as you have probably guessed, is supported too, with its special instructions and quirks.
The MIPS processor contains a number of complex floating point instructions, which perform several operations at once. It is not easy to decipher the meaning of the assembler code but the pseudocode is the simplest possible.
A compiler sometimes uses helpers; our decompiler knows the meaning of many helpers and uses this knowledge to simplify the code.
In some cases, especially for indirect calls, the decompiler cannot correctly detect call arguments. For a call like
it is very difficult to determine where the input arguments are. For example, it is unclear whether ECX is used by the call or not.
However, the number of arguments and their types can become available at later stages of decompilation. For example, the decompiler may determine that ECX points to a class with a table of virtual functions. If the user specifies the vtable layout, the output may become similar to
If the user declares somefunc as a pointer to a function like this:
then the code is incorrect: the decompiler detected only one argument and missed the one in ECX.
The 'force call type' command instructs the decompiler not to perform the call argument analysis but just use the type of the call object. For the above example, the call will be transformed into something like
In other words, this command copies the call type from the call object to the call instruction. The call object may be any expression, the only requirement is that it must be a pointer to a function.
There is a more general command that allows the user to set any type for a call instruction.
NOTE: Behind the scenes, the 'force call' command copies the desired type to the operand of the call instruction. To revert the effects of 'force call' or to fine-tune the forced type, use Edit, Operand type, Set operand type in the disassembly view while staying on the call instruction.
See also:
See above the command for more details.
This menu item performs exactly the same actions as the command.
It can also be used to reset other information types used by the decompiler. For example, the or can be reset.
This command generates an HTML file with the pseudocode of the current function. It is available from the popup menu if the mouse is clicked on the very first line of the pseudocode text.
This command also works on a selected area. The user can select the area that will be saved into the HTML file. This is useful if only a small code snippet needs to be saved instead of the entire function body.
See also: interactive operation
This command marks the current function as decompiled. It is a convenient way to track decompiled functions. Feel free to use it any way you want.
Marking a function as decompiled will change its background color to the value specified by the MARK_BGCOLOR parameter in the configuration file. The background color will be used in the pseudocode window, in the disassembly listing, and in the function list.
See also: interactive operation
Hotkey: Ctrl-Shift-R
This command removes the return type from the function prototype. It is applied to the prototype of the current function.
It is available anywhere in the pseudocode window, regardless of where exactly the cursor is positioned. This command is not visible in the context-sensitive popup menu.
If applied to a function without the return type, it will add the previously removed return type to the function prototype.
This command is available starting from v7.5.
See also: interactive operation, Del function argument.
Hotkeys
Numpad+
Add variadic argument
Numpad-
Delete variadic argument
This command adds or removes an argument of a variadic call. It is impossible to detect the correct number of variadic arguments in all cases, and this command can be used to fix wrongly detected arguments. It is available only when the cursor is located on a call to a variadic function (like printf). The decompiler automatically detects the argument locations, the user can only increase or decrease their number.
This command is useful if the decompiler determines the number of arguments incorrectly. For example:
apparently lacks an argument. Pressing Numpad+ modifies it:
If too many arguments are added to a variadic call, decompilation may fail. There are three ways to correct this situation:
undo the last action (hotkey Ctrl-Z)
position the cursor on the wrongly modified call and press Numpad-
or use Edit, Other, Reset decompiler information to reset the forced variadic argument counts.
See also: interactive operation
This command opens the standard dialog box with the cross references to the current item. The user may select a cross reference and jump to it. If the cross-reference address belongs to a function, it will be decompiled. Otherwise, IDA will switch to the disassembly view.
For local variables, the following cross reference types are defined:
It is also possible to jump to structure fields. All local references to a field of a structure type will be displayed.
If the item under the cursor is a label, a list of all references to the label will be displayed.
Finally, xrefs to statement types are possible too. For example, a list of all return statements of the current function can be obtained by pressing X on a return statement. All statements with keywords are supported.
See also: interactive operation
Hotkey: Shift-Del
This command removes an argument or the return type from a function prototype. It can be applied to the prototype of the current function as well as to any called function.
It is available only when the cursor is on a function argument or on the return type. As a result of this command, the function prototype is modified: the selected argument is removed from the argument list. If necessary, the calling convention is replaced by a new one.
Please note that other register arguments do not change their locations. This logic ensures that a stray argument in the argument list can be deleted with a keypress.
When applied to the function return type it will convert it to "void".
This command is available starting from v7.5.
See also: interactive operation, Add/delete function return type.
This command decompiles all non-trivial functions in the database and looks for xrefs in them. Library and thunk functions are skipped. The decompilation results are cached in memory, so only the first invocation of this command is slow.
Cross references to the current item are looked up in the decompilation results. A list of such xrefs is formed and displayed on the screen. Currently the following item types are supported:
a structure field
an enumeration member (symbolic constant)
This action is also available (only by hotkey) in the struct view and local types view.
See also: interactive operation
Hotkey: \
This command hides all cast operators from the output listing. Please note that the output may become more difficult to understand or even lose its meaning without cast operators. However, since in some cases it is desirable to temporarily hide them, we provide the end user with this command.
The initial display of cast operators can be configured by the user. Please check the HO_DISPLAY_CASTS bit in the HEXOPTIONS parameter in the configuration file.
See also: interactive operation
This command copies the pseudocode text to the disassembly window. It is available from the popup right-click menu.
Please note that only "meaningful" lines are copied. Lines containing curly braces and else/do keywords are omitted.
The copied text is represented as anterior comments in the disassembly. Feel free to edit them the way you want. The copied text is static and will not change when the pseudocode text changes.
See also: interactive operation
Hotkey: none
This convenience command allows the user to specify a pointer to a structure type in a quick and efficient manner. The list of local structure types will be displayed, and the type of the current variable will be set to a pointer to the selected structure type.
This is just a convenience command. Please use the set type command in order to specify arbitrary variable types.
This command is available only when the decompiler is used with recent IDA versions.
See also: interactive operation
Hotkey: none
This convenience command allows the user to convert the current local variable from a non-pointer type to a pointer to a newly created structure type. It is available from the context menu if the current variable is used as a pointer in the pseudocode.
The decompiler scans the pseudocode for all references to the variable and tries to deduce the type of the pointed object. The deduced type is then displayed on the screen, and the user may modify it before accepting it. When the user clicks OK, the new type is created and the type of the variable is set to a pointer to the newly created type.
In simple cases (for example, when the variable is used as a simple character pointer), the decompiler does not display any dialog box but directly changes the variable type. In such cases, no new type will be created.
This is just a convenience command. Please use the set type command in order to specify arbitrary variable types.
This command is available only when the decompiler is used with recent IDA versions.
See also: interactive operation
Hotkey: Shift-S
Sometimes a stack slot is used for two completely different purposes during the lifetime of a function. While the decompiler can usually sort things out for the unaliased part of the stack frame, it cannot do much for the aliased part. There, it creates just one variable even if the corresponding stack slot is used for multiple different purposes. This happens because the decompiler cannot prove that the variable is used for a different purpose starting from a certain point.
The split variable command is designed to solve exactly this problem.
This command allows the user to force the decompiler to allocate a new variable starting from the current point. If the current expression is a local variable, all its subsequent occurrences will be replaced by a new variable up to the end of the function or the next split variable at the same stack slot. If the cursor does not point to a local variable, the decompiler will ask the user about the variable to replace.
In the current statement, only the write accesses to the variable will be replaced. In the subsequent statements, all occurrences of the variable will be replaced. We need this logic to handle the following situation:
where only the second occurrence of the variable should be replaced. Please note that in some cases it makes sense to click on the beginning of the line with the function call, rather than on the variable itself.
Please note that in the presence of loops in the control flow graph it is possible that even the occurrences before the current expression will be replaced by the new variable. If this is not desired, the user should split the variable somewhere else.
The very first and the very last occurrences of a variable cannot be used to split the variable because it is not useful.
The decompiler does not verify the validity of the new variable. A wrong variable allocation point may render the decompiler output incorrect.
Currently, only aliasable stack variables can be split.
A split variable can be deleted by right clicking on it and selecting 'Unsplit variable'.
See also: interactive operation
In some cases the decompiler cannot produce nice output because the variable allocation fails. This happens when the input contains overlapped variables (or the decompiler mistakenly lumps together memory reads and writes). Overlapped variables are displayed in red so they are conspicuously visible. Let us consider some typical situations.
For example, consider the following output: Unfortunately the decompiler cannot handle this case and reports overlapped variables.
The last assignment to v1 reads beyond v1 boundaries. In fact, it also reads v2. See the assembly code:
Arrays cannot be passed to functions by value, so this will lead to a warning. Just get rid of such an array (embed it into a structure type, for example)
The decompiler can handle up to 64 function arguments. It is very unlikely to encounter a function with a bigger number of arguments. If so, just embed some of them into a structure passed by value.
The corrective actions include:
Check the stack variables and fix them if necessary. A wrongly defined variable can easily lead to an lvar allocation failure.
Define a big structure that covers the entire stack frame or part of it. Such a big variable will essentially turn off variables lumping (if you are familiar with compiler jargon, the decompiler builds a web of lvars during lvar allocation and some web elements become too big, this is why variable allocation fails). Instead, all references will be done using the structure fields.
Check the function argument area of the stack frame and fix any wrong variables. For example, this area should not contain any arrays (arrays cannot be passed by value in C). It is OK to pass structures by value; the decompiler accepts it.
Hotkey: Alt-Y
This command allows the user to select the desired union field. In the presence of unions, the decompiler cannot always detect the correct union field.
The decompiler tries to reuse the union selection information from the disassembly listing. If there is no information in the disassembly listing, the decompiler uses a heuristic rule to choose the most probable union field based on the field types. However, it may easily fail in the presence of multiple union fields with the same type, or when there is no information about how the union field is used.
If both the above methods of selecting the union field fail, then this command can be used to specify the desired field. It is especially useful for analyzing device drivers ( are represented with a long union), or COM+ code that uses data types.
See also:
This command jumps to the matching parenthesis. It is available only when the cursor is positioned on a parenthesis, bracket, or curly brace.
The default hotkey is '%'.
See also:
This command collapses the selected multiline C statement into one line. It can be applied to if, while, for, switch, do keywords. The collapsed item will be replaced by its keyword and "..."
It can also be applied to the local variable declarations. This can be useful if there are too many variables and they make the output too long. All variable declarations will be replaced by just one line:
See also:
Hotkey: =
This command allows the user to replace all occurrences of a variable by another variable. The decompiler will propose a list of variables that may replace the current variable. The list will include all variables that have exactly the same type as the current variable. Variables that are assigned to/from the current variable will be included too.
Please note that the decompiler does not verify the mapping. A wrong mapping may render the decompiler output incorrect.
The function arguments and the return value cannot be mapped to other variables. However, other variables can be mapped to them.
A mapping can be undone by right clicking on the target variable and using the 'unmap variable' command.
See also:
The decompiler supports batch mode operation with both the text and GUI versions of IDA. All you need to do is specify the -Ohexrays switch on the command line. The format of this switch is:
The valid options are:
-new decompile only if output file does not exist
-nosave do not save the database (idb) file after decompilation
-errs send problematic databases to hex-rays.com
-lumina use Lumina server
-mail=my@mail.com your email (meaningful if -errs option is used)
The output file name can be prepended with + to append to it. If the specified file extension is invalid, .c will be used.
The functions to decompile can be specified by their addresses or names. The ALL keyword means all non-library functions. For example:
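A sketch of such an invocation (the input and output file names are placeholders; the executable name depends on your IDA version and platform):

```
idat -A -Ohexrays:outfile.c:ALL input_file.exe
```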
will decompile all non-library functions to outfile.c. In the case of an error, the .idb file will be sent to hex-rays.com. The -A switch is necessary to avoid the initial dialog boxes.
Hotkey: none
This command resets the type of the current local variable from a pointer type to an integer type. This is just a convenience command. Please use the command in order to specify arbitrary variable types.
See also:
Below is the list of noteworthy public third-party plugins for the decompiler.
by Aleksandr Matrosov and Eugene Rodionov
Hex-Rays Decompiler plugin for better code navigation. Here is the feature list for the first release:
navigation through virtual function calls in Hex-Rays Decompiler window;
automatic type reconstruction for C++ constructor objects;
useful interface for working with objects & classes;
A simple list of various IDA and Decompiler plugins
More to come...
Happy analysis!
The decompiler has a configuration file. It is installed into the 'cfg' subdirectory of the IDA installation. The configuration file is named 'hexrays.cfg'. It is a simple text file, which can be edited to your taste. Currently the following keywords are defined:
Background color of local type declarations. Currently this color is not used. Default: default background of the disassembly view
Background color of local variable declarations. It is specified as a hexadecimal number 0xBBGGRR where BB is the blue component, GG is the green component, and RR is the red component. Color -1 means the default background color (usually white). Default: default background of the disassembly view
Background color of the function body. It is specified the same way as VARDECL_BGCOLOR. Default: default background of the disassembly view
Background color of the function if it is . It is specified the same way as VARDECL_BGCOLOR. Default: very light green
Number of spaces to use for block indentations. Default: 2
The position to start indented comments. Default: 48
As soon as the line length approaches this value, the decompiler will try to split it. However, in some cases the line may be longer. Default: 120
In order to keep the expressions relatively simple, the decompiler limits the number of comma operators in an expression. If there are too many of them, the decompiler will add a goto statement and replace the expression with a block statement. For example, instead of
we may end up with:
Default: 8
Specifies the default radix for numeric constants. Possible values: 0, 10, 16. Zero means "decimal for signed, hex for unsigned". Default: 0
Specifies the maximal decompilable function size, in KBs. Only reachable basic blocks are taken into consideration. Default: 64
Combination of various analysis and display options:
If enabled, the decompiler will handle out-of-function jumps by generating a call to the JUMPOUT() function. If disabled, such functions will not be decompiled. Default: enabled
If enabled, the decompiler will display cast operators in the output listing. Default: enabled
If enabled, the decompiler will hide unordered floating point comparisons. If this option is turned off, unordered comparisons will be displayed as calls to a helper function: __UNORDERED__(a, b) Default: enabled
If enabled, fast structural analysis will be used. It generates fewer nested if-statements but may occasionally produce some unnecessary gotos. It is much faster on huge functions.
Only print string literals if they reside in read-only memory (e.g. the .rodata segment). When off, all strings are printed as literals. You can override the decompiler's decision by adding 'const' or 'volatile' to the string variable's type declaration.
Convert signed comparisons of unsigned variables with zero into bit checks. Before:
After:
For signed variables, perform the opposite conversion.
Reverse the effects of branch tail optimizations: reduce the number of gotos by duplicating code
Keep curly braces for single-statement blocks
Optimize away address comparisons. Example:
will be replaced by 0 or 1. This optimization works only for non-relocatable files.
Print casts from string literals to pointers to char/uchar. For example:
Pressing Esc closes the pseudocode view
Assume all functions spoil flag registers ZF,CF,SF,OF,PF (including functions with explicitly specified spoiled lists)
Keep all indirect memory reads (even with unused results) so as not to lose possible invalid address access
Keep exception related code (e.g. calls to _Unwind_SjLj_Register)
Translate ARMv8.3 Pointer Authentication instructions into intrinsic function calls (otherwise ignore all PAC instructions)
Preserve potential divisions by zero (if not set, all unused divisions will be deleted)
Generate the integer overflow trap call for 'add', 'sub', 'neg' insns
Ignore the division by zero trap generated by the compiler (only for MIPS)
Consider __readflags as depending on cpu flags. Default: off, because the result is correct but awfully unreadable
Permit decompilation after an internal error (normally the decompiler does not permit new decompilations after an internal error in the current session)
Never use multiline function declarations, even for functions with a long argument list
Decompile library functions too (in batch mode)
Propagate ldx instructions without checking for volatile memory access
Specifies the warning messages that should be displayed after decompilation. Please refer to hexrays.cfg file for the details. Default: all warnings are on
Specified list of function names that are considered "strcmp-like". For them the decompiler will prefer to use comparison against zero like
as a condition. Underscores, j_ prefixes and _NN suffixes will be ignored when comparing function names
Name of Control Flow Guard check function. Calls of this function will not be included into the pseudocode. Default: "guard_check_icall_fptr"
Name of Control Flow Guard dispatch function. Each call of this function will be replaced by 'call rax' instruction when generating pseudocode. Default: "guard_dispatch_icall_fptr"
The current release of the decompiler supports intrinsic functions. Instructions that cannot be directly mapped to high level languages can very often be represented by special functions. All simple Microsoft and Intel intrinsic functions up to SSE4a are supported, with some exceptions. While everything works automatically, the following points are worth noting:
SSE intrinsic functions require IDA v5.6 or higher. Older versions of IDA do not have the necessary functionality and register definitions.
Some intrinsic functions work with XMM constant values (16 bytes long). Modern compilers do not accept 16-byte constants yet, but the decompiler may generate them when needed.
Sometimes it is better to represent SSE code using inline assembly rather than intrinsic functions. If the decompiler detects SSE instructions in the current function, it adds one more item to the popup menu. This item allows the user to enable or disable SSE intrinsic functions for the whole database. This setting is remembered in the database. It can also be modified in the for new databases.
The decompiler knows about all MMX/XMM built-in . If the current database does not define these types, they are automatically added to the local types as soon as an SSE instruction is decompiled.
Scalar SSE instructions are never converted to intrinsic functions. Instead, they are directly mapped to floating point operations. This usually produces much better output, especially for Mac OS X binaries.
The scalar SSE instructions that cannot be mapped into simple floating point operations (like sqrtss) are mapped into simple functions from .
The decompiler uses intrinsic function names as defined by Microsoft and Intel.
The decompiler does not track the state of the x87 and mmx registers. It is assumed that the compiler generated code correctly handles transitions between x87 and mmx registers.
Some intrinsic functions are not supported because of their prototypes. For example, the function is not handled because it requires an array of 4 integers. We assume that most cpuid instructions will be used without any arrays, so adding such an intrinsic function would obscure things rather than make the code more readable.
Feel free to report all anomalies and problems with intrinsic functions using the command. This will help us to improve the decompiler and make it more robust. Thank you!
See also:
The current release of the x86 decompiler supports floating point instructions. While everything works automatically, the following points are worth noting:
IDA v5.5 or higher is required for floating point support. Earlier versions do not have the required functionality and the decompiler represents fpu instructions using inline assembler statements.
The decompiler knows about all floating point types, including: float, double, long double, and _TBYTE. We introduced _TBYTE because sizeof(long double) is often different from sizeof(tbyte). While the size of long double can be configured (it is implicitly set to a reasonable value when the compiler is set), the size of tbyte is always equal to 10 bytes.
Casts from integer types to floating point types and vice versa are always displayed in the listing, even if the output has the same meaning without them.
The decompiler performs fpu stack analysis, which is similar to the performed by IDA. If it fails, the decompiler represents fpu instructions using inline assembler statements. In this case the decompiler adds one more prefix column to the disassembly listing, next to the stack pointer values. This column shows the calculated state of the fpu stack and may help to determine where exactly the fpu stack tracing went wrong.
The decompiler ignores all manipulations with the floating point control word. In practice this means that it may miss an unusual rounding mode. We will address this issue in the future, as soon as we find a robust method to handle it.
SSE floating point instructions are represented by . Scalar SSE instructions are however directly mapped to floating point operations in pseudocode.
Feel free to report all anomalies and problems with floating point support using the command. This will help us to improve the decompiler and make it more robust. Thank you!
See also:
First of all, read the page. It explains how to deal with most decompilation problems. Below is a mix of other useful information that did not fit into any other page:
more to come...
Sometimes the decompiler can be overly aggressive and optimize references to volatile memory completely away. A typical situation looks like the following:
can be decompiled into
because the decompiler assumes that a variable cannot change its value by itself and it can prove that r0 continues to point to the same location during the loop.
To prevent such optimization, we need to mark the variable as volatile. Currently the decompiler considers memory to be volatile if it belongs to a segment with one of the following names: IO, IOPORTS, PORTS, VOLATILE. The character case is not important.
Sometimes the decompiler does not optimize the code enough because it assumes that variables may change their values. For example, the following code:
can be decompiled into
but this code is much better:
because
is a pointer that resides in constant memory and will never change its value.
The decompiler considers memory to be constant if one of the following conditions hold:
the segment type is CODE
the segment name is one of the following (the list may change in the future): .text, .rdata, .got, .got.plt, .rodata, __text, __const, __const_coal, __cstring, __cfstring, __literal4, __literal8, __pointers, __nl_symbol_ptr, __la_symbol_ptr, __objc_catlist, __objc_classlist, __objc_classname, __objc_classrefs, __objc_const, __objc_data, __objc_imageinfo, __objc_ivar, __objc_methname, __objc_methtype, __objc_protolist, __objc_protorefs, __objc_selrefs, __objc_superrefs, __message_refs, __cls_refs, __inst_meth, __cat_inst_meth, __cat_cls_meth, __OBJC_RO
The decompiler tries to completely get rid of references to the following segments and replace them by constants: .got, .got.plt, __pointers, __nl_symbol_ptr, __la_symbol_ptr, __objc_ivar, __message_refs, __cls_refs
It is possible to override the constness of an individual item by specifying its type with the volatile or const modifiers.
If there is an assignment like this:
it can be converted into
by simply confirming the types of v1 and v2. NOTE: the variables types must be specified explicitly. Even if the types are displayed as correct, the user should press Y followed by Enter to confirm the variable type.
will have the same effect as in the previous point. Please note that it makes sense to confirm the variable types as explained earlier.
will convert it to:
Please note that it makes sense to confirm the variable types as explained earlier.
Since the arguments of indirect calls are collected before variables are defined, specifying the type of the variable that holds the function pointer may not be enough. In this case the user has to specify the function type using other methods. The following methods exist (in order of preference):
For indirect calls of this form:
If funcptr is initialized statically and points to a valid function, just ensure a correct function prototype. The decompiler will use it.
For indirect calls of this form:
If reg points to a structure with a member that is a function pointer, just convert the operand into a structure offset (hotkey T):
and ensure that the type of mystruct::funcptr is a pointer to a function of the desired type.
Specify the type of the called function using Edit, Operand type, Set operand type. If the first two methods cannot be applied, this is the recommended method. The operand type has the highest priority, it is always used if present.
If the address of the called function is known, use Edit, Plugins, Change the callee address (hotkey Alt-F11). The decompiler will use the type of the specified callee. This method is available only for x86. For other processors adding a code cross reference from the call instruction to the callee will help.
Currently the list is very short but it will grow with time.
The output is excessively short for the input function. Some code which was present in the assembly form is not visible in the output.
This can happen if the decompiler decided that the result of these computations is not used (so-called dead code). The dead code is not included in the output.
One very common case of this is a function that returns the result in an unusual register, e.g. ECX. Please explicitly specify the function type and tell IDA the exact location of the return value. For example:
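Using IDA's __usercall syntax (the function and argument names here are invented), such a prototype might look like:

```
/* Hypothetical prototype telling IDA the result comes back in ECX: */
int __usercall get_handle@<ecx>(int index);
```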
Read about the user defined calling conventions for more info.
Another quite common case is a function whose type has been guessed incorrectly by IDA or the decompiler. For example, if the guessed type is
but the correct function type is
then all computations of the function arguments will be removed from the output. The remedy is very simple: tell IDA the correct function type and the argument computations will appear in the output.
The following code
is being translated into:
This does not look correct. Can this be fixed?
This happens because the decompiler does not perform type recovery. To correct the output, modify the definition of CommandLine in IDA. For that, open the stack frame (Edit, Functions, Open stack frame), locate CommandLine and set its type to be an array (Edit, Functions, Set function type). The end result will be:
Old databases do not contain some essential information. If you want to decompile them, first let IDA reanalyze the database (right click on the lower left corner of the main window and select Reanalyze). You will also need to recreate indirect (table) jump instructions, otherwise the switch idioms will not be recognized and decompilation of the functions containing them will fail.
Sure, it can be improved. However, given that many decompilation subproblems are still open, even simple things can take enormous time. Meanwhile, we recommend using a text editor to modify the pseudocode.
If enabled, the decompiler will generate intrinsic function calls for SSE instructions that use XMM/MMX registers. If this option is turned off, these instructions will be displayed using inline assembly. Default: enabled
If enabled, the decompiler will produce output even if the local variable allocation has failed. In this case the output may be wrong and will contain errors. Default: enabled
the segment has access permissions defined but the write permission is not in the list (to change the segment permissions use the "Edit, Segments, Edit Segment" menu item or the built-in function)
The decompiler knows about the CONTAINING_RECORD macro and tries to use it in the output. However, in most cases it is impossible to create this macro automatically, because the information about the containing record is not available. The decompiler uses three sources of information to determine if CONTAINING_RECORD should be used:
Structure offsets applied to numbers in the disassembly listing are used as a hint to create CONTAINING_RECORD. For example, applying a structure offset to 0x41C in
Structure offsets applied to numbers in the decompiler output. For example, applying the _DEVICE_INFO structure offset to -131 in the following code:
In most cases the CONTAINING_RECORD macro can be replaced by a shorter and nicer expression if a shifted pointer is used. In this case it is enough to declare the pointer as a shifted pointer and the decompiler will transform all expressions where it is used.
In general, if the input information (function types) is incorrect, the output will be incorrect too.
In general, there is no need to file a bugreport if the decompiler gracefully fails. A failure is not necessarily a bug. Please read the section to learn how to proceed.
The decompiler comes in 9 different flavors:
x86 decompiler (32-bit code)
x64 decompiler (64-bit code)
ARM decompiler (32-bit code)
ARM64 decompiler (64-bit code)
PowerPC decompiler (32-bit code)
PowerPC64 decompiler (64-bit code)
MIPS decompiler (O32 and N32 ABI)
MIPS64 decompiler (N64 ABI)
ARC Decompiler (32-bit code)
Currently the decompiler can handle compiler generated code. Manually crafted code may be decompiled too but the results are usually worse than for compiler code. Support for other processors will eventually be added (no deadlines are available, sorry).
Below are the most important limitations of our decompilers (all processors):
exception handling is not supported
type recovery is not performed
global program analysis is not performed
Limitations specific to x86:
only 32-bit code can be analyzed with ida32
Limitations specific to x64:
only 64-bit code can be analyzed with ida64
Limitations specific to ARM32:
only 32-bit code can be analyzed with ida32
hard-float abi is not supported
Limitations specific to ARM64:
only 64-bit code can be analyzed with ida64
Limitations specific to PPC:
only 32-bit code can be analyzed with ida32
Vector/DFP/VSX/SPE instructions are not supported
Limitations specific to MIPS:
only 32-bit code can be analyzed
only O32 and N32 ABI are supported
only 32-bit FPR in O32 and 64-bit FPR in N32 are supported
Limitations specific to MIPS64:
only 64-bit code can be analyzed
only N64 ABI is supported
only 64-bit FPR are supported
Limitations specific to ARC:
only 32-bit code can be analyzed with ida32
Hands-Free Binary Deobfuscation with gooMBA
At Hex-Rays SA, we are constantly looking for ways to improve the usefulness of our state-of-the-art decompiler solution. We achieve this by monitoring for new trends in anti-reversing technology, keeping up with cutting-edge research, and brainstorming ways to innovate on existing solutions.
Today we are excited to introduce a new Hex-Rays decompiler feature, gooMBA, which should greatly simplify the workflow of reverse-engineers working with obfuscated binaries, especially those using Mixed Boolean-Arithmetic (MBA) expressions. Our solution combines algebraic and program synthesis techniques with heuristics for best-in-class performance, integrates directly into the Hex-Rays decompiler, and provides a bridge to an SMT-solver to prove the correctness of simplifications.
A Mixed Boolean-Arithmetic (MBA) expression combines arithmetic operations (e.g. addition + and multiplication *) with boolean operations (e.g. bitwise OR |, AND &, XOR ^) in a single expression. These expressions are often made extremely complex in order to make it difficult for reverse-engineers to determine their true meaning.
For instance, here is an example of an MBA obfuscation found in a decompilation listing. Note the combination of bitshift, addition, subtraction, multiplication, XOR, OR, and comparison operators within one expression.
For reference, the above code always returns 0x89.
MBA is also used as a name for a semantics-preserving obfuscation technique, which replaces simple expressions found in the source program with much more complicated MBA expressions. MBA obfuscation is called semantics-preserving since it only changes the syntax of the expression, not the underlying semantics — the input/output behavior of the expression should remain the same before and after.
A decompiler can be thought of as a massive simplification engine — it reduces the mental load of the reverse engineer by transforming a complex binary program into a vastly simplified higher-level readable format. It partially achieves this through equivalences, special pattern-matching rules derived from mathematical properties such as commutativity, distributivity, and identity. For instance, the following simplification can be performed by applying the distributive and identity properties.
Both boolean functions and arithmetic functions on integers are very well studied, and there is an abundance of simplification techniques and algorithms developed for each. MBA obfuscators exploit the fact that many of these equivalences and techniques break down when the two function types are combined. For instance, we all know that integer multiplication distributes over addition, but note that the same does not hold over bitwise XOR.
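The failure of distributivity over XOR is easy to verify numerically. A quick Python illustration (not part of the decompiler; the values are arbitrary):

```python
x, y = 13, 7

# Multiplication distributes over addition...
assert 3 * (x + y) == 3 * x + 3 * y
# ...but not over XOR.  (Note: power-of-two multipliers are plain shifts
# and *do* distribute over XOR, which is why 3 is used here.)
assert 3 * (x ^ y) != (3 * x) ^ (3 * y)
```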
Advanced Computer Algebra Systems (CAS) such as Sage and Mathematica allow users to simplify arithmetic expressions, but their algorithms break down when we start introducing bitwise operations into our inputs.
Furthermore, although Satisfiability Modulo Theories (SMT) solvers such as z3 do often support both arithmetic and boolean operations on computer integers, they do not perform simplification — at least not for any human definition of "simplicity." Rather, their only goal is to prove or disprove the input formula; as a result, they are useful in proving a simplification correct, but not in deriving the simplification to begin with.
The core idea behind MBA obfuscation is that a complex, but semantically equivalent, MBA expression can be substituted for simpler expressions in the source program. For instance, one technique that can be used for MBA generation is the repeated application of simple MBA identities, such as:
Many of these identities are available in the classic book Hacker’s Delight, but there are an effectively unbounded number of them. For instance, Reichenwallner et al. easily generated 1,000 distinct MBA substitutions for x+y alone.
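A few of these classic identities can be confirmed by brute emulation. The sketch below (illustrative Python, checking identities listed in Hacker’s Delight on random 32-bit inputs) is not part of any obfuscator or deobfuscator:

```python
import random

def check(identity, trials=1000):
    """Emulate the identity on random 32-bit inputs."""
    for _ in range(trials):
        x, y = random.getrandbits(32), random.getrandbits(32)
        assert identity(x, y), (x, y)

# Classic MBA identities (cf. Hacker's Delight):
check(lambda x, y: x + y == (x ^ y) + 2 * (x & y))
check(lambda x, y: x + y == (x | y) + (x & y))
check(lambda x, y: x ^ y == (x | y) - (x & y))
```

An obfuscator can apply such rewrites repeatedly, nesting one identity inside another until the original expression is unrecognizable.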
There are also many more sophisticated techniques that can be used for MBA generation, such as applying invertible functions and point functions. The number of invertible functions in computer integers is similarly unbounded. By simply choosing and applying any invertible function followed by its inverse, then applying rewriting rules to mix up the order of operations, an MBA generator can create extremely complex expressions effortlessly.
Besides the obvious effect of making decompilation listings longer and more complex for humans to understand, there are a few other effects which this form of obfuscation can have on the binary analysis process.
For instance, dataflow/taint analysis is a static analysis technique that can be used to automatically search for potentially exploitable parts of a program (such as an unsanitized dataflow from untrusted user input into a SQL query). MBA obfuscation can be used to complicate dataflow analysis, by introducing arbitrary unrelated variables into the MBA expression without modifying its semantics. It then becomes extremely difficult to deduce whether or not the newly introduced variable has an effect on the expression’s final value.
An extreme example of this false dataflow technique is known as opaque predicates, whose original expressions have no semantic data inflows (i.e. they are constant). In other words, they always evaluate to a constant, regardless of their (potentially many) inputs. These opaque predicates can then be used for branching, creating false connections in the control-flow graph in addition to the dataflow graph.
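As a toy illustration of an opaque predicate (a hypothetical example, not taken from a real obfuscator): x*(x+1) multiplies two consecutive integers, so it is always even, and its low bit is constant regardless of how much dataflow feeds into x:

```python
# x*(x+1) is a product of consecutive integers, hence always even:
# its low bit is an opaque predicate, constant 0 for every x.
def opaque(x):
    return (x * (x + 1)) & 1

assert all(opaque(x) == 0 for x in range(-1000, 1000))
```

A branch guarded by such a predicate is never taken, yet static analysis that tracks only syntax sees a genuine data dependency on x.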
Over the years, many algorithms have been developed to simplify MBA expressions. These include pattern matching, algebraic methods, program synthesis, and machine learning methods.
Since one of the core techniques involved in MBA generation is the application of rewrite rules, it seems natural to simply match and apply the same rewrite rules in the reverse direction. Indeed, this is precisely what earlier tools such as SSPAM did.
There are several issues with pattern matching methods. Firstly, there are a massive number of possible rewrite rules, and proprietary binary obfuscators are unlikely to reveal what rules they use. In addition, at any given moment an expression might contain multiple subexpressions that each match a pattern, and the order in which we perform these simplifications matters! Performing one simplification might prevent a more optimal simplification from appearing down the line. If we were to attempt every possible ordering of optimizations, our search space quickly becomes exponential. As a result, we considered pure pattern-matching methods to be infeasible for our purposes of simplifying complex MBA expressions.
Arybo is an example of an MBA simplifier that relies entirely on algebraic methods. It splits both inputs and outputs into their individual bits and simplifies each bit of the output individually. It’s clear that this method comes with some limitations. For a 64-bit expression, the program outputs 64 individual boolean functions, and it then becomes quite difficult for a human to combine these functions back into a single simplified expression. Notably, the built-in z3 bitvector simplifier also outputs a vector of boolean functions, since this representation is more useful for its main goal of proving whether or not a statement holds.
Other algebraic algorithms for solving MBA expressions which do not split the expression into individual bits also exist. For instance, MBA-Blast and MBA-Solver use a transformation between n-bit MBA expressions and 1-bit boolean expressions. For linear MBAs (which we will describe in more detail later), this transformation is well-behaved, and a lookup table can trivially be used to simplify the corresponding boolean expression.
SiMBA, another algorithm published by Denuvo researchers in 2022, uses a similar approach to MBA-Blast and MBA-Solver, but additionally makes the observation that the transformation to 1-bit boolean expressions is not necessary for correctness; rather, the authors prove that it is sufficient to simply limit the domains of all input variables to 0/1. As a result, their algorithm yields much better performance; however, it’s important to note that the algorithm still relies on the algebraic structure of linear MBA expressions, and as a result will not work on all MBA expressions found in the wild.
Program synthesis is the act of generating programs that provably fulfill some useful criteria. In the case of MBA-deobfuscation, our task is to generate simpler programs that are provably semantically equivalent to the provided obfuscated program. In short, two programs are considered semantically equivalent if they yield identical side effects and identical outputs on every possible set of inputs. For the MBA expressions we consider, the expressions have no side effects or branching, so we are just left with the requirement that the simplified expression must yield the same output for every possible set of inputs.
One core observation made by synthesis-based tools such as Syntia, QSynthesis, and msynth is that for many real-world programs, the underlying semantics are relatively simple. After all, it is much more common to calculate the sum of two numbers x+y than the result of, say, 4529*(x>>(y^(11-~x))). Thus, for the most part, we only need to consider synthesizing relatively simple programs. To be clear, this is still a massive number of programs, but it at least makes the problem tractable.
The main technique used by QSynth and msynth is an offline enumerate synthesis primitive guided by top-down breadth-first search. In simpler terms, these tools take advantage of precomputation, generating and storing a massive database of candidate expressions known as an oracle, searchable by their input/output behavior. Then, when asked to simplify a new expression, they analyze its input/output behavior and use it to perform a lookup in the oracle.
Essentially, the input/output behavior of any expression is summarized by running the candidate expression with various inputs (some random, some specially chosen like 0 or 0xffffffff), collecting the resulting outputs, and hashing them into a single number. We refer to this number as a fingerprint, and the oracle can be thought of as a multimap from fingerprints to expressions. The simplification is then performed by calculating the fingerprint of the expression to be simplified, then looking up the fingerprint in the oracle for simpler equivalent expressions.
Tools such as Syntia and NeuReduce use machine learning and reinforcement learning techniques to search for semantically equivalent expressions on the spot. However, we found that Syntia’s success rate was quite low — only around 15% on linear MBA expressions, and NeuReduce appeared to only have been evaluated on linear MBA expressions (on which it reported a 75% success rate), which are already solvable 100% of the time through algebraic approaches such as MBA-Blast and SiMBA.
When designing gooMBA, we had the following goals in mind:
Correctness — Obviously, a tool that outputs nonsense is useless, so we should strive to generate correct simplifications whenever feasible. When a true proof of correctness is infeasible, the tool should try to verify the results to a reasonable degree of certainty.
Speed — The Hex-Rays decompiler is well-known in the industry for its speed. Likewise, the tool should strive for the highest performance possible. However, we are obviously willing to sacrifice a couple of seconds in machine-computation time if it means saving a human analyst hours of work.
Integration — The decompiler plugin should be able to optionally disappear into the background. Ideally, the user should be able to forget that they are even analyzing an obfuscated program and focus only on the work at hand.
Since there is no single way to generate MBA expressions, we decided to incorporate multiple deobfuscation algorithms into our final design and leave room for more in the future. Our tool, gooMBA, can be split into the following parts: microcode tree walking, simplification engine, SMT proofs of correctness, and heuristics.
Below is a drawing of our overall approach:
Since we found the SMT stage to be the most time-consuming, we run several hundred random test cases on candidate simplifications before attempting a proof.
Before we can attempt simplification, we must first find potential MBA-obfuscated expressions in the binary. The Hex-Rays decompiler converts binaries into an intermediate form known as microcode, and continuously propagates variable values downward until a certain complexity limit is reached. Since MBA-expressions can be extremely complex (but notably, not so complex that they hinder performance), we increase the complexity limit when the MBA deobfuscator is invoked in order to maximize the complexity of expressions we can analyze. We then perform a simple tree-search through all expressions found in the program, starting with the most complex top-level expressions, and falling through to simpler subexpressions if they fail to simplify.
Our MBA simplification engine is split into three parts, each handling a subset of MBA expressions. We refer to these three parts as the Simple Linear Algorithm, Advanced Linear Algorithm, and the Synthesis Oracle Lookup.
We can think of each one of these three parts as a self-contained module: the obfuscated expression goes in one end, and a set of candidate expressions (each simpler than the obfuscated expression) comes out of the other end. At this stage, these expressions are simply guesses, and may or may not be correct.
One important thing to note is that all three of our subengines are considered black-box, i.e. they do not care about the syntactic characteristics of the expression being simplified, only its semantic properties — i.e. how the outputs change depending on the input values.
One of the fastest and easiest types of expressions we can simplify are those that reduce to a linear equation, i.e. f(x1, ..., xn) = a0 + a1*x1 + ... + an*xn.
Note that constants fall under this category as well. We can simplify these easily by emulating the expression we are trying to simplify, first using zeroes for every input variable. This tells us the value of a0. We can then emulate the expression once again, this time using zeroes for every input variable except x1. Combined with the previously found value, this tells us the value of a1. We can repeat the process until we have obtained all the necessary coefficients. Note that the algorithm can also efficiently detect when a variable needs to be zero- or sign-extended; we can simply try the value -1 for each variable and see which of the zero- or sign-extended versions of the linear equation matches the output value. It can be shown that both checks succeed if and only if both sign- and zero-extension are semantically acceptable.
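A minimal Python sketch of this coefficient-recovery idea (illustrative only: the function names, the lambda interface, and the fixed 32-bit width are assumptions; the real engine emulates decompiler microcode):

```python
MASK = (1 << 32) - 1  # assume 32-bit wraparound arithmetic

def recover_linear(f, nvars):
    """Recover a0, a1, ..., an of f = a0 + a1*x1 + ... + an*xn (mod 2**32)
    by emulating f on a handful of chosen inputs (f is assumed linear)."""
    a0 = f([0] * nvars) & MASK           # all-zero input exposes a0
    coeffs = []
    for i in range(nvars):
        args = [0] * nvars
        args[i] = 1                      # unit vector exposes a0 + ai
        coeffs.append((f(args) - a0) & MASK)
    return a0, coeffs

# Hypothetical "obfuscated" expression; semantically it is 7*x + 3*y + 5.
obf = lambda v: ((v[0] << 3) - v[0] + 3 * v[1] + 5) & MASK
a0, (a1, a2) = recover_linear(obf, 2)
assert (a0, a1, a2) == (5, 7, 3)
```

With the coefficients in hand, the engine can emit the simple linear expression 7*x + 3*y + 5 as a candidate replacement, subject to verification.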
Reichenwallner et al. showed that there is also a fast algorithm, namely SiMBA, to simplify linear MBA expressions, defined as those which can be written as a linear combination of bitwise expressions, i.e. c1*e1(x1,...,xn) + c2*e2(x1,...,xn) + ... + cm*em(x1,...,xn), where each ei(x1,...,xn) is a bitwise expression. For instance, 2*(x&y) is a linear MBA expression, but neither (x & 0x7) nor (x >> 3) is, since neither is bitwise nor can be written as a linear combination of bitwise expressions.
Essentially, the algorithm works by deriving an equivalent representation consisting of linear combinations of only bitwise conjunctions, e.g. 4 + 2*x + 3*x + 5*(x&y). Without going into too much detail, we can recall that every boolean function has a single canonical full DNF form (i.e. it can be written as an OR-of-ANDs formula), which can then be easily translated into a linear combination of conjunctions. Therefore, every linear MBA expression can be written as a linear combination of conjunctions by simply applying the aforementioned transformation to each individual bitwise function, then combining terms.
Now, this linear combination of ANDs can be easily solved using a technique similar to the one described in the previous section, with the difference being that we must evaluate every possible combination of 0/1 input values, not just the inputs containing zero or one 1-values. Without going into too much detail, the coefficients can then be solved through a system of 2^n linear equations in 2^n variables, where each variable in the linear system represents one of the conjunctions of original variables, and each equation represents a possible 0/1 assignment to the original variables. We improve upon the algorithm proposed by Reichenwallner et al. by making further observations on the structure of the coefficients in the system and applying the forward substitution technique, yielding a simpler and faster solver.
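For two variables, the forward-substitution idea can be sketched as follows (an illustrative Python toy over the conjunction basis {1, x, y, x&y}; the actual solver is more general and works for any number of variables):

```python
MASK = (1 << 32) - 1  # 32-bit arithmetic, an assumption for this toy

def conj_coeffs(f):
    """For a 2-variable linear MBA f, recover c0..c3 such that
    f(x, y) == c0 + c1*x + c2*y + c3*(x & y)  (mod 2**32),
    using only f's values on the four 0/1 assignments
    (forward substitution through the triangular system)."""
    f00, f10, f01, f11 = f(0, 0), f(1, 0), f(0, 1), f(1, 1)
    c0 = f00 & MASK                       # assignment (0,0): only c0 active
    c1 = (f10 - c0) & MASK                # (1,0): c0 + c1
    c2 = (f01 - c0) & MASK                # (0,1): c0 + c2
    c3 = (f11 - c0 - c1 - c2) & MASK      # (1,1): all four terms
    return c0, c1, c2, c3

# (x ^ y) + 2*(x & y) is an MBA rewriting of x + y, so its coefficients
# over the conjunction basis come out as those of x + y.
rhs = lambda x, y: ((x ^ y) + 2 * (x & y)) & MASK
assert conj_coeffs(rhs) == (0, 1, 1, 0)
```

The triangular structure of the 0/1 evaluations is what makes forward substitution applicable: each new assignment activates exactly one new conjunction.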
Finally, Reichenwallner et al. apply an 8-step refinement procedure to find simpler representations, involving more bitwise operations than just conjunction. We found this refinement procedure reasonable and only applied a few tweaks in our implementation.
The algebraic engines are great for deriving constants when the expression’s semantics fulfill a certain structural quality, namely that they are equivalent to a linear combination of bitwise functions. However, we found that non-linear MBAs are also common in real-world binaries. In order to handle these cases, it is necessary to implement a more general algorithm that does not rely on algebraic properties of the input expression.
QSynth (2020, David, et al.) and later msynth (2021, Blazytko, et al.) both rely on a precomputed oracle which contains an indexed list of expressions generated through an enumerative search procedure. These expressions are searchable by what we refer to as fingerprints, which can intuitively be understood as a numeric representation of a function’s I/O behavior.
In order to generate a function fingerprint, we begin by generating test cases, which are assignments of possible inputs to the function. For instance, if we had three variables, a possible test case would be (x=100, y=0, z=-1). Then, we feed each one of these test cases into the function being analyzed; for instance, the expression "x - y + z" would yield the output value 99 for the previous test case. Finally, we collect all the outputs and hash them into a single number to get the fingerprint. Now we can look up the fingerprint in the oracle and find a function that is possibly semantically equivalent to the analyzed function.
Note that two functions that are indeed semantically equivalent will always yield the same fingerprints (since they will give the same outputs on the test cases). Therefore, if our oracle is exhaustive enough, it should be possible to find equivalences for many MBA-obfuscated expressions. A large precomputed oracle which can be used with gooMBA is available here: https://hex-rays.com/products/ida/support/freefiles/goomba-oracle.7z
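The fingerprint-and-oracle mechanism can be sketched in a few lines of Python (a toy illustration: the test vectors, the hash, and the candidate set are all assumptions, not the contents of the real oracle):

```python
MASK = (1 << 32) - 1
# Toy test vectors: some arbitrary, some special values like 0 and 0xffffffff.
TESTS = [(0, 0), (1, 1), (0xFFFFFFFF, 1), (100, 0), (7, 13), (123456, 654321)]

def fingerprint(f):
    """Hash the outputs of f on the fixed test vectors into one number."""
    return hash(tuple(f(x, y) & MASK for x, y in TESTS))

# The oracle: a multimap from fingerprints to simple candidate expressions.
candidates = {
    "x + y": lambda x, y: x + y,
    "x - y": lambda x, y: x - y,
    "x & y": lambda x, y: x & y,
    "x ^ y": lambda x, y: x ^ y,
}
oracle = {}
for text, fn in candidates.items():
    oracle.setdefault(fingerprint(fn), []).append(text)

# (x | y) + (x & y) is an MBA rewriting of x + y: same outputs on the
# test vectors, same fingerprint, so the lookup proposes "x + y".
obf = lambda x, y: (x | y) + (x & y)
assert oracle[fingerprint(obf)] == ["x + y"]
```

A fingerprint match is only a guess, since unrelated functions may agree on the test vectors; every candidate must still be verified, which is what the SMT stage below is for.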
In order to have full confidence in the correctness of our simplifications, we feed both the simplified and original expressions into a satisfiability modulo theories (SMT) solver. Without going into too much detail, we translate IDA’s internal intermediate representation into the SMT language, then confirm that there is no value assignment that causes the two expressions to differ. (In other words, a != b is UNSAT.) If the proof succeeds, then we have full faith that the substitution can be performed without changing the semantics of the decompilation. We use the z3 theorem prover provided by Microsoft Research for this purpose.
We found that invoking the SMT solver leads to unreliable performance, since the solver often times out or takes an unreasonable amount of time to prove equivalences. In order to avoid invoking the solver too often, we use heuristics at various points in our analysis. For instance, we detect whether an expression appears to be an MBA expression before trying to simplify it. In addition, every time before we invoke the SMT solver, we generate random test cases and emulate both the input and simplified expressions to ensure they return the same values. We found the latter heuristic to improve performance up to 1,000x in many cases.
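The random-testing filter that guards the SMT stage can be sketched as follows (illustrative Python; the trial count, bit width, and two-variable interface are assumptions):

```python
import random

MASK = (1 << 32) - 1
random.seed(1234)  # deterministic for this illustration

def probably_equivalent(f, g, trials=256):
    """Cheap pre-SMT filter: emulate both expressions on random inputs and
    reject the candidate on the first mismatch; only survivors are handed
    to the (much slower) SMT prover."""
    for _ in range(trials):
        x, y = random.getrandbits(32), random.getrandbits(32)
        if (f(x, y) & MASK) != (g(x, y) & MASK):
            return False      # a counterexample: definitely not equivalent
    return True               # worth attempting a real proof

# A true MBA identity survives the filter...
assert probably_equivalent(lambda x, y: x + y,
                           lambda x, y: (x ^ y) + 2 * (x & y))
# ...while a wrong candidate is rejected almost immediately.
assert not probably_equivalent(lambda x, y: x + y,
                               lambda x, y: x - y)
```

The asymmetry is the point: a single mismatch disproves equivalence for free, whereas a proof of equivalence requires the solver, so filtering out wrong candidates first avoids most solver invocations.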
We evaluated gooMBA on the dataset of linear MBA-obfuscated expressions on MBA-Solver’s GitHub repository, an assortment of real-world examples from VirusTotal that appeared to be MBA-obfuscated, and an MBA-obfuscated sample object file from Apple’s FairPlay DRM solution. In terms of correctness, we find what we expect — gooMBA, being a combination of multiple algorithms, is able to cover more cases than each algorithm individually.
In terms of performance, we find that gooMBA competes very favorably against state-of-the-art linear MBA solvers, and is able to simplify all of the 1,000 or so examples from MBA-Solver much faster than SiMBA. Note that the comparison is not strictly fair, since SiMBA accepts input expressions as a string, and gooMBA accepts them as decompilation IR; regardless, we claim that accepting decompilation IR leads to a superior user experience with less possibility for human error.
Compared to msynth, the difference is even more dramatic. On the mba_challenge file provided on msynth’s GitHub repo, we measured the runtime to take around 1.87s per expression. In contrast, our equivalent algorithm took just 0.0047s to run, with the z3 proof taking 0.1s.
We have presented gooMBA, a deobfuscator that integrates directly into the Hex-Rays decompiler in IDA Pro. This is a meaningful usability trait, since competing tools are typically standalone and require inputting the expression manually or interpreting obtuse outputs. However, this feature also presents some difficulties. For instance, we do not yet perform any use-def analysis or variable propagation beyond what’s already performed by the decompiler. The plugin also currently operates in a purely non-interactive manner, and we believe that adding some interactivity (e.g. allowing the user to choose from a list of simplifications, running proofs in the background, etc.) would greatly benefit usability.
Some potential areas of improvement for gooMBA are: sign extensions are not handled uniformly across all simplification strategies, point function analysis is limited, the simplification oracle is limited by necessity, and use-def analysis can be strengthened to extract expressions spread across basic blocks.
Finally, it’s important to note that MBA obfuscation and deobfuscation are constantly evolving. We based our algorithm choices and implementations on the most promising research on the cutting-edge, but acknowledge that more effective solutions may appear in the future. For instance, though we found that machine learning techniques for MBA-solving have historically underperformed competing methods, machine learning seems like a good candidate for NP-hard problems such as MBA simplification, and we are watching this space for new solutions.
Blazytko, Tim, et al. "Syntia: Synthesizing the semantics of obfuscated code." 26th USENIX Security Symposium (USENIX Security 17). 2017.
Blazytko, Tim, et al. "msynth." https://github.com/mrphrazer/msynth. 2021.
David, Robin, Luigi Coniglio, and Mariano Ceccato. "Qsynth-a program synthesis based approach for binary code deobfuscation." BAR 2020 Workshop. 2020.
Feng, Weijie, et al. "Neureduce: Reducing mixed boolean-arithmetic expressions by recurrent neural network." Findings of the Association for Computational Linguistics: EMNLP 2020. 2020.
Liu, Binbin, et al. "MBA-Blast: Unveiling and Simplifying Mixed Boolean-Arithmetic Obfuscation." 30th USENIX Security Symposium (USENIX Security 21). 2021.
Quarkslab. "SSPAM: Symbolic Simplification with PAttern Matching." https://github.com/quarkslab/sspam. 2016.
Quarkslab. "Arybo." https://github.com/quarkslab/arybo. 2016.
Reichenwallner, Benjamin, and Peter Meerwald-Stadler. "Efficient Deobfuscation of Linear Mixed Boolean-Arithmetic Expressions." Proceedings of the 2022 ACM Workshop on Research on offensive and defensive techniques in the context of Man At The End (MATE) attacks. 2022.
Xu, Dongpeng, et al. "Boosting SMT solver performance on mixed-bitwise-arithmetic expressions." Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. 2021.
The following failure categories exist:
a crash or access violation
incorrect output text
inefficient/unclear/suboptimal output text
The current focus is on producing a correct output for any correct function. The decompiler should not crash, fail, or produce incorrect output for a valid input. Please file a bugreport if this happens.
The decompiler has an extensive set of internal checks and assertions. For example, it does not produce code which dereferences a "void*" pointer. On the other hand, the produced code is not supposed to be compilable and many compilers will complain about it. This is a deliberate choice of not making the output 100% compilable because the goal is not to recompile the code but to let humans analyze it faster.
The decompiler uses some C++ constructs in the output text. Their use is restricted to constructs which cannot be represented in C (the most notable example is passing structures to functions by value).
When the decompiler detects an internal inconsistency, it displays a message box with the error code. It also offers to send the database to the hex-rays.com server:
It is really difficult (almost impossible) to reproduce bugs without a sample database, so please send it to the server. To facilitate things, the decompiler saves its internal state to the database, which is really handy if the error occurs after hours and hours of decompilation.
It is impossible to decompile anything after an internal error. Please reload the database, or better, restart IDA.
When the decompiler gracefully fails on a function, it will display one of the following messages. In general, there is no need to file a bugreport about a failure except if you see that the error message should not be displayed.
Please read the Troubleshooting section about the possible actions.
This error means that the decompiler could not translate an instruction at the specified address into microcode. Please check the instruction and its length. If it looks like a regular instruction used in the compiler generated code and its length is correct, file a bugreport.
The error message is self-explanatory. While it should not happen very often, it can still be seen on functions with huge stacks. No need to report this bug. Hopefully the next version will handle functions with huge stacks more efficiently.
Please restart IDA after this error message.
This error means that at the specified address there is a basic block, which does not end properly. For example, it jumps out of the function, ends with a non-instruction, or simply contains garbage. If you can, try to correct the situation by modifying the function boundaries, creating instructions, or playing with function tails. Usually this error happens with malformed functions.
If the error happens because of a call, which does not return, marking the called function as "noret" will help. If the call is indirect, adding a cross reference to a "noret" function will help too.
If this error occurs on a database created by an old version of IDA, try to reanalyze the program before decompiling it. In general, it is better to use the latest version of IDA to create the databases for decompilation.
Unrecognized table jumps may lead to this failure too.
The stack pointer at the specified address is higher than the initial stack pointer. Functions behaving so strangely cannot be decompiled. If you see that the stack pointer values are incorrect, modify them with the Alt-K (Edit, Functions, Change stack pointer) command in IDA.
Analysis of the function prolog has failed. Currently there is not much you can do about it, but you will not see this error often. The decompiler will try to produce code with prolog instructions rather than stop because of this failure.
The switch idiom (an indirect jump) at the specified address could not be analyzed. You may specify the switch idiom manually using Edit, Other, Specify switch idiom.
If this error occurs on a database created by an old version of IDA, try to delete the offending instruction and recreate it. Doing so will reanalyze it and might fix the error because newer versions of IDA handle switches much better than older versions.
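For reference, a table jump that the decompiler expects to recognize typically looks like this in the disassembly (an illustrative 32-bit x86 idiom; the label names are hypothetical):

```
cmp     eax, 5                  ; bounds check on the switch value
ja      def_case                ; out-of-range values go to the default case
jmp     ds:jpt_cases[eax*4]     ; indirect jump through the case table
```

If the bounds check or the table reference deviates from the patterns IDA knows, the switch may stay unrecognized and manual specification becomes necessary.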
This error message should not occur because the current version will happily decompile any function and just ignore any exception handlers and related code.
Since the stack analysis requires lots of memory, the decompiler will refuse to handle any function with the unaliased stack bigger than 1 MB.
This error message means that the decompiler could not allocate local variables to registers and stack locations. You will see this error message only if you have enabled HO_IGNORE_OVERLAPS in the configuration file. If overlapped variables are allowed in the output, they are displayed in red.
Please check the prototypes of all involved functions, including the current one. Variable types and definitions may cause this error too.
Updating the function stack frame and creating correct stack variables may also help solve the problem.
If you got this error after some manipulations with the function type or variable types, you may reset the information about the current function (Edit, Other, Reset decompiler information) and start afresh.
The message text says it all. While the decompiler itself can be fine-tuned to decompile 16-bit code, this is not a priority.
This is the most painful error message, but it is also the one you can usually do something about. In short, this message means that the decompiler could not determine the calling convention and the call parameters. If this is a direct non-variadic call, you can fix it by specifying the callee type: just jump to the callee and hit Y to specify the type. For variadic functions it is also a good idea to specify the type, but the call analysis can still fail because the decompiler has to find out the actual number of arguments in each call. We recommend starting by checking the stack pointer values in the whole function: get rid of any incorrect stack pointer values. Second, check the types of all called functions. If the type of a called function is wrong, it can interfere with other calls and lead to a failure. Here is a small example:
If f1 is defined as a __stdcall function of 3 arguments, and f2 as a function of 1 argument, the call analysis will fail because 4 arguments are needed in total while only 3 are pushed onto the stack.
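The failing sequence from the example might look like this in the disassembly (an illustrative 32-bit sketch; the pushed values are hypothetical):

```
push    3
push    2
push    1
call    f2              ; declared with 1 stack argument
call    f1              ; declared __stdcall with 3 stack arguments
                        ; 4 arguments are required in total, but only
                        ; 3 values were pushed: the analysis cannot
                        ; split the pushes between the two calls
```

Correcting either declaration (so that the declared argument counts match the three pushed values) lets the analysis succeed.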
If the error occurs on an indirect call, please specify the operand type of the call instruction. Alternatively, adding a cross-reference from the call instruction to a function of the desired type will work: the decompiler will use the type of the referenced function.
If all input types and stack pointer values are correct but the decompiler still fails, please file a bug report.
This is a rare error message. It means that something is wrong with the function stack frame. The most probable cause is that the return address area is missing from the frame or that the function farness (far/near) does not match it.
This error can occur if a reference to a named type (a typedef) is made but the type is undefined. The most common case is when a type library (like vc6win.til) is unloaded. This may invalidate all references to all types defined in it.
This error also occurs when a type definition is illegal or incorrect. To fix an undefined ordinal type, open the Local Types window (Shift-F1) and redefine the missing type.
Currently this error means that the function chunk information is incorrect. Try to redefine (delete and recreate) the function.
Some basic type sizes are incorrect. The decompiler requires that
sizeof(int) == 4
sizeof(enum) == 4
Please check the type sizes in the Options, Compiler dialog box and modify them if they are incorrect.
Also ensure that the correct memory model is selected: "near data, near code".
Finally, the pointer size must be set as follows:
for 32-bit applications, use "near 32bit, far 48bit"
for 64-bit applications, use "64bit".
This is an internal error code and should not be visible to the end user. If it still gets displayed, please file a bug report.
The decompiler failed to trace the FPU stack pointer. Please check the called function types; this is the only remedy available for the moment. We will introduce workarounds and corrective commands in the future. For more information about floating point support, please follow this link.
Please file a bug report; normally this error message should not be displayed.
This is a variant of the variable allocation failure error. You will see this error message only if you have enabled HO_IGNORE_OVERLAPS in the configuration file. If overlapped variables are allowed in the output, they are displayed in red.
A partially initialized variable has been detected. An incorrect stack pointer trace can induce this error, so please check the stack pointer values.
The function is too big or too complex. Unfortunately there is nothing the user can do to avoid this error.
IDA could not locate your decompiler license.
This error message will not currently be displayed.
IDA64 can currently decompile only 64-bit functions. To decompile 32-bit functions please use IDA32.
An attempt to decompile a function while decompiling another function has been detected. Currently only one function can be decompiled at a time.
Please check the data and code memory models in the Options, Compiler dialog. If necessary, reset them to 'near' models.
The current function belongs to a special segment (e.g. the "extern" segment). Such segments do not contain any real code; they contain just pointers to imported functions. The function body is located in some other dynamic library, so there is nothing to decompile.
The current function is bigger than the maximum permitted size, which is specified by the MAX_FUNCSIZE configuration parameter.
The specified input ranges are wrong. The range vector cannot be empty. The first entry must point to an instruction. Ranges may not overlap. Ranges may not start or end in the middle of an item.
The current processor bitness, endianness, or ABI settings in the compiler options are not acceptable. See the current ABI limitations here.
Branches and jumps are not allowed in a delay slot. Such instructions signal an exception and cannot be decompiled.
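On a delay-slot architecture such as MIPS, the rejected pattern looks like this (an illustrative sketch; the label is hypothetical):

```
jr      $ra             # return; the next instruction executes in the delay slot
b       loc_4000        # a branch in the delay slot: architecturally
                        # forbidden, signals an exception
```

Such sequences normally appear only in malformed or deliberately obfuscated code, not in compiler output.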
When the decompiler fails, please check the following things:
the function boundaries. There should not be any wild branches jumping out of the function to nowhere. The function should end properly, with a return instruction or a jump to the beginning of another function. If it ends after a call to a non-returning function, the callee must be marked as non-returning.
the stack pointer values. Use the Options, General, Stack pointer command to display them in a column just after the addresses in the disassembly view. If the stack pointer value is incorrect at any location of the function, the decompilation may fail. To correct the stack pointer values, use the Edit, Functions, Change stack pointer command.
the stack variables. Open the stack frame window with the Edit, Functions, Stack variables... command and verify that the definitions make sense. In some cases creating a big array or a structure variable may help.
the function type. The calling convention, the number of arguments, and their types must be correct. If the function type is not specified, the decompiler will try to deduce it. In some rare cases, it will fail. If the function expects its input in non-standard registers or returns the result in a non-standard register, you will have to inform the decompiler about it. Currently it makes a good guess about non-standard input locations but cannot handle non-standard return locations.
the types of the called functions and referenced data items. A wrong type can wreak havoc very easily. Use the F hotkey to display the type of the current item in the message window. For functions, position the cursor on the beginning and hit F. If the type is incorrect, modify it with Edit, Functions, Set function type (the hotkey is Y). This command works not only for functions but also for data and structure members.
If a type refers to an undefined type, the decompilation might fail.
use a database created by the latest version of IDA.
In some cases the output may contain variables in red. It means that local variable allocation has failed. Please read the page about overlapped variables for the possible corrective methods.
Future versions will have more corrective commands, but first we have to understand which commands are needed.
To be useful, a bug report must contain enough information to reproduce the bug. The send database command is the preferred way of submitting bug reports because it saves all relevant information to the database. Some bugs are impossible to reproduce without it.
The database is sent in compressed form to save bandwidth. An SSL connection is used for the transfer.
If your database/input file is confidential and you cannot send it, try to find a similar file to illustrate the problem. Thank you.
We handle your databases confidentially (as always in the past).