IDA 8.0 introduces IDA Teams - a mechanism that provides revision control for your IDA database files. Perhaps the most essential feature of this new product is the ability to natively diff and merge databases using IDA, allowing multiple reverse engineers to manage work on the same IDA database.
This document discusses in detail the steps involved when diffing and merging IDA databases.
Before continuing, you might want to take a quick look at the tutorial for HVUI, the GUI client for IDA Teams' revision control functionality. It will be referenced multiple times in this document, although here we will focus specifically on the merging functionality.
After having done some reverse-engineering work on an IDA database, it is possible to view those changes in a special mode in IDA: right-click, and choose the diff action:
Here a new instance of IDA will be launched in a special "diff" mode:
This new IDA mode lets the user compare two databases, in a traditional "diff" fashion: essentially a two-panel window, showing the unmodified file on the left and the version with your changes on the right.
Represents the current step in the diff process.
Shows the "untouched" version of the database (i.e., the one without your changes)
Shows your version of the database (i.e., featuring your changes)
Notice how both panels have a little area at the bottom, that is labeled "Details".
Details are available on certain steps of the diffing process, and provide additional information about the change that is currently displayed.
Using actions in the toolbar, you can now iterate through the differences between the two databases, with each change shown in context as if viewed through a normal IDA window.
The ability to view changes in context was a major factor in the decision to use IDA itself as the diffing/merging tool for IDA Teams.
The actions in the toolbar are:
Move to the previous change
Re-center the panels to show the current chunk (useful if you navigated around to get more context)
Move to the next change
Move to the next step in the diffing process.
Toggle the visibility of the "Details" widgets in the various panels (note that some steps do not provide details, so even if the "Details" are requested, they might not be currently visible.)
It is important to note the difference between the terms "diff" and "merge".
This document will sometimes use the two terms interchangeably. This is because to IDA, a diff is just a specialized merge. Both diffing and merging are handled by IDA’s "merge mode", which involves up to 3 databases, one of which can be modified to contain the result of the merge.
A diff is simply a merge operation that involves only 2 databases, neither of which is modified.
This is why you will often see the term "merge" used in the context of a diff. In this case "merge" refers to IDA’s "merge mode", rather than the process of merging multiple databases together into a combined database.
We must stress that performing a merge between two IDA databases is quite different from performing a merge between, say, two text files. A change in one chunk of a text file has no impact on another chunk.
IDA databases are not so simple. A change in one place in an idb will often have an impact on another place. For example, if a structure mystruct changed between two databases, it will have an impact not only on the name of the structure, but on cross-references to structure members, function prototypes, etc.
This is why IDA’s merge mode is split into a strict series of "steps":
Within a single step it is possible to go forward and backward between different chunks. But because of possible inter-dependencies between steps, it is not possible to move backwards between steps; you can only go forward:
Since IDA’s diff mode is just a variation of its merge mode, diffing databases is also subject to this sequential application of steps in order to view certain bits of information. That is why, in some steps (e.g., the "Disassembly/Items") IDA might not report some changes that were performed at another level.
For instance, if a user marked a function as noret, the listings shown in the "Disassembly/Items" step will not advertise that there was a change at that place (even though the "Attributes: noreturn" is visible in the left-hand listing); only the changes to the instructions (and data, …) are visible in the current step:
The change will, however, be visible at a later step (i.e., "Functions/Registry"):
The changes applied during the "diff" process are only temporary. Exiting IDA (at any moment) will not alter the files being compared.
As with any collaborative tool, it may happen that two coworkers work on the same dataset (e.g., IDA database), and make modifications to the same areas, resulting in "conflicts". Conflicts must be "resolved" prior to committing.
To do that, right-click and pick one of the "resolve" options:
IDA Teams provides the following merge strategies.
If the option that was chosen (e.g., Interactive merge mode) requires user interaction due to conflicts, IDA will open in 3-pane "merge" mode.
When a conflict is encountered, you’ll have the ability to pick, for all conflicts, which change should be kept (yours, or the other). Every time you pick a change (and thus resolve a conflict), IDA will proceed with the merging, applying all the non-conflicting changes it can, until the next conflict - if any. When all conflicts are resolved, you can leave IDA, and the new resulting file is ready to be submitted.
This section provides a detailed overview of the steps involved in the merge process. The list of predefined merge steps is defined in merge.hpp of the IDA SDK:
The list of merge steps is not final. For example, if there is a conflict in structure members, a new merge phase will be created to resolve it. The same holds for UDTs, functions, frames, and so on. In other words, in the general case the exact number of merge steps is not known in advance and depends on the databases.
Each item in a merge step is assigned to a difference position named diffpos. It may be an EA (effective address), enum id, structure member offset, artificial index and so on. In other words, a diffpos is a way of addressing something in the database.
Every merge step starts with the calculation of differences and conflicts between items at the corresponding difference positions. The result is a list of diffpos entries with differences or conflicts; positions without differences are not included in the list. Adjacent diffpos entries are combined into a difference range called diffrange.
The merging process operates on a difference range (diffrange). For one diffrange, a single merge policy can be selected.
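The grouping of adjacent difference positions into ranges can be sketched as follows. This is only an illustration, not IDA's implementation: it assumes that a diffpos can be modeled as a plain integer and that "adjacent" means consecutive integers. The function name `group_into_diffranges` is hypothetical.

```python
from typing import Iterable, List, Tuple


def group_into_diffranges(diffposes: Iterable[int]) -> List[Tuple[int, int]]:
    """Combine adjacent positions into inclusive (start, end) ranges.

    Assumption: a diffpos is modeled as an integer index, and two
    positions are adjacent when they differ by exactly 1.
    """
    ranges: List[Tuple[int, int]] = []
    for pos in sorted(set(diffposes)):
        if ranges and pos == ranges[-1][1] + 1:
            # Extend the current range by one position.
            ranges[-1] = (ranges[-1][0], pos)
        else:
            # Start a new range.
            ranges.append((pos, pos))
    return ranges


# Positions 3, 4, 5 and 9, 10 differ; all other positions are identical.
print(group_into_diffranges([3, 4, 5, 9, 10]))  # [(3, 5), (9, 10)]
```

In this model, a merge policy would then be chosen once per returned range rather than once per position.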
Merging of global database attributes. These attributes are mainly stored in the idainfo structure. This phase has two subphases:
Global settings/Database attributes/Graph mode
Global settings/Database attributes/Text mode
The "Detail" pane is absent.
Merging of global processor options. Usually these options are stored in the idpflags netnode.
The "Detail" pane is absent.
Merging of registered string literal encodings. These encodings are used to properly display string literals in the disassembly listing.
The "Detail" pane is absent.
Merging of default string encodings: which of the registered string encodings are considered the default ones.
The "Detail" pane is absent.
Merging of embedded script snippets.
When merging embedded script snippets, the script name/language is displayed, and the "Detail" pane contains the script source with the highlighted differences:
Merging of the default snippet and tabulation size.
The "Detail" pane is absent.
Merging of the registered custom data types and formats.
The "Detail" pane is absent.
Merging of assembler level enums (enum_t). Ghost enums are skipped in this phase; they will be merged when handling local types.
To calculate diffpos, IDA Teams matches enum members by name and maps all enums with common member names into one diffpos.
An example of enum merging:
In both idbs, enum constant "B" is present. However, in the remote idb "B" has a different parent enum, "enum_2". Therefore enum_1 in the local idb corresponds to enum_1 and enum_2 in the remote idb. The user can select either enum_1 from the local idb or enum_1 and enum_2 from the remote idb.
In other words, IDA will display both enum_1 and enum_2 in the Remote pane, indicating that the difference between the Local and Remote databases corresponds to two separate enums, but they are treated as a single difference location. The "Detail" pane will display the full enum definitions, with the differences highlighted.
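The rule that enums sharing a member name end up in one diffpos can be modeled as finding connected groups. The sketch below is an assumption-laden illustration, not IDA's code: each database is modeled as a dictionary from enum name to a set of member names, the function name `group_enums` is hypothetical, and the member "A" is invented for the example (the text above only names "B").

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

EnumDb = Dict[str, Set[str]]  # enum name -> names of its members


def group_enums(local: EnumDb, remote: EnumDb) -> List[Tuple[List[str], List[str]]]:
    """Return groups of (local enum names, remote enum names).

    Enums that share at least one member name, directly or through a
    chain of other enums, land in the same group.
    """
    nodes = [("local", n) for n in local] + [("remote", n) for n in remote]
    parent = {node: node for node in nodes}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    first_owner = {}  # member name -> first enum seen that owns it
    for side, db in (("local", local), ("remote", remote)):
        for enum_name, members in db.items():
            for member in members:
                node = (side, enum_name)
                if member in first_owner:
                    parent[find(node)] = find(first_owner[member])
                else:
                    first_owner[member] = node

    groups = defaultdict(lambda: ([], []))
    for side, name in nodes:
        groups[find((side, name))][0 if side == "local" else 1].append(name)
    return [(sorted(loc), sorted(rem)) for loc, rem in groups.values()]


# Hypothetical contents: "B" moved from enum_1 to enum_2 in the remote idb.
local_db = {"enum_1": {"A", "B"}}
remote_db = {"enum_1": {"A"}, "enum_2": {"B"}}
print(group_enums(local_db, remote_db))  # [(['enum_1'], ['enum_1', 'enum_2'])]
```

The single returned group corresponds to the single difference location described above: enum_1 on the local side against enum_1 and enum_2 on the remote side.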
Merging of assembler level structures (struc_t).
To calculate diffpos, IDA Teams matches structs by the following attributes, in this order:
the structure name
the structure tid
and size
If we fail to match a structure, then it will stay unmatched. Such an unmatched structure will have its own diffpos, allowing the user to copy it to the other idb or to delete it altogether.
This merge phase deals with the entire structure types and their attributes. Entire structure types may be added or deleted, and/or conflicts in the structure attributes are resolved.
If members of matched structures (at the same diffpos) differ, the conflict will be resolved later, during the Types/Struct members/… merge phase.
In the UI, IDA will display the list of structure names, with the "Detail" pane showing the structure attributes:
Merging of the loaded type libraries.
This merge phase uses the standard "Type libraries" widget.
The "Detail" pane is absent.
Merging of local types.
To calculate diffpos, IDA Teams matches local types by the following attributes, in this order:
the type name
the ordinal number and base type
If we fail to match a type, then it will stay unmatched. Such an unmatched type will have its own diffpos, allowing the user to copy it to the other idb or to delete it altogether.
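The ordered matching described above (name first, then ordinal number and base type) can be sketched as a multi-pass pairing. This is a simplified model under stated assumptions: the `LocalType` record and `match_local_types` function are hypothetical, and running one full pass per criterion is a guess at how "in this order" is applied, not a description of IDA's internals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LocalType:
    name: str
    ordinal: int
    base_type: str  # e.g. "struct", "enum", "typedef"


Pair = Tuple[Optional[LocalType], Optional[LocalType]]


def match_local_types(local: List[LocalType], remote: List[LocalType]) -> List[Pair]:
    """Pair up types; None on one side marks an unmatched type."""
    remaining = list(remote)  # remote types not matched yet
    pending = list(local)     # local types not matched yet
    pairs: List[Pair] = []
    # Criteria are tried in order: name first, then (ordinal, base type).
    criteria = (
        lambda t: t.name,
        lambda t: (t.ordinal, t.base_type),
    )
    for key in criteria:
        still_pending = []
        for lt in pending:
            match = next((rt for rt in remaining if key(rt) == key(lt)), None)
            if match is None:
                still_pending.append(lt)
            else:
                remaining.remove(match)
                pairs.append((lt, match))
        pending = still_pending
    # Whatever is left stays unmatched and gets a position of its own.
    pairs.extend((lt, None) for lt in pending)
    pairs.extend((None, rt) for rt in remaining)
    return pairs
```

A pair with `None` on one side corresponds to an unmatched type, which the user can copy to the other idb or delete.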
This merge phase deals with entire types and their attributes. Entire local types may be added or deleted, and/or conflicts in their attributes are resolved. Differences in type members (e.g., struct members) will be resolved in a separate phase: Types/Local type members.
This merge phase uses the standard "Local types" widget. The "Detail" pane displays the type definition and its attributes.
For example:
Types/Struct members/struct_t
Types/Local type members/struct conflict_t
These merge phases merge the conflicting members of a structure or a local type.
The "Detail" pane displays full information about the current member along with its attributes.
Ghost structs may have comments attached to them.
This merge phase handles these comments:
We need a separate phase for these comments in order not to lose them during merging because by default ghost types are considered secondary to the corresponding non-ghost type. Normally during merge ghost types may be overwritten. However, local types cannot have comments at all. This is why ghost structure comments, if created, are valuable.
Similarly to comments attached to entire structures, each structure member may have a comment.
The same logic applies to ghost struct member comments:
Merging of selectors.
This merge phase uses the standard widget "Selectors".
The "Detail" pane is absent.
IDA Pro allocates so-called flags for each program address. These flags describe how to display the corresponding bytes in the disassembly listing: as instruction or data.
There are two different storage methods for flags: virtual array (VA) and sparse storage (MM). The virtual array method is the default one; it allocates 32 bits for each program address. However, for huge segments this method is not efficient and may lead to unnecessarily huge databases. Therefore for huge segments IDA Pro uses sparse storage.
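To see why the virtual array method becomes impractical for huge segments, the arithmetic can be worked through in a few lines. The only figure taken from the text is the 32 bits per address; the function name and the example address ranges are illustrative assumptions.

```python
FLAG_BITS_PER_ADDRESS = 32  # stated above for the virtual array method


def virtual_array_flag_bytes(start_ea: int, end_ea: int) -> int:
    """Bytes of flag storage for the half-open address range [start_ea, end_ea)."""
    if end_ea < start_ea:
        raise ValueError("end_ea must not be smaller than start_ea")
    return (end_ea - start_ea) * FLAG_BITS_PER_ADDRESS // 8


# A 64 KiB segment needs 256 KiB of flags...
print(virtual_array_flag_bytes(0x401000, 0x411000))  # 262144
# ...while a 4 GiB segment would need 16 GiB.
print(virtual_array_flag_bytes(0, 1 << 32) // (1 << 30))  # 16
```

The cost grows with the size of the address range regardless of how much of it is actually defined, which is the case sparse storage is meant to address.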
This merge phase handles the defined program ranges and their storage types.
The "Detail" pane is absent.
This merge phase handles the program segmentation.
When merging segments, IDA combines them into non-overlapping groups. Each group will have its own diffpos. For example, the following segmentations:
will result in a single diffpos:
The "Detail" pane displays segments in the combined group with their attributes.
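One way to picture the grouping of segments into non-overlapping groups is an interval sweep over the segments of both databases. The sketch below is an assumption: segments are modeled as half-open (start, end) address pairs, the function name `group_segments` is hypothetical, and the example addresses are invented. It does not reproduce how IDA actually computes the groups.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # half-open range [start_ea, end_ea)


def group_segments(local: List[Segment], remote: List[Segment]) -> List[Dict[str, List[Segment]]]:
    """Collect overlapping segments from both databases into groups.

    Each returned group would correspond to one diffpos in this model.
    """
    tagged = sorted(
        [(s, e, "local") for s, e in local] + [(s, e, "remote") for s, e in remote]
    )
    groups: List[Dict[str, List[Segment]]] = []
    group_end = None
    for start, end, side in tagged:
        if group_end is None or start >= group_end:
            # No overlap with the current group: open a new one.
            groups.append({"local": [], "remote": []})
            group_end = end
        else:
            # Overlaps the current group: widen it if needed.
            group_end = max(group_end, end)
        groups[-1][side].append((start, end))
    return groups


# The boundary between the two segments sits at a different address in each idb,
# so all four segments overlap transitively and form a single group.
local_segs = [(0x1000, 0x2000), (0x2000, 0x3000)]
remote_segs = [(0x1000, 0x2800), (0x2800, 0x3000)]
print(len(group_segments(local_segs, remote_segs)))  # 1
```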
When merging segments, IDA tries to move the segment boundaries in a way that preserves the segment contents. If it fails to do so, the conflicting segments are deleted and new ones are created.
Merging of segment groups. Segment groups are used only in OMF files. They correspond to the group keyword in assembler.
The "Detail" pane is absent.
Some processors have so-called segment registers. IDA Pro knows about them and can remember their value (one value per address range).
For example, the x86 processor has ds, ss, and many other registers. IDA Pro can remember that, say, ds has the value of 1000 at the range 401000..402000.
This merge phase handles segment registers. For each register, a separate merge phase is created. It contains address ranges: inside each address range the value of the segment register stays the same.
To prepare diffpos, IDA Teams combines segment register ranges into non-overlapping ranges. diffpos is a range number.
The "Detail" pane displays segment register ranges in diffpos with the value and the suffix that denotes the range type (u-user defined, a-automatically inherited from the previous range).
The database may have bytes that do not belong to any segment.
To prepare diffpos, IDA Teams groups orphan bytes in the databases into nonintersecting ranges. diffpos is a range number.
The "Detail" pane is absent.
Merging of the patched bytes.
The "Detail" pane is absent.
Byte values in segments may differ even for non-patched addresses, for example if a snapshot of the process memory was taken during a debugger session.
IDA Teams combines the sequential bytes in one diffpos.
This merge phase uses the standard "IDA-View" widget.
The "Detail" pane displays the conflicting byte values.
Merging of fixup records.
The "Detail" pane is absent.
Merging of memory mappings.
The "Detail" pane is absent.
Merging of exported symbols.
Merge phase uses the standard "Exports" widget.
The "Detail" pane is absent.
Merging of imported symbols.
Merge phase uses the standard "Imports" widget.
The "Detail" pane is absent.
When merging, IDA Teams compares disassembly items (instructions and data). IDA Teams compares disassembly items by length, flags, opinfo, name, comment, and netnode information (NALT_* and NSUP_* flags).
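The attribute-by-attribute comparison can be illustrated with a small record type. This is a conceptual sketch only: the `Item` class, its field types, and the sample values are assumptions, and the netnode information mentioned above is left out for brevity.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Item:
    # Field names follow the attributes listed above; the types are assumed.
    length: int
    flags: int
    opinfo: str
    name: str
    comment: str


def differing_attributes(a: Item, b: Item) -> List[str]:
    """Names of the attributes whose values differ between two items."""
    return [f.name for f in fields(Item) if getattr(a, f.name) != getattr(b, f.name)]


local_item = Item(4, 0x600, "", "loc_401000", "")
remote_item = Item(4, 0x600, "", "retry_loop", "spin until ready")
print(differing_attributes(local_item, remote_item))  # ['name', 'comment']
```

An empty result would mean the two items are considered identical in this model, so the address would not appear as a difference.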
This merge step uses the standard "IDA-View" widget so that items can be viewed in their context. For example:
Merging of extra comments.
This merge phase uses the standard "IDA-View" widget.
The "Detail" pane displays comment content.
Merging of additional flags aflags_t.
Each disassembly item may have additional flags that further describe it.
This merge phase uses the standard "IDA-View" widget.
The "Detail" pane displays additional flags.
To prepare diffpos, IDA Teams groups hidden ranges into nonintersecting ranges. diffpos is a range number.
The "Detail" pane displays the hidden range description.
To prepare diffpos, IDA Teams groups source file ranges into nonintersecting ranges. diffpos is a range number.
The "Detail" pane displays the source file definition.
Function definitions (func_t) are merged using the standard "Functions" widget, while the "Detail" pane displays function attributes:
Merging of instruction kinds.
To simplify decompilation, IDA has the notion of the instruction kind:
PROLOG instruction
EPILOG instruction
SWITCH instruction
This merge phase uses the standard "IDA-View" widget.
The "Detail" pane displays instruction kind.
This merge phase deals with entire function frames. Function frames may be added or deleted.
If members of the matched function frame differ, the conflict will be resolved later during the Functions/Frame/… merge phase. Each differing frame will be assigned its own merge step.
The "Detail" pane is absent.
Merging of function frame details.
A separate phase is created for each function. For example:
Functions/Frames/sub_401200 at 401200
Functions/Frames/_main at 4014E0
Each of these phases merges the conflicting members of the function frame.
The "Detail" pane displays the detailed information about the current function frame member.
Merging of function SP change points.
This merge phase uses the standard "IDA-View" widget.
The "Detail" pane displays the SP change point details.
Merging of regular execution flow from the previous instruction. IDA stores cross-references that correspond to regular execution flow in a special format, different from other cross-reference types.
This merge phase uses the standard "IDA-View" widget.
The "Detail" pane is absent.
Merging of code cross-references.
This merge phase uses the standard "IDA-View" widget.
The "Detail" pane displays code references to the address (diffpos).
Merging of data cross-references.
This merge phase uses the standard "IDA-View" widget.
The "Detail" pane displays data references to the address (diffpos).
The following merge phases exist:
Marked positions/structplace_t
Marked positions/enumplace_t
Marked position/idaplace_t
They deal with merging of bookmarks for:
structures
enums
addresses
The "Detail" pane is absent.
The following merge phases exist:
Breakpoints/Absolute bpts
Breakpoints/Relative bpts
Breakpoints/Symbolic bpts
Breakpoints/Source level bpts
They deal with merging of various debugger breakpoints.
The "Detail" pane is absent.
Merging of watch points.
The "Detail" pane is absent.
The following merge phases exist:
Dirtree/$ dirtree/tinfos
Dirtree/$ dirtree/structs
Dirtree/$ dirtree/enums
Dirtree/$ dirtree/funcs
Dirtree/$ dirtree/names
Dirtree/$ dirtree/imports
Dirtree/$ dirtree/bookmarks_idaplace_t
Dirtree/$ dirtree/bookmarks_structplace_t
Dirtree/$ dirtree/bookmarks_enumplace_t
Dirtree/$ dirtree/bpts
They deal with merging of the standard dirtrees.
The "Detail" pane is absent.
Merging of try and catch block info.
The "Detail" pane describes the try block.
Merging of virtual function tables.
The "Detail" pane is absent.
Merging of database notepads. Each line of text is a diffpos.
The "Detail" pane is absent.
Each processor plugin creates its own merge steps to handle the processor plugin’s specific data.
For example, the PC processor module adds the following merge steps:
Processor specific/Analyze ea for a possible offset
Processor specific/Frame pointer info
Processor specific/Pushinfo
Processor specific/VXD info 2
Processor specific/Callee EA|AH value
…
Merging of the decompiler data starts with the global configuration parameters from hexrays.cfg:
To handle decompilation of specific functions, IDA stores the decompilation data in a database netnode named Hexrays node.
The merge step Plugins/Decompiler/Hexrays nodes adds or deletes netnodes, indicating which functions have or haven’t been decompiled in each database:
The decompilation data for matching functions is compared using the following attributes:
Plugins/Decompiler/…/Numforms
Plugins/Decompiler/…/mflags
Plugins/Decompiler/…/User-defined funcargs
Plugins/Decompiler/…/User-defined variable mapping
Plugins/Decompiler/…/User-defined lvar info
Plugins/Decompiler/…/lvar settings
Plugins/Decompiler/…/IFLAGS
Plugins/Decompiler/…/User labels
Plugins/Decompiler/…/User unions
Plugins/Decompiler/…/User comments
Plugins/Decompiler/…/User-defined call
If there is a difference, each comparison criterion will be assigned its own merge step. Each step will use the standard "Pseudocode" widget so that differences can be viewed in-context with the full pseudocode:
The file loader that was used to create the database may have stored some data in the database that is specific to the loader itself.
There are merge phases for each loader, for example:
Loader/PE file/…
Loader/NE file/…
Loader/ELF file/…
Loader/TLS/…
Loader/ARM segment flags/…
To handle the differences in debugger data the following merge steps may be created:
Debugger/pin
Debugger/gdb
Debugger/xnu
Debugger/ios
Debugger/bochs
Debugger/windbg
Debugger/rmac_arm
Debugger/lmac_arm
Debugger/rmac
Debugger/lmac
As can be deduced by their names, they handle debugger-specific data in the database.
There are a number of IDA plugins that need to merge their data.
For example:
Plugins/PDB
Plugins/golang
Plugins/EH_PARSE
Plugins/Callgraph
Plugins/swift
Any third party plugin may add merge phases using the IDA SDK. We provide sample plugins that illustrate how to add support for merging into third party plugins.
Any plugin that stores its data in the database must implement the logic for merging its data. For that, the plugin must provide the description of its data and ask the kernel to create merge handlers based on these descriptions.
The kernel will use the created handlers to perform merging and to display merged data to the users. The plugin can implement callback functions to modify some aspects of merging, if necessary.
The plugin may have two kinds of data with permanent storage:
Data that applies to the entire database (e.g. the options). To describe this data, the idbattr_info_t type is used.
Data that is tied to a particular address. To describe this data, the merge_node_info_t type is used.
The kernel will notify the plugin using the processor_t::ev_create_merge_handlers event. On receiving it, the plugin should create the merge handlers, usually by calling the create_merge_handlers() function.
The IDA SDK provides several sample plugins to demonstrate how to add merge functionality to third party plugins:
mex1/
mex2/
mex3/
mex4/
The sample plugin without the merge functionality consists of two files:
mex.hpp
mex_impl.cpp
It is a regular implementation of a plugin that stores some data in the database. Please check the source files for more info.
We demonstrate several approaches to add the merge functionality. They are implemented in different directories mex1/, mex2/, and so on.
The MEX_N macros that are defined in the makefile are used to parameterize the plugin implementation, so that all plugin examples may be used simultaneously.
You may check the merge results for the plugins in one session of IDA Teams. Naturally, you should prepare the databases by running the plugins before launching the IDA Teams session.
The merge functionality is implemented in the merge.cpp file. It contains create_merge_handlers(), which is responsible for the creation of the merge handlers.
Variants:
mex1/ Merge values are stored in netnodes. The kernel will read the values directly from netnodes, merge them, and write back. No further actions are required from the plugin. If the data is stored in a simple way using altvals or supvals, this simple approach is recommended.
mex2/ Merge values are stored in variables (in the memory). For more complex data that is not stored in a simple way in netnodes (for example, data that uses database blobs), the previous approach cannot be used. This example shows how to merge the data that is stored in variables, like fields of the plugin context structure. The plugin provides the field descriptions to the kernel, which will use them to merge the data in the memory. After merging, the plugin must save the merged data to the database.
mex3/ Uses mex1 example and illustrates how to improve the UI look.
mex4/ Merge data that is stored in a netnode blob. Usually blob data is displayed as a sequence of hexadecimal digits in a merge chooser column. We show how to display blob contents in the "Detail" pane.
When a user needs to commit changes made to a file, but that same file has received other modifications (likely from other users) in the meantime, it is necessary to first "merge" the two sets of modifications together.
When the two sets of modifications do not overlap, merging is trivial - at least conceptually. But when they do overlap, they produce conflict(s).
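The distinction between non-overlapping and overlapping modifications is the classic three-way merge. The sketch below illustrates the idea on plain dictionaries; it is a generic technique shown under stated assumptions, not IDA Teams' algorithm. The function name, the use of `None` to mean "absent", and the sample function names are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Db = Dict[int, str]  # address -> name, a toy stand-in for database content


def three_way_merge(base: Db, local: Db, remote: Db) -> Tuple[Db, Set[int]]:
    """Merge two sets of changes made on top of a common base.

    Returns the merged content and the keys where both sides changed
    the same item in different ways (the conflicts).
    """
    merged: Db = {}
    conflicts: Set[int] = set()
    for key in sorted(set(base) | set(local) | set(remote)):
        b: Optional[str] = base.get(key)
        l: Optional[str] = local.get(key)
        r: Optional[str] = remote.get(key)
        if l == r:
            value = l            # both sides agree
        elif l == b:
            value = r            # only the remote side changed
        elif r == b:
            value = l            # only the local side changed
        else:
            conflicts.add(key)   # both changed, differently
            continue
        if value is not None:    # None models a deleted item
            merged[key] = value
    return merged, conflicts


base = {0x401000: "sub_401000", 0x401200: "sub_401200"}
local = {0x401000: "parse_header", 0x401200: "sub_401200"}
remote = {0x401000: "sub_401000", 0x401200: "send_reply"}
print(three_way_merge(base, local, remote))
# ({4198400: 'parse_header', 4198912: 'send_reply'}, set())
```

If the remote side had also renamed the function at 0x401000, that address would be reported as a conflict instead of being merged.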
Since IDA Teams focuses on collaboration over IDA database files, the rest of this section will focus on the different strategies that are available for resolving conflicts among those.
IDA Teams comes with multiple strategies to help in conflict resolution of IDA database files:
Launch IDA in a non-interactive batch mode, attempting to perform all merging automatically.
If any conflict is discovered, bail out of the merge process, and don’t modify the local database.
Launch IDA in a non-interactive batch mode, attempting to perform all merging automatically.
If a conflict is discovered, assume that the "local" change (i.e., the current user’s change) is the correct one, and apply that.
Once all merging is done and conflicts are resolved, write those to the local database and exit IDA.
Launch IDA in a non-interactive batch mode, attempting to perform all merging automatically.
If a conflict is discovered, assume that the "remote" change (i.e., the change made by another user) is the correct one, and apply that.
Once all merging is done and conflicts are resolved, write those to the local database and exit IDA.
Manual merge mode.
This will launch IDA in an interactive, 3-pane mode, allowing the user to decide how to resolve each conflict.
Once all merging is done and conflicts are resolved, exit IDA and write the changes to the local database.
Select the local database, ignoring all changes in the remote database.
No IDA process is run.
Select the remote database, ignoring all changes in the local database.
No IDA process is run.
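The non-interactive strategies listed above differ only in what happens when a true conflict is found. The following sketch models that decision for a single value. The enum member names are hypothetical labels chosen for this example (they are not the names shown in the IDA Teams UI), and the interactive mode is left out because it depends on user input.

```python
from enum import Enum, auto


class Strategy(Enum):
    # Hypothetical labels for the strategies described above.
    AUTO_NO_CONFLICTS = auto()   # bail out on the first conflict
    AUTO_PREFER_LOCAL = auto()   # conflicts resolved in favor of local
    AUTO_PREFER_REMOTE = auto()  # conflicts resolved in favor of remote
    USE_LOCAL = auto()           # take local wholesale, no merging
    USE_REMOTE = auto()          # take remote wholesale, no merging


class MergeConflict(Exception):
    """Raised when a conflict is found and the strategy forbids resolving it."""


def resolve_value(base, local, remote, strategy: Strategy):
    """Pick the resulting value for one item under the given strategy."""
    if strategy is Strategy.USE_LOCAL:
        return local
    if strategy is Strategy.USE_REMOTE:
        return remote
    # Non-conflicting cases are merged the same way by every auto strategy.
    if local == remote or remote == base:
        return local
    if local == base:
        return remote
    # Both sides changed the item differently: a true conflict.
    if strategy is Strategy.AUTO_PREFER_LOCAL:
        return local
    if strategy is Strategy.AUTO_PREFER_REMOTE:
        return remote
    raise MergeConflict(f"local={local!r} and remote={remote!r} both differ from base={base!r}")


print(resolve_value("sub_401000", "parse_header", "read_header", Strategy.AUTO_PREFER_REMOTE))
# read_header
```

Note that in this model the "use local" and "use remote" strategies discard the other side's non-conflicting changes as well, which matches the description that no merge process is run for them.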