Merge Process

This section provides a detailed overview of the steps involved in the merge process. The list of predefined merge steps is defined in merge.hpp in the IDA SDK:

enum merge_kind_t
{
  MERGE_KIND_NETNODE,           ///< netnode (no merging, to be used in idbunits)
  MERGE_KIND_AUTOQ,             ///< auto queues
  MERGE_KIND_INF,               ///< merge the inf variable (global settings)
  MERGE_KIND_ENCODINGS,         ///< merge encodings
  MERGE_KIND_ENCODINGS2,        ///< merge default encodings
  MERGE_KIND_SCRIPTS2,          ///< merge scripts common info
  MERGE_KIND_SCRIPTS,           ///< merge scripts
  MERGE_KIND_CUSTDATA,          ///< merge custom data type and formats
  MERGE_KIND_STRUCTS,           ///< merge structs (globally: add/delete structs entirely)
  MERGE_KIND_STRMEM,            ///< merge struct members
  MERGE_KIND_ENUMS,             ///< merge enums
  MERGE_KIND_TILS,              ///< merge type libraries
  MERGE_KIND_TINFO,             ///< merge tinfo
  MERGE_KIND_UDTMEM,            ///< merge UDT members (local types)
  MERGE_KIND_SELECTORS,         ///< merge selectors
  MERGE_KIND_STT,               ///< merge flag storage types
  MERGE_KIND_SEGMENTS,          ///< merge segments
  MERGE_KIND_SEGGRPS,           ///< merge segment groups
  MERGE_KIND_SEGREGS,           ///< merge segment registers
  MERGE_KIND_ORPHANS,           ///< merge orphan bytes
  MERGE_KIND_BYTEVAL,           ///< merge byte values
  MERGE_KIND_FIXUPS,            ///< merge fixups
  MERGE_KIND_MAPPING,           ///< merge manual memory mapping
  MERGE_KIND_EXPORTS,           ///< merge exports
  MERGE_KIND_IMPORTS,           ///< merge imports
  MERGE_KIND_PATCHES,           ///< merge patched bytes
  MERGE_KIND_FLAGS,             ///< merge flags_t
  MERGE_KIND_EXTRACMT,          ///< merge extra next or prev lines
  MERGE_KIND_AFLAGS_EA,         ///< merge aflags for mapped EA
  MERGE_KIND_IGNOREMICRO,       ///< IM ("$ ignore micro") flags
  MERGE_KIND_HIDDENRANGES,      ///< merge hidden ranges
  MERGE_KIND_SOURCEFILES,       ///< merge source files ranges
  MERGE_KIND_FUNC,              ///< merge func info
  MERGE_KIND_FRAMEMGR,          ///< merge frames (globally: add/delete frames entirely)
  MERGE_KIND_FRAME,             ///< merge function frame info (frame members)
  MERGE_KIND_STKPNTS,           ///< merge SP change points
  MERGE_KIND_FLOWS,             ///< merge flows
  MERGE_KIND_CREFS,             ///< merge crefs
  MERGE_KIND_DREFS,             ///< merge drefs
  MERGE_KIND_BPTS,              ///< merge breakpoints
  MERGE_KIND_WATCHPOINTS,       ///< merge watchpoints
  MERGE_KIND_BOOKMARKS,         ///< merge bookmarks
  MERGE_KIND_TRYBLKS,           ///< merge try blocks
  MERGE_KIND_DIRTREE,           ///< merge std dirtrees
  MERGE_KIND_VFTABLES,          ///< merge vftables
  MERGE_KIND_SIGNATURES,        ///< signatures
  MERGE_KIND_PROBLEMS,          ///< problems
  MERGE_KIND_UI,                ///< UI
  MERGE_KIND_NOTEPAD,           ///< notepad
  MERGE_KIND_LOADER,            ///< loader data
  MERGE_KIND_DEBUGGER,          ///< debugger data
  MERGE_KIND_LAST,              ///< last predefined merge handler type.
                                ///< please note that there can be more merge handler types,
                                ///< registered by plugins and processor modules.
};

The list of merge steps is not final. For example, if there is a conflict in structure members, a new merge phase is created to resolve it. The same holds for UDTs, functions, frames, and so on. In other words, in the general case the exact number of merge steps is not known in advance and depends on the databases.

Each item in a merge step is assigned to a difference position named diffpos. It may be an EA (effective address), enum id, structure member offset, artificial index and so on. In other words, a diffpos is a way of addressing something in the database.

Every merge step starts by calculating the differences and conflicts between items at the corresponding difference positions. The result is a list of diffpos entries that have differences or conflicts; positions without differences are not included in the list. Adjacent diffpos entries are combined into a difference range called a diffrange.

The merging process operates on a difference range diffrange. For one diffrange, a single merge policy can be selected.
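The way adjacent difference positions collapse into ranges can be sketched in a few lines of C++. This is only an illustration, not the IDA SDK API: diffpos_t, diffrange_t and build_diffranges() are hypothetical names. It assumes that a diffpos is a plain integer index and that "adjacent" means consecutive indices in a sorted list without duplicates.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a diffpos: a plain integer index.
// (In IDA it may be an EA, an enum id, a member offset, and so on.)
using diffpos_t = std::uint64_t;

// Hypothetical diffrange: a closed interval [first, last] of adjacent positions.
using diffrange_t = std::pair<diffpos_t, diffpos_t>;

// Combine a sorted, duplicate-free list of differing positions into ranges.
// Positions without differences are simply absent from the input.
std::vector<diffrange_t> build_diffranges(const std::vector<diffpos_t> &differing)
{
  std::vector<diffrange_t> ranges;
  for ( diffpos_t pos : differing )
  {
    if ( !ranges.empty() && ranges.back().second + 1 == pos )
      ranges.back().second = pos;     // adjacent: extend the current range
    else
      ranges.emplace_back(pos, pos);  // gap: start a new range
  }
  return ranges;
}
```

With the positions 1, 2, 3, 7, 9, 10 as input, the sketch yields three ranges: [1, 3], [7, 7] and [9, 10]. A merge policy would then be chosen once per range rather than once per position.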

Global settings/Database attributes

Merging of global database attributes. These attributes are mainly stored in the idainfo structure. This phase has two subphases:

  • Global settings/Database attributes/Graph mode

  • Global settings/Database attributes/Text mode

The "Detail" pane is absent.

merge mh inf

Global settings/Processor specific

Merging of global processor options. Usually these options are stored in the idpflags netnode.

The "Detail" pane is absent.

merge mh proc global

Encodings/Registry

Merging of registered string literal encodings. These encodings are used to properly display string literal in the disassembly listing.

The "Detail" pane is absent.

merge mh encodings

Encodings/Settings

Merging of default string encodings: which of the registered string encodings are considered the default ones.

The "Detail" pane is absent.

merge mh encodings2

Scripts/Registry

Merging of embedded script snippets.

When merging embedded script snippets, the script name/language is displayed, and the "Detail" pane contains the script source with the highlighted differences:

merge mh scripts

Scripts/Settings

Merging of the default snippet and tabulation size.

The "Detail" pane is absent.

merge mh scripts2

Custom data/Types and Custom data/Formats

Merging of the registered custom data types and formats.

The "Detail" pane is absent.

merge mh custdata

Types/Enums

Merging of assembler level enums (enum_t). Ghost enums are skipped in this phase; they will be merged when handling local types.

To calculate diffpos, IDA Teams matches enum members by name and maps all enums with common member names into one diffpos.

An example of enum merging:

In both idbs, enum constant "B" is present. However, in the remote idb "B" has a different parent enum, "enum_2". Therefore enum_1 in the local idb corresponds to enum_1 and enum_2 in the remote idb. The user can select either enum_1 from the local idb or enum_1 and enum_2 from the remote idb.

In other words, IDA will display both enum_1 and enum_2 in the Remote pane, indicating that the difference between the Local and Remote databases corresponds to two separate enums, but they are treated as a single difference location. The "Detail" pane will display the full enum definitions, with the differences highlighted:

merge mh enums
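The grouping rule described in this section (all enums with common member names map to one diffpos) can be modeled with a small union-find. The sketch below is an assumption-laden illustration, not how IDA Teams implements it: enum_desc_t and group_enums() are hypothetical names, and enums from the local and remote databases are passed in one combined list.

```cpp
#include <cassert>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical simplified enum: a label plus the names of its members.
struct enum_desc_t
{
  std::string name;
  std::vector<std::string> members;
};

// Return one group id per enum. Enums that share a member name, directly or
// through a chain of other enums, receive the same group id.
std::vector<int> group_enums(const std::vector<enum_desc_t> &enums)
{
  const int n = static_cast<int>(enums.size());
  std::vector<int> parent(n);
  std::iota(parent.begin(), parent.end(), 0);   // every enum starts alone

  // Union-find "find" with path halving.
  auto find = [&parent](int x)
  {
    while ( parent[x] != x )
    {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  };

  std::map<std::string, int> owner;   // member name -> first enum that has it
  for ( int i = 0; i < n; i++ )
  {
    for ( const std::string &m : enums[i].members )
    {
      auto it = owner.find(m);
      if ( it == owner.end() )
        owner[m] = i;
      else
        parent[find(i)] = find(it->second);   // same member name: join groups
    }
  }

  std::vector<int> group(n);
  for ( int i = 0; i < n; i++ )
    group[i] = find(i);
  return group;
}
```

Assuming that the local enum_1 holds members "A" and "B" while the remote idb holds enum_1 with "A" and enum_2 with "B", all three enums end up in one group, which corresponds to the single difference location described above.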

Types/Structs

Merging of assembler level structures (struc_t).

To calculate diffpos, IDA Teams matches structs by the following attributes, in this order:

  1. the structure name

  2. the structure tid and size

If we fail to match a structure, then it will stay unmatched. Such an unmatched structure will have its own diffpos, allowing the user to copy it to the other idb or to delete it altogether.

This merge phase deals with the entire structure types and their attributes. Entire structure types may be added or deleted, and/or conflicts in the structure attributes are resolved.

If members of matched structures (at the same diffpos) differ, the conflict will be resolved later, during the Types/Struct members/…​ merge phase.

In the UI, IDA will display the list of structure names, with the "Detail" pane showing the structure attributes:

merge mh structs
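The matching order described above (first by name, then by tid and size) can be sketched as a two-pass loop. The types and the match_structs() function are hypothetical and only model the rule as stated in the text; the real matching in IDA Teams may take more attributes into account.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified structure description.
struct struct_desc_t
{
  std::string name;
  std::uint64_t tid;
  std::size_t size;
};

// For every local struct, return the index of the matched remote struct,
// or -1 if it stays unmatched (it would then get its own diffpos).
std::vector<int> match_structs(const std::vector<struct_desc_t> &local,
                               const std::vector<struct_desc_t> &remote)
{
  std::vector<int> match(local.size(), -1);
  std::vector<bool> used(remote.size(), false);
  // Pass 0 compares names; pass 1 compares tid and size for the leftovers.
  for ( int pass = 0; pass < 2; pass++ )
  {
    for ( std::size_t i = 0; i < local.size(); i++ )
    {
      if ( match[i] != -1 )
        continue;                       // already matched in an earlier pass
      for ( std::size_t j = 0; j < remote.size(); j++ )
      {
        if ( used[j] )
          continue;
        const bool same = (pass == 0)
                        ? (local[i].name == remote[j].name)
                        : (local[i].tid == remote[j].tid
                           && local[i].size == remote[j].size);
        if ( same )
        {
          match[i] = static_cast<int>(j);
          used[j] = true;
          break;
        }
      }
    }
  }
  return match;
}
```

Running the name pass to completion before the tid/size pass keeps a renamed structure from taking a partner that another structure would have matched by name.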

Types/Type libraries

Merging of the loaded type libraries.

This merge phase uses the standard "Type libraries" widget.

The "Detail" pane is absent.

merge mh tils

Types/Local types

Merging of local types.

To calculate diffpos, IDA Teams matches local types by the following attributes, in this order:

  1. the type name

  2. the ordinal number and base type

If we fail to match a type, then it will stay unmatched. Such an unmatched type will have its own diffpos, allowing the user to copy it to the other idb or to delete it altogether.

This merge phase deals with entire types and their attributes. Entire local types may be added or deleted, and/or conflicts in their attributes are resolved. Differences in type members (e.g., struct members) will be resolved in a separate phase: Types/Local type members.

This merge phase uses the standard "Local types" widget. The "Detail" pane displays the type definition and its attributes.

merge mh tinfo

Types/Struct members/…​ and Types/Local type members/…​

For example:

  • Types/Struct members/struct_t

  • Types/Local type members/struct conflict_t

These merge phases merge the conflicting members of a structure or a local type.

The "Detail" pane displays full information about the current member along with its attributes.

merge mh udtmem

Types/Ghost struct comments

Ghost structs may have comments attached to them.

This merge phase handles these comments:

merge mh ghstrcmt

We need a separate phase for these comments in order not to lose them during merging because by default ghost types are considered secondary to the corresponding non-ghost type. Normally during merge ghost types may be overwritten. However, local types cannot have comments at all. This is why ghost structure comments, if created, are valuable.

Types/Struct members comments/…​

Similarly to comments attached to entire structures, each structure member may have a comment.

The same logic applies to ghost struct member comments:

merge mh strmemcmt

Addressing/Selectors

Merging of selectors.

This merge phase uses the standard widget "Selectors".

The "Detail" pane is absent.

merge mh selectors

Addressing/Storage types

IDA Pro allocates so-called flags for each program address. These flags describe how to display the corresponding bytes in the disassembly listing: as instruction or data.

There are two different storage methods for flags: virtual array (VA) and sparse storage (MM). The virtual array method is the default one; it allocates 32 bits for each program address. However, for huge segments this method is not efficient and may lead to unnecessarily huge databases. Therefore, for huge segments IDA Pro uses sparse storage.
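To get a feeling for why the virtual array method does not scale to huge segments, the arithmetic can be written down directly. The sketch below only uses the figure of 32 bits per address from the text; va_flag_bytes() is a hypothetical helper, and the actual size of a database on disk may differ because of packing.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of flag storage needed by the virtual array (VA) method,
// assuming 32 bits (4 bytes) of flags for every program address.
constexpr std::uint64_t va_flag_bytes(std::uint64_t segment_size)
{
  return segment_size * 4;
}
```

Under this assumption, a segment of 1 GiB needs 4 GiB of flag storage, even if almost all of its addresses are undefined.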

This merge phase handles the defined program ranges and their storage types.

The "Detail" pane is absent.

merge mh stt

Addressing/Segmentation

This merge phase handles the program segmentation.

When merging segments, IDA combines them into non-overlapping groups. Each group will have its own diffpos. For example, the following segmentations:

will result in a single diffpos:

merge mh segments

The "Detail" pane displays segments in the combined group with their attributes.

When merging segments, IDA tries to move the segment boundaries in a way that preserves the segment contents. If it fails to do so, the conflicting segments are deleted and new ones are created.
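The combination of segments into non-overlapping groups can be pictured as classic interval merging over the segments of both databases. The following sketch is an assumption about the idea, not the actual implementation: seg_range_t and group_segments() are hypothetical, and segment ends are treated as exclusive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical segment extent: [start, end), end is exclusive.
struct seg_range_t
{
  std::uint64_t start;
  std::uint64_t end;
};

// Put the segments of both databases into one list and merge every run of
// overlapping segments into a single group. Each group would get one diffpos.
std::vector<seg_range_t> group_segments(std::vector<seg_range_t> local,
                                        const std::vector<seg_range_t> &remote)
{
  local.insert(local.end(), remote.begin(), remote.end());
  std::sort(local.begin(), local.end(),
            [](const seg_range_t &a, const seg_range_t &b)
            {
              return a.start < b.start;
            });

  std::vector<seg_range_t> groups;
  for ( const seg_range_t &s : local )
  {
    if ( !groups.empty() && s.start < groups.back().end )
      groups.back().end = std::max(groups.back().end, s.end);  // overlaps: grow
    else
      groups.push_back(s);                                     // disjoint: new group
  }
  return groups;
}
```

Segments that merely touch (one ends where the next starts) stay in separate groups in this sketch, unless a segment from the other database spans the boundary and ties them together.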

Addressing/Segment groups

Merging of segment groups. Segment groups are used only in OMF files. They correspond to the group keyword in assembler.

The "Detail" pane is absent.

merge mh seggrps

Addressing/Segment register/…​

Some processors have so-called segment registers. IDA Pro knows about them and can remember their value (one value per address range).

For example, the x86 processor has ds, ss, and many other registers. IDA Pro can remember that, say, ds has the value of 1000 at the range 401000..402000.

This merge phase handles segment registers. For each register, a separate merge phase is created. It contains address ranges: inside each address range the value of the segment register stays the same.

To prepare diffpos, IDA Teams combines segment register ranges into non-overlapping ranges. diffpos is a range number.

The "Detail" pane displays the segment register ranges in diffpos, each with its value and a suffix that denotes the range type (u: user-defined, a: automatically inherited from the previous range).

merge mh segregs
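The "one value per address range" model from this section can be illustrated with an ordered map keyed by the range start. This is a sketch under stated assumptions and not the IDA SDK API: ea_t here is a plain integer alias, sreg_range_t and get_sreg_value() are hypothetical, range ends are exclusive, and the numbers from the example in the text are read as hexadecimal.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using ea_t = std::uint64_t;   // hypothetical stand-in for an effective address

// Hypothetical segment register range; the start address is the map key.
struct sreg_range_t
{
  ea_t end;             // exclusive end of the range (assumption)
  std::uint64_t value;  // register value that holds inside the range
  char tag;             // 'u' = user-defined, 'a' = automatically inherited
};

// Non-overlapping ranges of one segment register, keyed by start address.
using sreg_ranges_t = std::map<ea_t, sreg_range_t>;

// Look up the register value at 'ea'. Returns false if no range covers it.
bool get_sreg_value(const sreg_ranges_t &ranges, ea_t ea, std::uint64_t *out)
{
  auto it = ranges.upper_bound(ea);   // first range that starts after ea
  if ( it == ranges.begin() )
    return false;                     // ea lies before every range
  --it;                               // last range that starts at or before ea
  if ( ea >= it->second.end )
    return false;                     // ea falls into a gap after that range
  *out = it->second.value;
  return true;
}
```

For the ds example, a single entry that starts at 0x401000, ends at 0x402000 and holds the value 0x1000 answers lookups for any address inside the range and rejects addresses on either side of it.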

Addressing/Orphan bytes

The database may have bytes that do not belong to any segment.

To prepare diffpos, IDA Teams groups orphan bytes in the databases into nonintersecting ranges. diffpos is a range number.

The "Detail" pane is absent.

merge mh orphans

Addressing/Patched

Merging of the patched bytes.

The "Detail" pane is absent.

merge mh patches

Addressing/Byte values

Byte values in segments may differ even for non-patched addresses, for example if a snapshot of the process memory was taken during a debugger session.

IDA Teams combines the sequential bytes in one diffpos.

This merge phase uses the standard "IDA-View" widget.

The "Detail" pane displays the conflicting byte values.

merge mh byteval
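How sequential differing bytes end up in one diffpos can be sketched by comparing two byte buffers and collecting runs of consecutive mismatches. The names byte_run_t and diff_byte_runs() are hypothetical, and the sketch assumes that both buffers describe the same address range starting at offset 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical run of consecutive differing bytes.
struct byte_run_t
{
  std::size_t offset;   // offset of the first differing byte
  std::size_t length;   // number of consecutive differing bytes
};

// Compare two buffers byte by byte and combine sequential differences
// into runs; each run would correspond to one diffpos.
std::vector<byte_run_t> diff_byte_runs(const std::vector<std::uint8_t> &local,
                                       const std::vector<std::uint8_t> &remote)
{
  std::vector<byte_run_t> runs;
  const std::size_t n = std::min(local.size(), remote.size());
  for ( std::size_t i = 0; i < n; i++ )
  {
    if ( local[i] == remote[i] )
      continue;
    if ( !runs.empty() && runs.back().offset + runs.back().length == i )
      runs.back().length++;               // continues the previous run
    else
      runs.push_back(byte_run_t{i, 1});   // starts a new run
  }
  return runs;
}
```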

Addressing/Fixups

Merging of fixup records.

The "Detail" pane is absent.

merge mh fixups

Addressing/Manual memory mapping

Merging of memory mappings.

The "Detail" pane is absent.

merge mh mapping

Symbols/Exports

Merging of exported symbols.

This merge phase uses the standard "Exports" widget.

The "Detail" pane is absent.

merge mh exports

Symbols/Imports

Merging of imported symbols.

This merge phase uses the standard "Imports" widget.

The "Detail" pane is absent.

merge mh imports

Disassembly/Items

When merging, IDA Teams compares disassembly items (instructions and data). IDA Teams compares disassembly items by length, flags, opinfo, name, comment, and netnode information (NALT_* and NSUP_* flags).

This merge step uses the standard "IDA-View" widget so that items can be viewed in their context. For example:

merge mh flags

Comments/Anterior lines and Comments/Posterior lines

Merging of extra comments.

This merge phase uses the standard "IDA-View" widget.

The "Detail" pane displays comment content.

merge mh extracmt

Disassembly/EA additional flags

Merging of additional flags aflags_t.

Each disassembly item may have additional flags that further describe it.

This merge phase uses the standard "IDA-View" widget.

The "Detail" pane displays additional flags.

merge mh aflags ea

Disassembly/Hidden ranges

To prepare diffpos, IDA Teams groups hidden ranges into nonintersecting ranges. diffpos is a range number.

The "Detail" pane displays the hidden range description.

merge mh hiddenranges

Disassembly/Source file ranges

To prepare diffpos, IDA Teams groups source file ranges into nonintersecting ranges. diffpos is a range number.

The "Detail" pane displays source file definition.

merge mh sourcefiles

Functions/Registry

Function definitions (func_t) are merged using the standard "Functions" widget, while the "Detail" pane displays function attributes:

merge mh func

Functions/IM flags

Merging of instruction kinds.

To simplify decompilation, IDA has the notion of the instruction kind:

  • PROLOG instruction

  • EPILOG instruction

  • SWITCH instruction

This merge phase uses the standard "IDA-View" widget.

The "Detail" pane displays instruction kind.

merge mh ignoremicro

Functions/Frames (global)

This merge phase deals with entire function frames. Function frames may be added or deleted.

If members of the matched function frame differ, the conflict will be resolved later during the Functions/Frame/…​ merge phase. Each differing frame will be assigned its own merge step.

The "Detail" pane is absent.

merge mh framemgr

Functions/Frame

Merging of function frame details.

A separate phase is created for each function. For example:

  • Functions/Frames/sub_401200 at 401200

  • Functions/Frames/_main at 4014E0

Each of these phases merges the conflicting members of the function frame.

The "Detail" pane displays the detailed information about the current function frame member.

merge mh frame

Functions/SP change points

Merging of function SP change points.

This merge phase uses the standard "IDA-View" widget.

The "Detail" pane displays the SP change point details.

merge mh stkpnts

Cross-references/Flow

Merging of regular execution flow from the previous instruction. IDA stores cross-references that correspond to regular execution flow in a special format, different from other cross-reference types.

This merge phase uses the standard "IDA-View" widget.

The "Detail" pane is absent.

merge mh flows

Cross-references/Code

Merging of code cross-references.

This merge phase uses the standard "IDA-View" widget.

The "Detail" pane displays code references to address (diffpos).

merge mh crefs

Cross-references/Data

Merging of data cross-references.

This merge phase uses the standard "IDA-View" widget.

The "Detail" pane displays data references to address (diffpos).

merge mh drefs

Marked positions/…​

The following merge phases exist:

  • Marked positions/structplace_t

  • Marked positions/enumplace_t

  • Marked positions/idaplace_t

They deal with merging of bookmarks for:

  • structures

  • enums

  • addresses

The "Detail" pane is absent.

merge mh bookmarks

Debug/Breakpoints/…​

The following merge phases exist:

  • Breakpoints/Absolute bpts

  • Breakpoints/Relative bpts

  • Breakpoints/Symbolic bpts

  • Breakpoints/Source level bpts

They deal with merging of various debugger breakpoints.

The "Detail" pane is absent.

merge mh bpts

Debug/Watchpoints

Merging of watchpoints.

The "Detail" pane is absent.

merge mh watchpoints

Dirtree/$ dirtree/…​

The following merge phases exist:

  • Dirtree/$ dirtree/tinfos

  • Dirtree/$ dirtree/structs

  • Dirtree/$ dirtree/enums

  • Dirtree/$ dirtree/funcs

  • Dirtree/$ dirtree/names

  • Dirtree/$ dirtree/imports

  • Dirtree/$ dirtree/bookmarks_idaplace_t

  • Dirtree/$ dirtree/bookmarks_structplace_t

  • Dirtree/$ dirtree/bookmarks_enumplace_t

  • Dirtree/$ dirtree/bpts

They deal with merging of the standard dirtrees.

The "Detail" pane is absent.

merge mh dirtree tinfos
merge mh dirtree structs

Misc/Try blocks

Merging of try and catch block info.

The "Detail" pane describes the try block.

merge mh tryblks

Misc/Virtual function tables

Merging of virtual function tables.

The "Detail" pane is absent.

merge mh vftables

Misc/Notepad

Merging of database notepads. Each line of text is a diffpos.

The "Detail" pane is absent.

merge mh notepad

Processor specific/…​

Each processor module creates its own merge steps to handle its processor-specific data.

For example, the PC processor module adds the following merge steps:

  • Processor specific/Analyze ea for a possible offset

  • Processor specific/Frame pointer info

  • Processor specific/Pushinfo

  • Processor specific/VXD info 2

  • Processor specific/Callee EA|AH value

  • …​

merge mh proc pushinfo
merge mh proc vxd2

Plugins/Decompiler/…​

Merging of the decompiler data starts with the global configuration parameters from hexrays.cfg:

merge mh vd cfg

To handle decompilation of specific functions, IDA stores the decompilation data in a database netnode named Hexrays node.

The merge step Plugins/Decompiler/Hexrays nodes adds or deletes netnodes, indicating which functions have or haven’t been decompiled in each database:

merge mh vd nodes

The decompilation data for matching functions is compared using the following attributes:

  • Plugins/Decompiler/…​/Numforms

  • Plugins/Decompiler/…​/mflags

  • Plugins/Decompiler/…​/User-defined funcargs

  • Plugins/Decompiler/…​/User-defined variable mapping

  • Plugins/Decompiler/…​/User-defined lvar info

  • Plugins/Decompiler/…​/lvar settings

  • Plugins/Decompiler/…​/IFLAGS

  • Plugins/Decompiler/…​/User labels

  • Plugins/Decompiler/…​/User unions

  • Plugins/Decompiler/…​/User comments

  • Plugins/Decompiler/…​/User-defined call

If there is a difference, each comparison criterion will be assigned its own merge step. Each step will use the standard "Pseudocode" widget so that differences can be viewed in context with the full pseudocode:

merge mh numforms
merge mh mflags
merge mh funcargs
merge mh vmap
merge mh lvinfo
merge mh lvar settings
merge mh iflags
merge mh labels
merge mh unions
merge mh cmts
merge mh ucall

Loader data merge phases

The file loader that was used to create the database may have stored some data in the database that is specific to the loader itself.

There are merge phases for each loader, for example:

  • Loader/PE file/…​

  • Loader/NE file/…​

  • Loader/ELF file/…​

  • Loader/TLS/…​

  • Loader/ARM segment flags/…​

merge mh ldr pe
merge mh ldr arm

Debugger data merge phases

To handle the differences in debugger data the following merge steps may be created:

  • Debugger/pin

  • Debugger/gdb

  • Debugger/xnu

  • Debugger/ios

  • Debugger/bochs

  • Debugger/windbg

  • Debugger/rmac_arm

  • Debugger/lmac_arm

  • Debugger/rmac

  • Debugger/lmac

As can be deduced from their names, they handle debugger-specific data in the database.

merge mh dbg pin
merge mh dbg windbg

Other plugins merge phases

There are a number of IDA plugins that need to merge their data.

For example:

  • Plugins/PDB

  • Plugins/golang

  • Plugins/EH_PARSE

  • Plugins/Callgraph

  • Plugins/swift

Any third party plugin may add merge phases using the IDA SDK. We provide sample plugins that illustrate how to add support for merging into third party plugins.

merge mh pdb
merge mh swift

Using the IDA SDK to add merge functionality to a plugin

Overview

Any plugin that stores its data in the database must implement the logic for merging its data. For that, the plugin must provide the description of its data and ask the kernel to create merge handlers based on these descriptions.

The kernel will use the created handlers to perform merging and to display merged data to the users. The plugin can implement callback functions to modify some aspects of merging, if necessary.

The plugin may have two kinds of data with permanent storage:

  1. Data that applies to the entire database (e.g. the options). To describe this data, the idbattr_info_t type is used.

  2. Data that is tied to a particular address. To describe this data, the merge_node_info_t type is used.

The kernel will notify the plugin using the processor_t::ev_create_merge_handlers event. On receiving it, the plugin should create the merge handlers, usually by calling the create_merge_handlers() function.

Plugin

The IDA SDK provides several sample plugins to demonstrate how to add merge functionality to third party plugins:

  • mex1/

  • mex2/

  • mex3/

  • mex4/

The sample plugin without the merge functionality consists of two files:

  • mex.hpp

  • mex_impl.cpp

It is a regular implementation of a plugin that stores some data in the database. Please check the source files for more info.

We demonstrate several approaches to add the merge functionality. They are implemented in different directories mex1/, mex2/, and so on.

The MEX_N macros that are defined in the makefile are used to parameterize the plugin implementation, so that all plugin examples may be used simultaneously.

You can check the merge results for all of these plugins in a single IDA Teams session. Naturally, you should prepare the databases by running the plugins before launching the IDA Teams session.

Merge functionality

The merge functionality is implemented in the merge.cpp file. It contains create_merge_handlers(), which is responsible for the creation of the merge handlers.

Variants:

mex1/ Merge values are stored in netnodes. The kernel will read the values directly from netnodes, merge them, and write back. No further actions are required from the plugin. If the data is stored in a simple way using altvals or supvals, this simple approach is recommended.

mex2/ Merge values are stored in variables (in memory). For more complex data that is not stored in a simple way in netnodes (for example, data that uses database blobs), the previous approach cannot be used. This example shows how to merge the data that is stored in variables, like fields of the plugin context structure. The plugin provides the field descriptions to the kernel, which will use them to merge the data in memory. After merging, the plugin must save the merged data to the database.

mex3/ Builds on the mex1 example and illustrates how to improve the look of the UI.

mex4/ Merge data that is stored in a netnode blob. Usually blob data is displayed as a sequence of hexadecimal digits in a merge chooser column. We show how to display the blob contents in the "Detail" pane.
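The idea behind the mex2/ variant, where the plugin describes its in-memory fields and a generic routine merges them, can be modeled without the SDK. Everything in the sketch below is hypothetical (plugin_ctx_t, field_desc_t, policy_t, diff_fields(), merge_fields()) and does not reproduce idbattr_info_t or any other real SDK type; please check the sample plugins for the actual API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical plugin context with two persistent options.
struct plugin_ctx_t
{
  int flags = 0;
  int tab_size = 0;
};

// Hypothetical field description: a label plus a pointer to the member.
struct field_desc_t
{
  const char *label;
  int plugin_ctx_t::*member;
};

// Hypothetical merge policy for resolving a difference.
enum class policy_t { use_local, use_remote };

// Return the labels of all described fields that differ.
std::vector<std::string> diff_fields(const std::vector<field_desc_t> &fields,
                                     const plugin_ctx_t &local,
                                     const plugin_ctx_t &remote)
{
  std::vector<std::string> out;
  for ( const field_desc_t &f : fields )
    if ( local.*(f.member) != remote.*(f.member) )
      out.push_back(f.label);
  return out;
}

// Resolve every differing field in memory according to the policy.
// A real plugin would still have to save the merged data to the database.
void merge_fields(const std::vector<field_desc_t> &fields,
                  plugin_ctx_t &local,
                  plugin_ctx_t &remote,
                  policy_t policy)
{
  for ( const field_desc_t &f : fields )
  {
    if ( local.*(f.member) == remote.*(f.member) )
      continue;
    if ( policy == policy_t::use_local )
      remote.*(f.member) = local.*(f.member);
    else
      local.*(f.member) = remote.*(f.member);
  }
}
```

The point of the descriptor table is that the generic code never needs to know the layout of the plugin context: adding a new persistent option only requires one more table entry.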
