Type Annotations

IDA extends standard C/C++ type declarations with specialized annotations that provide control over data interpretation and display in disassembly and decompiled code.

For a complete list of all type system keywords, see Type System Keywords.

Shifted Pointers

Sometimes in binary code we can encounter a pointer to the middle of a structure. Such pointers usually do not exist in the source code but an optimizing compiler may introduce them to make the code shorter or faster.

Such pointers can be described using shifted pointers. A shifted pointer is a regular pointer with additional information about the name of the parent structure and the offset from its beginning. For example:

        struct mystruct
        {
          char buf[16];
          int dummy;
          int value;            // <- myptr points here
          double fval;
        };
        int *__shifted(mystruct,20) myptr;

The above declaration means that myptr is a pointer to 'int' and if we decrement it by 20 bytes, we will end up at the beginning of 'mystruct'.

Please note that IDA does not limit parents of shifted pointers to structures. A shifted pointer after the adjustment may point to any type except 'void'.

Negative offsets are also supported. They mean that the pointer points to memory before the structure.

When a shifted pointer is used with an adjustment, it will be displayed with the 'ADJ' helper function. For example, if we refer to the memory 4 bytes further, it can be represented like this:

        ADJ(myptr)->fval

Shifted pointers are an improvement compared to the CONTAINING_RECORD macro because expressions with them are shorter and easier to read.
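The arithmetic behind a shifted pointer can be reproduced in plain C. The sketch below is only an illustration: it assumes a 4-byte 'int', and the helper names adj and read_fval are hypothetical stand-ins for what IDA displays as ADJ(myptr), not part of IDA.

```c
#include <assert.h>
#include <stddef.h>

struct mystruct
{
  char buf[16];
  int dummy;
  int value;            /* a shifted pointer would point here */
  double fval;
};

/* Hypothetical stand-in for ADJ(): step back from the 'value' member
   to the beginning of the enclosing mystruct. */
static struct mystruct *adj(int *myptr)
{
  assert(myptr != NULL);
  return (struct mystruct *)((char *)myptr - offsetof(struct mystruct, value));
}

/* Read the member located 4 bytes past the shifted pointer. */
static double read_fval(int *myptr)
{
  return adj(myptr)->fval;
}
```

With a 4-byte 'int', offsetof(struct mystruct, value) is 20, which matches the offset used in the __shifted declaration above, and fval sits at offset 24, that is, 4 bytes past the pointer.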

Scattered Argument Locations

Modern compilers may pass structure arguments across multiple registers or mixed register/stack locations. Scattered argument locations describe these complex calling conventions.

If we have this function prototype:

the 64-bit GNU compiler will pass the structure like this:

Since compilers can use such complex calling conventions, IDA needs some mechanism to describe them. Scattered argument locations are used for that. The above calling convention can be described like this:

It reads:

  • 1 byte at offset 0 of the argument is passed in byte 0 of RDI

  • 2 bytes at offset 2 of the argument are passed in bytes 1–2 of RDI

  • 1 byte at offset 4 of the argument is passed in byte 3 of RDI

  • 4 bytes at offset 8 of the argument are passed starting from byte 0 of RSI

In other words, the following syntax is used:

where:

  • argoff - offset within the argument

  • register - register name used to pass part of the argument

  • regoff - offset within the register

  • size - number of bytes

The regoff and size fields can be omitted if there is no ambiguity.
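To make the argoff, register, regoff and size fields concrete, the following sketch models each piece as a small record and copies the argument bytes into a byte image of the registers. It is an illustration only: the struct piece type, the scatter function and the register numbering (0 for RDI, 1 for RSI) are assumptions made for this example, and the table simply restates the four pieces listed earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One piece of a scattered argument: 'size' bytes at 'argoff' inside the
   argument are placed at byte 'regoff' of register number 'reg'. */
struct piece
{
  size_t argoff;
  int reg;              /* hypothetical numbering: 0 = RDI, 1 = RSI */
  size_t regoff;
  size_t size;
};

/* The four pieces from the list in the text. */
static const struct piece example_pieces[4] = {
  { 0, 0, 0, 1 },
  { 2, 0, 1, 2 },
  { 4, 0, 3, 1 },
  { 8, 1, 0, 4 },
};

/* Copy each piece of 'arg' into a byte image of the registers.
   regs[r] holds the 8 bytes of register r, least significant byte first. */
static void scatter(const unsigned char *arg, const struct piece *p, size_t n,
                    unsigned char regs[][8])
{
  for (size_t i = 0; i < n; i++)
  {
    assert(p[i].regoff + p[i].size <= 8);   /* piece must fit in the register */
    memcpy(&regs[p[i].reg][p[i].regoff], arg + p[i].argoff, p[i].size);
  }
}
```

Calling scatter on the raw bytes of an argument with example_pieces and a zeroed two-register array fills bytes 0 to 3 of the first register and bytes 0 to 3 of the second. The assertion mirrors the kind of soundness check that needs the register size to be known.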

If the register is not specified, the expression describes a stack location:

where:

  • argoff - offset within the argument

  • stkoff - offset in the stack frame (the first stack argument is at offset 0)

  • size - number of bytes

Please note that while IDA checks argument location specifiers for soundness, it cannot perform every check, so some invalid locations may be accepted. In particular, IDA generally does not know the register sizes and accepts any offset within a register and any size.

See also the Set type... (action SetType) command.

Data Representation Annotations

Data representation: enum member

Syntax:

Instead of a plain number, a symbolic constant from the specified enum will be used. The enum can be a regular enum or a bitmask enum. For bitmask enums, a bitwise combination of symbolic constants will be printed. If the value to print cannot be represented using the specified enum, it will be displayed in red.

Example:

If var is equal to 1, it will be represented as "B".

Another example:

If var is equal to 3, it will be represented as "INITED|STARTED".
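The way a bitmask value decomposes into symbolic constants can be sketched in C. This is not IDA's implementation: the enum values (INITED = 1, STARTED = 2, DONE = 4) and the names flag_name, state_flags and format_bitmask are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct flag_name
{
  unsigned value;
  const char *name;
};

/* Hypothetical bitmask enum; the values are assumed for illustration. */
static const struct flag_name state_flags[] = {
  { 1u, "INITED" },
  { 2u, "STARTED" },
  { 4u, "DONE" },
};

/* Write the names of all set bits, joined by '|', into out.
   Returns 0 on success, -1 if some bit has no name or out is too small. */
static int format_bitmask(unsigned v, char *out, size_t outsz)
{
  size_t n = sizeof state_flags / sizeof state_flags[0];
  assert(out != NULL && outsz > 0);
  out[0] = '\0';
  for (size_t i = 0; i < n; i++)
  {
    if ((v & state_flags[i].value) == 0)
      continue;
    /* room for the name, an optional separator and the terminating NUL */
    size_t need = strlen(out) + strlen(state_flags[i].name) + 2;
    if (need > outsz)
      return -1;
    if (out[0] != '\0')
      strcat(out, "|");
    strcat(out, state_flags[i].name);
    v &= ~state_flags[i].value;           /* this bit is now accounted for */
  }
  return v == 0 ? 0 : -1;
}
```

A return value of -1 for leftover bits corresponds to the case where the value cannot be represented using the specified enum.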

This annotation is useful if the enum size is not equal to the variable size. Otherwise using the enum type for the declaration is better:

Data representation: offset expression

Syntax:

where type is one of:

  Type     Description
  OFF8     8-bit full offset
  OFF16    16-bit full offset
  OFF32    32-bit full offset
  OFF64    64-bit full offset
  LOW8     low 8 bits of 16-bit offset
  LOW16    low 16 bits of 32-bit offset
  HIGH8    high 8 bits of 16-bit offset
  HIGH16   high 16 bits of 32-bit offset

The type can also be the name of a custom refinfo.

It can be combined with the following keywords:

  Keyword    Description
  RVAOFF     based reference (rva)
  PASTEND    reference past an item; it may point to a nonexistent address
  NOBASE     forbid the base xref creation; implies that the base can be any
             value. Note: base xrefs are created only if the offset base
             points to the middle of a segment
  SUBTRACT   the reference value is subtracted from the base value instead of
             (as usual) being added to it
  SIGNEDOP   the operand value is sign-extended (only supported for
             REF_OFF8/16/32/64)
  NO_ZEROS   an opval of 0 will be considered invalid
  NO_ONES    an opval of ~0 will be considered invalid
  SELFREF    the self-based reference
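The effect of several of these types and keywords can be illustrated with a few lines of C. This is a simplified model and not IDA's code: it handles a 32-bit operand only, ignores the target delta, and the function names offset_target, opval_is_valid, low16 and high16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of resolving an offset operand; the target delta is ignored. */
static uint64_t offset_target(uint64_t base, uint32_t opval,
                              int signedop, int subtract)
{
  /* SIGNEDOP: sign-extend the 32-bit operand; otherwise zero-extend it. */
  uint64_t v = signedop ? (uint64_t)(int64_t)(int32_t)opval : (uint64_t)opval;
  /* SUBTRACT: base - value instead of the usual base + value. */
  return subtract ? base - v : base + v;
}

/* NO_ZEROS / NO_ONES: reject all-zero or all-one operand values. */
static int opval_is_valid(uint32_t opval, int no_zeros, int no_ones)
{
  if (no_zeros && opval == 0)
    return 0;
  if (no_ones && opval == UINT32_MAX)
    return 0;
  return 1;
}

/* LOW16 / HIGH16: the two halves of a 32-bit offset. */
static uint16_t low16(uint32_t off)  { return (uint16_t)(off & 0xFFFFu); }
static uint16_t high16(uint32_t off) { return (uint16_t)(off >> 16); }
```

For instance, with a base of 0x400000 an operand of 0x1000 resolves to 0x401000 in the usual case and to 0x3FF000 when the value is subtracted from the base.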

The base, target delta, and the target can be omitted. If the base is BADADDR, it can be omitted by combining the type with AUTO:

For zero-based offsets without any additional attributes whose size corresponds to the current application target (e.g. REF_OFF32 for a 32-bit application), the short __off form can be used.

Examples:

  • A 64-bit offset based on the image base:

  • A 32-bit offset based on 0 that may point to a nonexistent address:

  • A 32-bit offset based on 0x400000:

  • A simple zero-based offset that matches the current application bitness:

This annotation is useful when the type of the pointed-to object is unknown, or when the variable size differs from the usual pointer size. Otherwise, it is better to use a pointer:

Data representation: string

Syntax:

where strtype is one of:

  Type        Description
  C           Zero-terminated string, 8 bits per symbol
  C_16        Zero-terminated string, 16 bits per symbol
  C_32        Zero-terminated string, 32 bits per symbol
  PASCAL      Pascal string: 1-byte length prefix, 8 bits per symbol
  PASCAL_16   Pascal string: 1-byte length prefix, 16 bits per symbol
  LEN2        Wide Pascal string: 2-byte length prefix, 8 bits per symbol
  LEN2_16     Wide Pascal string: 2-byte length prefix, 16 bits per symbol
  LEN4        Delphi string: 4-byte length prefix, 8 bits per symbol
  LEN4_16     Delphi string: 4-byte length prefix, 16 bits per symbol

It may be followed by two optional string termination characters (only for C). Finally, the string encoding may be specified, either as the encoding name or as "no_conversion" if the string encoding was not explicitly specified.
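The length-prefixed layouts differ only in the width of the prefix, which a short C sketch can make explicit. It covers the 8-bits-per-symbol variants only, and it assumes that the length prefix is stored little-endian; the function name read_prefixed is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the payload of a length-prefixed string with 8-bit symbols into out
   and NUL-terminate it. prefix_len is 1 (PASCAL), 2 (LEN2) or 4 (LEN4).
   Assumption: the length prefix is stored little-endian.
   Returns the payload length, or -1 if the input is truncated or out is
   too small. */
static long read_prefixed(const unsigned char *data, size_t datasz,
                          size_t prefix_len, char *out, size_t outsz)
{
  assert(prefix_len == 1 || prefix_len == 2 || prefix_len == 4);
  if (datasz < prefix_len)
    return -1;
  size_t len = 0;
  for (size_t i = 0; i < prefix_len; i++)
    len |= (size_t)data[i] << (8 * i);
  if (len > datasz - prefix_len || len >= outsz)
    return -1;
  memcpy(out, data + prefix_len, len);
  out[len] = '\0';
  return (long)len;
}
```

The same bytes { 2, 0, 'h', 'i' } therefore read as the two-character string "hi" with a 2-byte prefix, while a 1-byte prefix would treat the second byte as the first symbol.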

Examples:

  • A zero-terminated string in windows-1252 encoding:

  • A zero-terminated string in utf-8 encoding:

Data representation: structure offset

Syntax:

Instead of a plain number, the name of a struct or union member will be used. If delta is present, it will be subtracted from the value before converting it into a struct/union member name.

Example: An integer variable named var that holds an offset from the beginning of the mystruct structure:

If mystruct is defined like this:

The value 2 will be represented as mystruct.c

Another example: A structure offset with a delta:

The value 2 will be represented as mystruct.d-1

Data representation: custom data type and format

Syntax:

where dtid is the name of a custom data type and fid is the name of a custom data format. The custom type and format must be registered by a plugin beforehand, when the database is opened. Otherwise, the custom data type and format ids will be displayed instead of their names.

Data representation: tabular form

Syntax:

This keyword is used to format arrays. The following flags are accepted:

  Flag     Description
  NODUPS   do not use the dup keyword
  HEX      use hexadecimal numbers to show array indexes
  OCT      use octal numbers to show array indexes
  BIN      use binary numbers to show array indexes
  DEC      use decimal numbers to show array indexes

It is possible to combine NODUPS with the index radix: NODUPS|HEX

The `lineitems` and `alignment` attributes have the meaning described for the Array... (action MakeArray) command.

Example:

Display the array in tabular form, 4 decimal numbers on a line, each number taking 8 positions. Display indexes as comments in hexadecimal:

A possible array may look like:

Without this annotation, the `dup` keyword is permitted, and the number of items on a line and the alignment are not defined.
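The roles of the items-per-line and alignment settings can be approximated with a small C formatter. It only mimics the layout idea (a fixed number of items per row, fixed-width columns, a hexadecimal index comment) and does not reproduce the exact text IDA emits; the function name format_table and the " ; " comment style are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format n values into out, lineitems per row, each right-aligned in
   'width' columns. The index of the first element of each row is appended
   as a hexadecimal comment. Returns 0 on success, -1 if out is too small. */
static int format_table(const int *vals, size_t n, size_t lineitems, int width,
                        char *out, size_t outsz)
{
  size_t pos = 0;
  assert(lineitems > 0 && outsz > 0);
  out[0] = '\0';
  for (size_t i = 0; i < n; i++)
  {
    int w = snprintf(out + pos, outsz - pos, "%*d", width, vals[i]);
    if (w < 0 || (size_t)w >= outsz - pos)
      return -1;
    pos += (size_t)w;
    if ((i + 1) % lineitems == 0 || i + 1 == n)
    {
      size_t first = i - i % lineitems;   /* index of the row's first item */
      w = snprintf(out + pos, outsz - pos, " ; %zX\n", first);
      if (w < 0 || (size_t)w >= outsz - pos)
        return -1;
      pos += (size_t)w;
    }
  }
  return 0;
}
```

Calling it with 4 items per line and a width of 8 corresponds to the settings described in the example above.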
