Functions

This submenu allows you to manipulate functions in the disassembly:

Create Function

Action    name: MakeFunction

This command defines a new function in the disassembly text.

You can specify function boundaries using the anchor. If you don't specify any, IDA will try to find the boundaries automatically:

   - function start point is equal to the current cursor position;
   - function end point is calculated by IDA.

A function cannot contain references to undefined instructions. If a function has already been defined at the specified addresses, IDA will jump to its start address, showing you a warning message.

A function must start with an instruction.

Edit Function

 Action    name: EditFunction

Here you can change function bounds, its name and flags. In order to change only the function end address, you can use FunctionEnd command.

If the current address does not belong to any function, IDA beeps.

This command allows you to change the function frame parameters too. You can change sizes of some parts of frame structure.

IDA considers the stack as the following structure:

      +------------------------------+
      | function arguments           |
      +------------------------------+
      | return address               |
      +------------------------------+
      | saved registers (SI,DI,etc)  |
      +------------------------------+  <- BP
      | local variables              |
      +------------------------------+  <- SP

For some processors or functions, BP may be equal to SP. In other words, it can point to the bottom of the stack frame.

You may specify the number of bytes in each part of the stack frame. The size of the return address is calculated by IDA (possibly depending on the far function flag).

"Purged bytes" specifies the number of bytes added to SP upon function return. This value will be used to calculate the SP changes at call sites (used in some calling conventions, such as __stdcall in Windows 32-bit programs.)

"BP based frame" allows IDA to automatically convert [BP+xxx] operands to stack variables.

"BP equal to SP" means that the frame pointer points to the bottom of the stack. It is usually used for the processors which set up the stack frame with EBP and ESP both pointing to the bottom of the frame (for example MC6816, M32R).

If you press <Enter> even without changing any parameter,IDA will reanalyze the function.

Sometimes, EBP points to the middle of the stack frame. FPD (frame pointer delta) is used to handle such situations. FPD is the value substracted from the EBP before accessing variables. An example:

           push    ebp
           lea     ebp, [esp-78h]
           sub     esp, 588h
           push    ebx
           push    esi
           lea     eax, [ebp+74h]

      +------------------------------+
      | function arguments           |
      +------------------------------+
      | return address               |
      +------------------------------+
      | saved registers (SI,DI,etc)  |
      +------------------------------+  <- typical BP
      |                              |
      |                              |
      |                              |  <- actual BP
      | local variables              |
      |                              |
      |                              |
      |                              |
      +------------------------------+  <- SP

In our example, the saved registers area is empty (since EBP has been initialized before saving EBX and ESI). The difference between the 'typical BP' and 'actual BP' is 0x78 and this is the value of FPD.

After specifying FPD=0x78 the last instruction of the example becomes

           lea     eax, [ebp+78h+var_4]

where var_4 = -4

Most of the time, IDA calculates the FPD value automatically. If it fails, the user can specify the value manually.

If the value of the stack pointer is modified in an unpredictable way, (e.g. "and esp, -16"), then IDA marks the function as "fuzzy-sp".

If this command is invoked for an imported function, then a simplified dialog box will appear on the screen.

Function flags

The following flags can be set in function properties:

Does not return

The function does not return to caller (for example, calls a process exit function or has an infinite loop). If no-return analysis is enabled in Kernel Options, IDA will not analyze bytes following the calls to this function.

Far function

On processors which distinguish near and far functions (e.g. PC x86), mark the function as 'far'. This may affect the size of the special stack frame field reserved for the return address, as well as analysis of calls to this function.

Library func

Mark the function as part of compiler runtime library code. This flag is usually set when applying FLIRT signatures

Static func

Mark the function as static. Currently this flag is not used by IDA and is simply informational.

BP based frame

Inform IDA that the function uses a frame pointer (BP/EBP/RBP on PC) to access local variables. The operands of the form [BP+xxx] will be automatically converted to stack variables.

BP equal to SP

Frame pointer points to the bottom of the stack instead of at the beginning of the local variables area as is typical.

Fuzzy SP

Function changes SP by an unknown value, for example: and esp, 0FFFFFFF0h

Outlined code

The function is not a real function but a fragment of multiple functions' common instruction sequence extracted by the compiler as a code size optimization (sometimes called "code factoring"). During decompilation, body of the function will be expanded at the call site.

Append Function Tail

 Action    name: AppendFunctionTail

This command appends an arbitrary range of the program to a function definition. A range must be selected before applying this command. This range must not intersect with other function chunks (however, an existing tail can be added to multiple functions).

IDA will ask to select the parent function for the selection and will append the range to the function definition.

Remove Function Tail

 Action    name: RemoveFunctionTail

This command removes the function tail at the cursor from a function definition.

If there are several parent functions for the current function tail range, IDA will ask to select the parent function(s) to remove the tail from.

After the confirmation, the current function tail range will be removed from the selected function definition.

If the parent was the only owner of the current tail, then the tail will be destroyed. Otherwise it will still be present in the database. If the removed parent was the owner of the tail, then another function will be selected as the owner.

Delete Function

 Action    name: DelFunction

Deleting a function deletes only information about a function, such as information about stack variables, comments, function type, etc.

The instructions composing the function will remain intact.

Set Function End

 Action    name: FunctionEnd

This command changes the current or previous function bounds so that its end will be set at the cursor. If it is not possible, IDA beeps.

Edit the argument location

Allow to edit argument or return value location.

Stack Variables Window

 Action    name: OpenStackVariables

This command opens the stack variables window for the current function.

The stack variables are internally represented as a structure. This structure consists of two parts: local variables and function arguments.

You can modify stack variable definitions here: add/delete/define stack variables, enter comments for them.

There may be two special fields in this window: " r" and " s". They represent the size of the function return address and of the saved registers in bytes. You cannot modify them directly. To change them, use edit function command.

Offsets at the line prefixes represent offsets from the frame pointer register (BP). The window indicator at the lower left corner of the window displays offsets from the stack pointer.

In order to create or delete a stack variable, use data definitions commands (data, strlit, array, undefine, Rename). Also you may define regular or repeatable comments.

The defined stack variables may be used in the program by converting operands to stack variables.

Esc closes this window.

Change Stack Pointer

Action    name: ChangeStackPointer

This command allows you to specify how the stack pointer (SP) is modified by the current instruction.

You cannot use this command if the current instruction does not belong to any function.

You will need to use this command only if IDA was not able to trace the value of the SP register. Usually IDA can handle it but in some special cases it fails. An example of such a situation is an indirect call of a function that purges its parameters from the stack. In this case, IDA has no information about the function and cannot properly trace the value of SP.

Please note that you need to specify the difference between the old and new values of SP.

The value of SP is used if the current function accesses local variables by [ESP+xxx] notation.

Rename register

 Action    name: RenameRegister

This command allows you to rename a processor general register to some meaningful name. While this is not used very often on IBM PCs, it is especially useful on RISC processors with lots of registers.

For example, a general register R9 is not very meaningful and a name like 'CurrentTime' is much better.

This command can be used to define a new register name as well as to remove it. Just move the cursor on the register name and press enter. If you enter the new register name as an empty string, then the definition will be deleted.

If you have selected a range before using this command, then the definition will be restricted to the selected range. But in any case, the definition cannot cross the function boundaries.

You cannot use this command if the current instruction does not belong to any function.

Set function/item type

 Action    name: SetType

This command allows you to specify the type of the current item.

If the cursor is located on a name, the type of the named item will be edited. Otherwise, the current function type (if there is a function) or the current item type (if it has a name) will be edited.

The function type must be entered as a C declaration. Hidden arguments (like 'this' pointer in C++) should be specified explicitly. IDA will use the type information to comment the disassembly with the information about function arguments. It can also be used by the Hex-Rays decompiler plugin for better decompilation.

Here is an example of a function declaration:

        int main(int argc, const char *argv[]);

To delete a type declaration, please enter an empty string.

IDA supports the user-defined calling convention. In this calling convention, the user can explicitly specify the locations of arguments and the return value. For example:

        int __usercall func@<ebx>(int x, int y@<esi>);

denotes a function with 2 arguments: the first argument is passed on the stack (IDA automatically calculates its offset) and the second argument is passed in the ESI register and the return value is stored in the EBX register. Stack locations can be specified explicitly:

        int __usercall runtime_memhash@<^12.4>(void *p@<^0.4>, int q@<^4.4>, int r@<^8.4>)

There is a restriction for a __usercall function type: all stack locations should be specified explicitly or all are automatically calculated by IDA. General rules for the user defined prototypes are:

  - the return value must be in a register.
    Exception: stack locations are accepted for the __golang and __usercall calling conventions.

  - if the return type is 'void', the return location must not be specified

  - if the argument location is not specified, it is assumed to be
    on the stack; consequent stack locations are allocated for such arguments

  - it is allowed to declare nested declarations, for example:
    int **__usercall func16@<eax>(int *(__usercall *x)@<ebx>
                                             (int, long@<ecx>, int)@<esi>);
    Here the pointer "x" is passed in the ESI register;
    The pointed function is a usercall function and expects its second
    argument in the ECX register, its return value is in the EBX register.
    The rule of thumb to apply in such complex cases is to specify the
    the registers just before the opening brace for the parameter list.

  - registers used for the location names must be valid for the current
    processor; some registers are unsupported (if the register name is
    generated on the fly, it is unsupported; inform us about such cases;
    we might improve the processor module if it is easy)

  - register pairs can be specified with a colon like <edx:eax>

- for really complicated cases this syntax can be used. IDA also understands the "__userpurge" calling convention. It is the same thing as __usercall, the only difference is that the callee cleans the stack.

The name used in the declaration is ignored by IDA.

If the default calling convention is __golang then explicit specification of stack offsets is permitted. For example:

  __attribute__((format(printf,2,3)))
  int myprnt(int id, const char *format, ...);

This declaration means that myprnt is a print-like function; the format string is the second argument and the variadic argument list starts at the third argument.

Below is the full list of attributes that can be handled by IDA. Please look up the details in the corresponding compiler help pages.

  packed        pack structure/union fields tightly, without gaps
  aligned       specify the alignment
  noreturn      declare as not returning function
  ms_struct     use microsoft layout for the structure/union
  format        possible formats: printf, scanf, strftime, strfmon

For data declarations, the following custom __attribute((annotate(X))) keywords have been added. The control the representation of numbers in the output:

__bin unsigned binary number

__oct unsigned octal number

__hex unsigned hexadecimal number

__dec signed decimal number

__sbin signed binary number

__soct signed octal number

__shex signed hexadecimal number

__udec unsigned decimal number

__float floating point

__char character

__segm segment name

__enum() enumeration member (symbolic constant)

__off offset expression (a simpler version of __offset)

__offset() offset expression

__strlit() string __stroff() structure offset

__custom() custom data type and format

__invsign inverted sign

__invbits inverted bitwise

__lzero add leading zeroes

__tabform() tabular form

The following additional keywords can be used in type declarations:

_BOOL1 a boolean type with explicit size specification (1 byte)

_BOOL2 a boolean type with explicit size specification (2 bytes)

_BOOL4 a boolean type with explicit size specification (4 bytes)

__int8 a integer with explicit size specification (1 byte)

__int16 a integer with explicit size specification (2 bytes)

__int32 a integer with explicit size specification (4 bytes)

__int64 a integer with explicit size specification (8 bytes)

__int128 a integer with explicit size specification (16 bytes)

_BYTE an unknown type; the only known info is its size: 1 byte

_WORD an unknown type; the only known info is its size: 2 bytes

_DWORD an unknown type; the only known info is its size: 4 bytes

_QWORD an unknown type; the only known info is its size: 8 bytes

_OWORD an unknown type; the only known info is its size: 16 bytes

_TBYTE 10-byte floating point value

_UNKNOWN no info is available

__pure pure function: always returns the same value and does not modify memory in a visible way

__noreturn function does not return

__usercall user-defined calling convention; see above

__userpurge user-defined calling convention; see above

__golang golang calling convention

__swiftcall swift calling convention

__spoils explicit spoiled-reg specification; see above

__hidden hidden function argument; this argument was hidden in the source code (e.g. 'this' argument in c++ methods is hidden)

__return_ptr pointer to return value; implies hidden

__struct_ptr was initially a structure value

__array_ptr was initially an array

__unused unused function argument __cppobj a c++ style struct; the struct layout depends on this keyword

__ptr32 explicit pointer size specification (32 bits)

__ptr64 explicit pointer size specification (64 bits)

__shifted shifted pointer declaration

__high high level prototype (does not explicitly specify hidden arguments like 'this', for example) this keyword may not be specified by the user but IDA may use it to describe high level prototypes

__bitmask a bitmask enum, a collection of bit groups

Shifted pointers

Sometimes in binary code we can encounter a pointer to the middle of a structure. Such pointers usually do not exist in the source code but an optimizing compiler may introduce them to make the code shorter or faster.

Such pointers can be described using shifted pointers. A shifted pointer is a regular pointer with additional information about the name of the parent structure and the offset from its beginning. For example:

        struct mystruct
        {
          char buf[16];
          int dummy;
          int value;            // <- myptr points here
          double fval;
        };
        int *__shifted(mystruct,20) myptr;

The above declaration means that myptr is a pointer to 'int' and if we decrement it by 20 bytes, we will end up at the beginning of 'mystruct'.

Please note that IDA does not limit parents of shifted pointers to structures. A shifted pointer after the adjustment may point to any type except 'void'.

Also, negative offsets are supported too. They mean that the pointer points to the memory before the structure.

When a shifted pointer is used with an adjustment, it will be displayed with the 'ADJ' helper function. For example, if we refer to the memory 4 bytes further, it can be represented like this:

        ADJ(myptr)->fval

Shifted pointers are an improvement compared to the CONTAINING_RECORD macro because expressions with them are shorter and easier to read.

Scattered argument locations

  00000000 struc_1         struc ; (sizeof=0xC)
  00000000 c1              db ?
  00000001                 db ? ; undefined
  00000002 s2              dw ?
  00000004 c3              db ?
  00000005                 db ? ; undefined
  00000006                 db ? ; undefined
  00000007                 db ? ; undefined
  00000008 i4              dd ?
  0000000C struc_1         ends

If we have this function prototype:

  void myfunc(struc_1 s);

the 64bit GNU compiler will pass the structure like this:

  RDI: c1, s2, and c3
  RSI: i4

Since compilers can use such complex calling conventions, IDA needs some mechanism to describe them. Scattered argument locations are used for that. The above calling convention can be described like this:

  void __usercall myfunc(struc_1 s@<0:rdi.1, 2:rdi^2.2, 4:rdi^4.1, 8:rsi.4>);

It reads:

  1 byte  at offset 0 of the argument is  passed in the byte 0 of RDI
  2 bytes at offset 2 of the argument are passed in the byte 1,2 of RDI
  1 byte  at offset 4 of the argument is  passed in the byte 3 of RDI
  4 bytes at offset 8 of the argument are passed starting from the byte 0 of RSI

In other words, the following syntax is used:

  argoff:register^regoff.size

where

  argoff - offset within the argument
  register - register name used to pass part of the argument
  regoff - offset within the register
  size - number of bytes

The regoff and size fields can be omitted if there is no ambiguity.

If the register is not specified, the expression describes a stack location:

  argoff:^stkoff.size

where

  argoff - offset within the argument
  stkoff - offset in the stack frame (the first stack argument is at offset 0)
  size - number of bytes

Please note that while IDA checks the argument location specifiers for soundness, it cannot perform all checks and some wrong locations may be accepted. In particular, IDA in general does not know the register sizes and accepts any offsets within them and any sizes.

Data representation: enum member

Syntax:

  __enum(enum_name)

Instead of a plain number, a symbolic constant from the specified enum will be used. The enum can be a regular enum or a bitmask enum. For bitmask enums, a bitwise combination of symbolic constants will be printed. If the value to print cannot be represented using the specified enum, it will be displayed in red.

Example:

   enum myenum { A=0, B=1, C=3 };
   short var __enum(myenum);

   If `var` is equal to 1, it will be represented as "B"

Another example:

   enum mybits __bitmask { INITED=1, STARTED=2, DONE=4 };
   short var __enum(mybits);

   If `var` is equal to 3, it will be represented as "INITED|STARTED"

This annotation is useful if the enum size is not equal to the variable size. Otherwise using the enum type for the declaration is better:

   myenum var;  // is 4 bytes, not 2 as above

Data representation: offset expression

Syntax:

  __offset(type, base, tdelta, target)
  __offset(type, base, tdelta)
  __offset(type, base)
  __offset(type|AUTO, tdelta)
  __offset(type)
  __off

where

type is one of:

  OFF8       8-bit full offset
  OFF16      16-bit full offset
  OFF32      32-bit full offset
  OFF64      64-bit full offset
  LOW8       low 8 bits of 16-bit offset
  LOW16      low 16 bits of 32-bit offset
  HIGH8      high 8 bits of 16-bit offset
  HIGH16     high 16 bits of 32-bit offset

The type can also be the name of a custom refinfo.

It can be combined with the following keywords:

  RVAOFF     based reference (rva)
  PASTEND    reference past an item
             it may point to an nonexistent address
  NOBASE     forbid the base xref creation
             implies that the base can be any value
             nb: base xrefs are created only if the offset base
             points to the middle of a segment
  SUBTRACT   the reference value is subtracted from the base value instead of
             (as usual) being added to it
  SIGNEDOP   the operand value is sign-extended (only supported for
             REF_OFF8/16/32/64)
  NO_ZEROS   an opval of 0 will be considered invalid
  NO_ONES    an opval of ~0 will be considered invalid
  SELFREF    the self-based reference

The base, target delta, and the target can be omitted. If the base is BADADDR, it can be omitted by combining the type with AUTO:

  __offset(type|AUTO, tdelta)

Zero based offsets without any additional attributes and having the size that corresponds the current application target (e.g. REF_OFF32 for a 32-bit bit application), the shoft __off form can be used.

Examples:

  A 64-bit offset based on the image base:

  int var __offset(OFF64|RVAOFF);

  A 32-bit offset based on 0 that may point to an non-existing address:

  int var __offset(OFF32|PASTEND|AUTO);

  A 32-bit offset based on 0x400000:

  int var __offset(OFF32, 0x400000);

  A simple zero based offset that matches the current application bitness:

  int var __off;

This annotation is useful the type of the pointed object is unknown or the variable size is different from the usual pointer size. Otherwise it is better to use a pointer:

  type *var;

Data representation: string

Syntax:

  __strlit(strtype, "encoding")
  __strlit(strtype, char1, char2, "encoding")
  __strlit(strtype)

where strtype is one of:

  C          Zero terminated string, 8 bits per symbol
  C_16       Zero terminated string, 16 bits per symbol
  C_32       Zero terminated string, 32 bits per symbol
  PASCAL     Pascal string: 1 byte length prefix, 8 bits per symbol
  PASCAL_16  Pascal string: 1 byte length prefix, 16 bits per symbol
  LEN2       Wide Pascal string: 2 byte length prefix, 8 bits per symbol
  LEN2_16    Wide Pascal string: 2 byte length prefix, 16 bits per symbol
  LEN4       Delphi string: 4 byte length prefix, 8 bits per symbol
  LEN4_16    Delphi string: 4 byte length prefix, 16 bits per symbol

It may be followed by two optional string termination characters (only for C). Finally, the string encoding may be specified, as the encoding name or "no_conversion" if the string encoding was not explicitly specified.

Example:

  A zero-terminated string in windows-1252 encoding:

  char array[10] __strlit(C,"windows-1252");

  A zero-terminated string in utf-8 encoding:

  char array[10] __strlit(C,"UTF-8");

Data representation: structure offset

Syntax:

  __stroff(structname)
  __stroff(structname, delta)

Instead of a plain number, the name of a struct or union member will be used. If delta is present, it will be subtracted from the value before converting it into a struct/union member name.

Example:

  An integer variable named `var` that hold an offset from the beginning of
  the `mystruct` structure:

  int var __stroff(mystruct);

  If mystruct is defined like this:

  struct mystruct
  {
    char a;
    char b;
    char c;
    char d;
  }

  The value 2 will be represented as `mystruct.c`

Another example:

  A structure offset with a delta:

  int var __stroff(mystruct, 1);

  The value 2 will be represented as `mystruct.d-1`

Data representation: custom data type and format

Syntax:

 __custom(dtid, fid)

where dtid is the name of a custom data type and fid is the name of a custom data format. The custom type and format must be registered by a plugin beforehand, at the database opening time. Otherwise, custom data type and format ids will be displayed instead of names.

Data representation: tabular form

Syntax:

  __tabform(flags)
  __tabform(flags,lineitems)
  __tabform(flags,lineitems,alignment)
  __tabform(,lineitems,alignment)
  __tabform(,,alignment)

This keyword is used to format arrays. The following flags are accepted:

  NODUPS do not use the `dup` keyword
  HEX    use hexadecimal numbers to show array indexes
  OCT    use octal numbers to show array indexes
  BIN    use binary numbers to show array indexes
  DEC    use decimal numbers to show array indexes

It is possible to combine NODUPS with the index radix: NODUPS|HEX

The `lineitems` and `alignment` attributes have the meaning described for the create array command.

Example:

  Display the array in tabular form, 4 decimal numbers on a line, each number
  taking 8 positions. Display indexes as comments in hexadecimal:

  char array[16] __tabform(HEX,4,8) __dec;

  A possible array may look like:

  dd   50462976, 117835012, 185207048, 252579084; 0
  dd  319951120, 387323156, 454695192, 522067228; 4
  dd  589439264, 656811300, 724183336, 791555372; 8
  dd  858927408, 926299444, 993671480,1061043516; 0Ch

Without this annotation, the `dup` keyword is permitted, number of items on a line and the alignment are not defined.