Assembly-level debugging tip for a C++-heavy project.

[ Check out all posts in “low-level” series here. ]

Here is the key takeaway of this post:

On x86-64 Linux, the this pointer is usually passed in the rdi register when calling a non-static member function.

Nowadays, most debugging is thankfully done at source level. However, I think some familiarity with lower-level patterns is good. The role of the rdi register is a good place to start, because it is easy to remember, and it shows up a lot in generated instructions.

I will start with this occasionally-useful bit of information, and use it as an excuse to discuss some key concepts in future posts. Some of these concepts will be needed to understand interop as well.

Let’s confirm the statement using the silly example program below:

#include <cstdio>
#include <cstdlib>  // srand, rand
#include <ctime>

#define RAND_MOD 32

struct Foo
{
    Foo( const char offset ) : m_charOffset( offset )
    {
    }

    /* Generate a random char, offsetting
       the value to get a printable character. */
    char
     RandChar( )
    {
        srand( clock( ) );
        char randChar = rand( ) % RAND_MOD;
        return randChar + this->m_charOffset;
    }

    char m_charOffset;
};

int
 main( )
{
    Foo foo{ 0x40 };

    printf( "Generated char is: %c\n", foo.RandChar( ) );

    return 0;
}

The file is called this_is_rdi.cc. It satisfies a few requirements for the topic: a class, a member function, data member access, etc.

We can compile with the usual:

g++ this_is_rdi.cc -o this_is_rdi -O0

We don’t optimize at all so that the assembly maps nicely to source.

When executed, the program, this_is_rdi, will print out a random character:

Generated char is: S

Here is the function block graph rizin generated for main:

 .-----------------------------------------------.
 |  0x5563e81b9179                               |
 | int main(int argc, char **argv, char **envp); |
 | ; var int64_t var_11h @ stack - 0x11          |
 | ; var int64_t var_10h @ stack - 0x10          |
 | push  rbp                                     |
 | mov   rbp, rsp                                |
 | sub   rsp, 0x10                               |
 | mov   rax, qword fs:[0x28]                    |
 | mov   qword [var_10h], rax ; local.set 1      |
 | xor   eax, eax                                |
 | lea   rax, [var_11h] ; local.get 1            |
 | mov   esi, 0x40                               |
 | mov   rdi, rax                                |
 | call  sym.Foo::Foo_char                       |
 | lea   rax, [var_11h] ; local.get 1  <- address of foo
 | mov   rdi, rax                      <- copy addr to RDI
 | call  sym.Foo::RandChar             <- call Foo::RandChar
 | movsx eax, al                                 |
 | mov   esi, eax                                |
 | lea   rax, str.Generated_char_is:__c          |
 | mov   rdi, rax                                |
 | mov   eax, 0                                  |
 | call  sym.imp.printf                          |
 | mov   eax, 0                                  |
 | mov   rdx, qword [var_10h] ; local.get 1      |
 | sub   rdx, qword fs:[0x28]                    |
 | je    0x5563e81b91df                          |
 '-----------------------------------------------'
       t f
       | |
    .--' |
    |    '------------------.
    |                       |
.------------------.    .--------------------------------.
|  0x5563e81b91df  |    |  0x5563e81b91da                |
| leave            |    | call  sym.imp.__stack_chk_fail |
| ret              |    '--------------------------------'
'------------------'

While disassembling, rizin generates convenient symbol names for data addresses, which makes the listing easier to read.

I annotated the important bits.

Write the address of the stack location reserved for foo to rax:

lea   rax, [var_11h] ; local.get 1

Copy the address of foo from rax to rdi:

mov   rdi, rax

Call the Foo::RandChar member function on foo:

call  sym.Foo::RandChar

Note that there is some obvious redundancy here due to optimization being disabled.

And here is the beginning of Foo::RandChar:

.------------------------------------------.
|  0x5563e81b91fc                          |
| sym.Foo::RandChar(int64_t arg1);         |
| ; arg int64_t arg1 @ rdi                 |
| ; var int64_t var_20h @ stack - 0x20     |
| ; var int64_t var_9h @ stack - 0x9       |
| push  rbp                                |
| mov   rbp, rsp                           |
| sub   rsp, 0x20                          |
| mov   qword [var_20h], rdi ; local.set 1 |
| call  sym.imp.clock                      |
| mov   edi, eax                           |
|                 ...                      |

Thanks to rizin’s annotation of the function, we can clearly see that the function received one implicit argument, and that it was passed via rdi.
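The same implicit argument can be made visible at source level. The sketch below is only an illustration: Offset, Foo_Offset and OffsetThreeWays are hypothetical names that are not part of this_is_rdi.cc, and Foo is cut down to a deterministic variant without rand. It calls the same logic as a member function, as a free function taking an explicit Foo*, and through std::mem_fn, which takes the object as an explicit first argument.

```cpp
#include <cassert>
#include <functional>

// Cut-down, deterministic variant of Foo (no rand), for illustration only.
struct Foo
{
    Foo( const char offset ) : m_charOffset( offset )
    {
    }

    // Non-static member function: `this` is an implicit argument.
    char Offset( const char value )
    {
        return value + this->m_charOffset;
    }

    char m_charOffset;
};

// Hypothetical free-function twin: the same argument, written explicitly.
char Foo_Offset( Foo* self, const char value )
{
    return value + self->m_charOffset;
}

// Calls the same logic three ways and checks that the results agree.
char OffsetThreeWays( )
{
    Foo foo{ 0x40 };

    const char viaMember = foo.Offset( 1 );
    const char viaFree = Foo_Offset( &foo, 1 );
    // std::mem_fn takes the object as an explicit first argument.
    const char viaMemFn = std::mem_fn( &Foo::Offset )( &foo, 1 );

    assert( viaMember == viaFree );
    assert( viaMember == viaMemFn );
    return viaMember;
}
```

Compiled with -O0 on x86-64 Linux, the call to foo.Offset and the call to Foo_Offset should both load the address of foo into rdi before the call instruction, though this is worth confirming in the disassembly rather than taking on trust.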

This behaviour follows from the combination of two ABI standards:

Itanium C++ ABI

The Itanium C++ ABI is followed by most C++ compilers (MSVC being the notable exception). About the this pointer, it says:

Non-static member functions, including constructors and destructors, take an implicit this parameter of pointer type. It is passed as if it were the first parameter in the function prototype, except as modified for non-trivial return values.
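The exception for non-trivial return values is one reason the key takeaway says "usually". As a rough sketch, assuming GCC or Clang on x86-64 Linux: when a member function returns a class type that is non-trivial for the purposes of calls (std::string here), the caller passes the address of the return slot as a hidden first parameter, and this moves to the second position. Named and CheckNamed are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Named
{
    std::string m_name;

    // Trivially returned value: `this` is the first parameter,
    // so it is expected in rdi.
    std::size_t Length( ) const
    {
        return m_name.size( );
    }

    // std::string is non-trivial for the purposes of calls: the address
    // of the return slot is expected in rdi, and `this` in rsi.
    std::string Name( ) const
    {
        return m_name;
    }
};

// Exercises both member functions; the source-level results are the
// same either way, only the register assignment differs.
void CheckNamed( )
{
    const Named named{ "rdi" };
    assert( named.Length( ) == 3 );
    assert( named.Name( ) == "rdi" );
}
```

Disassembling Named::Name at -O0 should show two register arguments being saved to the stack in the prologue instead of one.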

System V AMD64 ABI

The System V AMD64 ABI defines the calling convention followed by Linux on x86-64.

The passing of parameters for function calls is part of the calling convention.

The System V AMD64 ABI says that the first six integer or pointer arguments are passed in RDI, RSI, RDX, RCX, R8, and R9, in that order.

So:

  • the first integer or pointer parameter is carried in rdi.
  • the this pointer is usually the implicit first parameter of a non-static member function.

Therefore, the this pointer usually ends up in rdi, which is the behaviour we observed.
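A side effect of the two rules combined is that this uses up one of the six registers. The sketch below, with hypothetical names Sum6, Adder and SumBothWays, annotates where each argument is expected to go on x86-64 Linux with GCC or Clang at -O0. The int arguments occupy the lower 32-bit halves of the registers (edi, esi, and so on), and the comments describe the ABI's assignment rather than anything the code itself checks.

```cpp
#include <cassert>

// Free function: six integer arguments fill all six registers.
//   a -> edi, b -> esi, c -> edx, d -> ecx, e -> r8d, f -> r9d
int Sum6( int a, int b, int c, int d, int e, int f )
{
    return a + b + c + d + e + f;
}

struct Adder
{
    int m_base;

    // Member function: `this` takes rdi, so each explicit argument
    // shifts by one register and the sixth goes to the stack.
    //   this -> rdi, a -> esi, b -> edx, c -> ecx, d -> r8d, e -> r9d,
    //   f -> stack
    int Sum6( int a, int b, int c, int d, int e, int f ) const
    {
        return m_base + a + b + c + d + e + f;
    }
};

// Calls both versions with the same explicit arguments.
int SumBothWays( )
{
    const Adder adder{ 100 };

    const int viaFree = Sum6( 1, 2, 3, 4, 5, 6 );
    const int viaMember = adder.Sum6( 1, 2, 3, 4, 5, 6 );

    assert( viaMember == viaFree + 100 );
    return viaMember;
}
```

In the disassembly of SumBothWays, the member call should be recognisable by the extra push (or stack store) of the last argument before the call instruction.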

In a later post, we will have a closer look at concepts like calling conventions.

We will also step through the generated assembly for this example code, and point out a few other common patterns.

That’s it for today. Thanks for reading. If you find technical errors, please report them on the blog’s Issues page.