Deconstructing Generated Assembly

Deconstructing the assembly generated from the example C++ code we compiled last time.

[ Check out all posts in “low-level” series here. ]

My previous post on low-level code was about the this pointer. It included a small example program (compiled with optimizations disabled) and its disassembly.

This time, I elaborate on the code generation patterns that show up in that disassembly.¹

Reference

First of all, a reminder of the reference document.

As we mentioned before, the System V ABI is used on Linux x86-64. It consists of a “generic ABI” document and various “processor supplements”.

The “System V ABI - AMD64 Architecture Processor Supplement” is the document I often refer to.

Calling conventions are documented in the processor supplement. I am not certain where the “official” up-to-date source is, but I get the latest psABI (processor supplement ABI) PDF from this gitlab repo.²

Disassembly

Here is what main looked like:

 .-----------------------------------------------.
 |  0x5563e81b9179                               |
 | int main(int argc, char **argv, char **envp); |
 | ; var int64_t var_11h @ stack - 0x11          |
 | ; var int64_t var_10h @ stack - 0x10          |
 | push  rbp                                     | 00
 | mov   rbp, rsp                                | 01
 | sub   rsp, 0x10                               | 02
 | mov   rax, qword fs:[0x28]                    | 03
 | mov   qword [var_10h], rax ; local.set 1      | 04
 | xor   eax, eax                                | 05
 | lea   rax, [var_11h] ; local.get 1            | 06
 | mov   esi, 0x40                               | 07
 | mov   rdi, rax                                | 08
 | call  sym.Foo::Foo_char                       | 09
 | lea   rax, [var_11h] ; local.get 1            | 10
 | mov   rdi, rax                                | 11
 | call  sym.Foo::RandChar                       | 12
 | movsx eax, al                                 | 13
 | mov   esi, eax                                | 14
 | lea   rax, str.Generated_char_is:__c          | 15
 | mov   rdi, rax                                | 16
 | mov   eax, 0                                  | 17
 | call  sym.imp.printf                          | 18
 | mov   eax, 0                                  | 19
 | mov   rdx, qword [var_10h] ; local.get 1      | 20
 | sub   rdx, qword fs:[0x28]                    | 21
 | je    0x5563e81b91df                          | 22
 '-----------------------------------------------'
       t f
       | |
    .--' |
    |    '------------------.
    |                       |
.------------------.    .--------------------------------.
|  0x5563e81b91df  |    |  0x5563e81b91da                |
| leave            | 23 | call  sym.imp.__stack_chk_fail | 25
| ret              | 24 '--------------------------------'
'------------------'

Before we proceed: rizin’s output replaces memory references with variable names.

In the actual disassembly, here are the important differences:

var_10h corresponds to rbp-0x8
var_11h corresponds to rbp-0x9
str.Generated_char_is:__c is [rip+0xe4b] (RIP-relative)

The named version is easier to read, so I am keeping the variable names.
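To connect the two notations, here is the arithmetic as a small C++ sketch. It assumes, as the numbers above suggest, that rizin’s “stack - N” offsets are measured from the stack pointer at function entry (when rsp points at the return address). Since push rbp moves rsp down by 8 before mov rbp, rsp copies it, rbp equals that entry value minus 8. The helper name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: converts rizin's "stack - N" offset (assumed to be
// relative to rsp at function entry) into an offset below rbp, for a
// function that begins with "push rbp; mov rbp, rsp".
constexpr std::int64_t RbpOffsetFromEntryOffset(std::int64_t entry_offset) {
  // After push rbp: rbp == entry_rsp - 8.
  // So entry_rsp - N == (rbp + 8) - N == rbp - (N - 8).
  return entry_offset - 8;
}

static_assert(RbpOffsetFromEntryOffset(0x10) == 0x8, "var_10h is rbp-0x8");
static_assert(RbpOffsetFromEntryOffset(0x11) == 0x9, "var_11h is rbp-0x9");
```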

Let’s deconstruct this.

Prolog and Epilog

The first section is what we call the “prolog”.

 .-----------------------------------------------.
 |  0x5563e81b9179                               |
 | int main(int argc, char **argv, char **envp); |
 | ; var int64_t var_11h @ stack - 0x11          |
 | ; var int64_t var_10h @ stack - 0x10          |
 ---                                    PROLOG ---
 | push  rbp                                     | 00
 | mov   rbp, rsp                                | 01
                       ...

This is a very common pattern. The caller’s frame pointer is pushed onto the stack, then the current stack pointer is copied into rbp. From then on, rbp points to the “base” of the function’s “stack frame”, and it stays fixed while rsp moves during the function body.

There is an enter instruction that performs the same steps (and can do a bit more, such as reserving space for locals), but it is apparently much slower than the usual push/mov sequence. So gcc generates push and mov instead.
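To make the two prolog steps concrete, here is a toy model in C++. It is only a sketch: the ToyMachine type, the addresses, and the map standing in for memory are all made up for illustration. It shows that after the prolog, [rbp] holds the caller’s frame pointer and [rbp+8] holds the return address that the call instruction pushed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of the registers and stack memory the prolog touches.
// All names and addresses are hypothetical, for illustration only.
struct ToyMachine {
  std::uint64_t rsp = 0;
  std::uint64_t rbp = 0;
  std::map<std::uint64_t, std::uint64_t> mem;  // address -> 8-byte slot

  // push: the stack grows down, so move rsp first, then store.
  void Push(std::uint64_t value) {
    assert(rsp >= 8 && "toy stack underflow");
    rsp -= 8;
    mem[rsp] = value;
  }

  // push rbp; mov rbp, rsp
  void Prolog() {
    Push(rbp);  // save the caller's frame pointer
    rbp = rsp;  // rbp now marks the base of this frame
  }
};
```

Usage: set rsp and rbp to some caller state, call Push(return_address) to mimic call, then Prolog(); afterwards m.mem.at(m.rbp) is the old rbp and m.mem.at(m.rbp + 8) is the return address.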

And there is a corresponding “epilog” at the end (at the target of the je instruction, i.e. its taken branch).

                       ...
 '-----------------------------------------------'
       t f
       |
    .--'
    |
    |
.------------------.
|  0x5563e81b91df  |
---       EPILOG ---
| leave            | 23 
        ...

The epilog is normally the reverse of the prolog:

mov rsp, rbp
pop rbp

But there is a corresponding leave instruction that performs exactly these two steps, and this one seems to be fast enough that gcc chooses to generate leave as the epilog.
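Here is a self-contained toy model (hypothetical names, made-up addresses) that runs a frame like main’s through the prolog, leave, and ret, to check that leave undoes both the prolog and the sub rsp that follows it:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model: two registers plus 8-byte memory slots. Names are hypothetical.
struct ToyMachine {
  std::uint64_t rsp = 0, rbp = 0;
  std::map<std::uint64_t, std::uint64_t> mem;

  void Push(std::uint64_t v) { rsp -= 8; mem[rsp] = v; }
  std::uint64_t Pop() {
    auto it = mem.find(rsp);
    assert(it != mem.end() && "popping a slot that was never written");
    rsp += 8;
    return it->second;
  }
};

// push rbp; mov rbp, rsp; sub rsp, frame_size
void Prolog(ToyMachine& m, std::uint64_t frame_size) {
  m.Push(m.rbp);
  m.rbp = m.rsp;
  m.rsp -= frame_size;
}

// leave == mov rsp, rbp; pop rbp
void Leave(ToyMachine& m) {
  m.rsp = m.rbp;    // discard all locals in one step
  m.rbp = m.Pop();  // restore the caller's frame pointer
}

// ret pops the return address (returned here instead of jumping to it).
std::uint64_t Ret(ToyMachine& m) { return m.Pop(); }
```

Because leave restores rsp from rbp, it does not matter how far rsp moved during the function body; the single assignment discards all of it.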

Frame Pointer Details

Check out “Function Calling Sequence” (section 3.2) in the processor supplement.

The “Parameter Passing” section (3.2.3) describes rbp as:

callee-saved register; optionally used as frame pointer

Note that it says “optionally”. If we enabled optimization level 1 in gcc (-O1), it would turn on the -fomit-frame-pointer optimization, which lets the compiler skip setting up the frame pointer when it decides the frame pointer is not needed. rbp is then free to be used as a general-purpose register.

Note that the bottom of the call stack is a bit of a special case. Execution starts in the C runtime’s startup code, before main, and the processor supplement says:

%rbp : The content of this register is unspecified at process initialization time, but the user code should mark the deepest stack frame by setting the frame pointer to zero.
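The zero marker makes sense once you see that saved frame pointers form a linked list: in each frame, [rbp] holds the caller’s rbp and [rbp+8] holds the return address. A debugger or profiler can follow that chain, and a zero rbp tells it where to stop. Below is a minimal sketch over simulated memory; WalkFrames, the Memory alias, and the addresses are hypothetical, and the walk only works when every function on the stack keeps a frame pointer (frames from code built with -fomit-frame-pointer do not take part in the chain).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Simulated memory: address -> 8-byte value.
using Memory = std::map<std::uint64_t, std::uint64_t>;

// Follows the chain of saved frame pointers starting at `rbp`, collecting
// the return address stored just above each saved rbp. Stops at a zero
// frame pointer, which marks the deepest (outermost) frame.
std::vector<std::uint64_t> WalkFrames(const Memory& mem, std::uint64_t rbp) {
  std::vector<std::uint64_t> return_addresses;
  while (rbp != 0) {
    assert(return_addresses.size() < 1024 && "chain too deep; corrupted?");
    return_addresses.push_back(mem.at(rbp + 8));  // return address
    rbp = mem.at(rbp);                            // caller's saved rbp
  }
  return return_addresses;
}
```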

Reserving Space on Stack

Then we have this instruction in the line I marked as 02:

sub   rsp, 0x10

The stack grows down, so by subtracting from the stack pointer, we reserve the space main needs for its local variables (here, the canary copy at var_10h and the object at var_11h).

Near the end of the function, when the epilog copies rbp into rsp, that reserved space is effectively discarded.
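Why 0x10 when main seems to need only 9 bytes (the 8-byte canary copy at rbp-0x8 and, apparently, a 1-byte Foo object at rbp-0x9)? The processor supplement requires rsp to be a multiple of 16 immediately before a call instruction, so on entry rsp+8 is a multiple of 16. After push rbp, rsp is 16-byte aligned again, and rounding the local area up to a multiple of 16 keeps it that way for main’s own calls. A sketch of that arithmetic, with hypothetical helper names; it only models frames where rbp is the only register pushed by the prolog:

```cpp
#include <cassert>
#include <cstdint>

// Rounds a byte count up to the next multiple of 16.
constexpr std::uint64_t RoundUpTo16(std::uint64_t n) {
  return (n + 15) & ~static_cast<std::uint64_t>(15);
}

// rsp after "push rbp; sub rsp, RoundUpTo16(locals)", given rsp on entry
// (pointing at the return address). Hypothetical helper for frames like
// main's, where rbp is the only register the prolog pushes.
std::uint64_t RspAfterProlog(std::uint64_t entry_rsp, std::uint64_t locals) {
  assert((entry_rsp + 8) % 16 == 0 && "psABI: rsp+8 is 16-aligned on entry");
  return entry_rsp - 8 - RoundUpTo16(locals);
}
```

With 9 bytes of locals, RoundUpTo16(9) is 0x10, which matches the sub rsp, 0x10 at line 02.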

A Visual Description

I am aware that it is not easy to follow what’s really happening if you are reading this stuff for the first time. So I looked for pages that give a more visual description of state changes during prolog and epilog. I found these two:

This page is part of a series about virtual memory. It does a good job of communicating the changes to the stack and relevant registers, and still keeping it simple.
At a glance, this online book chapter seems to trace the state changes during the execution of a function comprehensively, including this prolog/epilog.

Stack Guard

Next, we have these two sections surrounding the rest of the code.

                       ...
 | mov   rax, qword fs:[0x28]                    | 03
 | mov   qword [var_10h], rax ; local.set 1      | 04
                       ...

                       ...
 | mov   rdx, qword [var_10h] ; local.get 1      | 20
 | sub   rdx, qword fs:[0x28]                    | 21
 | je    0x5563e81b91df                          | 22
 '-----------------------------------------------'
       t f
         |
         |
         '----------.
                    |
                .--------------------------------.
                |  0x5563e81b91da                |
                | call  sym.imp.__stack_chk_fail | 25
                '--------------------------------'

This logic is called stack protector, or stack guard.

You can read about this in this SO thread, and in the description of the related GCC option (-fstack-protector and its variants).

Basically, a special value (the “stack canary”, stored at fs:[0x28]; on x86-64 Linux, fs points to the thread’s control block, where glibc keeps it) is first copied onto the stack. Near the end of the function, we check that it wasn’t overwritten. If it was, we call the stack check failure routine (__stack_chk_fail), which terminates the program. Otherwise, we proceed to the epilog and return normally.
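The check can be sketched in C++. This is a simulation, not what the compiler literally emits: the 16-byte array stands in for main’s frame, kCanary is a made-up constant standing in for the random value at fs:[0x28], and WriteAndCheck is a hypothetical name. A write that runs past the 8-byte local area lands in the canary slot, and the final comparison catches it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Made-up canary; the real one is a random per-process value at fs:[0x28].
constexpr std::uint64_t kCanary = 0x5ab1e0c3d2f40000ULL;

// Simulates a 16-byte frame: an 8-byte local area at offset 0 and the
// canary copy at offset 8 (just below the saved rbp). Writes `len` bytes
// from `data` starting at the local area. Returns true if the canary
// survived, false where the compiled code would call __stack_chk_fail.
bool WriteAndCheck(const unsigned char* data, std::size_t len) {
  unsigned char frame[16] = {};
  assert(len <= sizeof frame && "toy model cannot overflow past the frame");
  std::memcpy(frame + 8, &kCanary, sizeof kCanary);  // like line 04
  std::memcpy(frame, data, len);                     // the (maybe buggy) body
  std::uint64_t saved;
  std::memcpy(&saved, frame + 8, sizeof saved);      // like line 20
  return (saved - kCanary) == 0;                     // like lines 21-22
}
```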

Note that the SO discussion explicitly refers to XOR, while our example uses SUB. Both produce zero, and so set the zero flag that je tests, exactly when the two values are equal, so the outcome is the same.
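A quick way to convince yourself, modeling the 64-bit registers with unsigned integers (whose wraparound matches the hardware subtraction). The function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Both results are zero exactly when a == b, which is all that
// "je" (jump if the zero flag is set) needs.
constexpr bool EqualViaXor(std::uint64_t a, std::uint64_t b) { return (a ^ b) == 0; }
constexpr bool EqualViaSub(std::uint64_t a, std::uint64_t b) { return (a - b) == 0; }
```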

Zeroing a Register

Here is a surprising instruction in the line I marked as 05:

xor   eax, eax

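This sets eax to 0, since any value XORed with itself is 0, and because writing a 32-bit register on x86-64 zero-extends into the full 64-bit register, all of rax is cleared. It is the preferred way to zero a register: it encodes in 2 bytes, versus 5 for mov eax, 0, and modern CPUs recognize it as a zeroing idiom that does not depend on the register’s previous value. Here, it most likely clears the canary value that rax still holds from line 03, so that it does not linger in a register.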
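The register semantics behind this can be modeled in a few lines of C++. WriteEax and XorEaxEax are hypothetical helpers; the point is that x ^ x is 0 for every x, and that a 32-bit write clears the upper half of the 64-bit register.

```cpp
#include <cassert>
#include <cstdint>

// Models a 32-bit register write on x86-64: the value lands in the low
// 32 bits and the upper 32 bits of the 64-bit register become zero,
// whatever they held before.
constexpr std::uint64_t WriteEax(std::uint64_t /*old_rax*/, std::uint32_t value) {
  return static_cast<std::uint64_t>(value);
}

// xor eax, eax: x ^ x == 0 for any x, then the 32-bit write zero-extends.
constexpr std::uint64_t XorEaxEax(std::uint64_t rax) {
  return WriteEax(rax, static_cast<std::uint32_t>(rax) ^ static_cast<std::uint32_t>(rax));
}
```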

Return and the Rest

Here are the parts we haven’t discussed yet.

                       ...
 | lea   rax, [var_11h] ; local.get 1            | 06
 | mov   esi, 0x40                               | 07
 | mov   rdi, rax                                | 08
 | call  sym.Foo::Foo_char                       | 09
 | lea   rax, [var_11h] ; local.get 1            | 10
 | mov   rdi, rax                                | 11
 | call  sym.Foo::RandChar                       | 12
 | movsx eax, al                                 | 13
 | mov   esi, eax                                | 14
 | lea   rax, str.Generated_char_is:__c          | 15
 | mov   rdi, rax                                | 16
 | mov   eax, 0                                  | 17
 | call  sym.imp.printf                          | 18
 | mov   eax, 0                                  | 19
                       ...

                       ...
 '-----------------------------------------------'
       t f
       |
    .--'
    |
    |
.------------------.
|  0x5563e81b91df  |
        ...
| ret              | 24
'------------------'

This involves:

Setting registers to pass arguments to other functions.
Making calls and using their return values.
Setting the value to be returned from main.
Returning from the procedure.

The rules for all of these are described in detail in the “Parameter Passing” section (3.2.3) of the processor supplement. But the “System V AMD64 ABI” section of the “x86 calling conventions” Wikipedia page has a summary of the key aspects to remember.
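As a compact reminder, here is the integer/pointer argument order from that section in C++ form (IntArgRegister is a hypothetical helper). Applied to main: for Foo::Foo(char), the implicit this argument (the address of var_11h) goes in rdi and 0x40 in esi; Foo::RandChar gets only this in rdi; printf gets the format string in rdi and the character in esi. The 32-bit names (esi) are the lower halves of the 64-bit registers (rsi), used because those arguments are int-sized. Line 17, mov eax, 0, also belongs here: for variadic functions such as printf, al carries an upper bound on how many vector registers hold arguments, here zero.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Registers for the first six integer/pointer arguments, in order,
// per the System V AMD64 psABI. In C++ member functions, "this" is
// passed as an implicit first argument.
constexpr std::array<const char*, 6> kIntArgRegs = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

// Register carrying the n-th (0-based) integer/pointer argument,
// or "stack" once the six registers are used up.
std::string IntArgRegister(std::size_t index) {
  return index < kIntArgRegs.size() ? kIntArgRegs[index] : "stack";
}
```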

We already walked through one example call in the previous post. The calls to the other functions follow the same rules, so I will skip those parts.

Update [ 2024-01-17 ]: I discussed the lea instruction that shows up above in my next post.

I will just mention that, in the line I marked 19, we do:

mov   eax, 0

This is because rax carries the return value (its lower 32 bits, eax, are enough for main’s int result), and main returns 0.
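The same register also carried RandChar’s result back at line 12: a char comes back in al, the lowest byte of rax. Line 13 then sign-extends it with movsx eax, al, because a char passed to a variadic function like printf is promoted to int, and char is signed on x86-64 Linux. A sketch of that extension in C++ (MovsxEaxAl is a hypothetical name):

```cpp
#include <cassert>
#include <cstdint>

// movsx eax, al: take the low byte of rax (al) and sign-extend bit 7
// into a 32-bit int. Bits 8..63 of the input are ignored.
int MovsxEaxAl(std::uint64_t rax) {
  int al = static_cast<int>(rax & 0xff);  // 0..255
  return al >= 0x80 ? al - 0x100 : al;    // 0x80..0xff map to -128..-1
}
```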

That’s all for today. The source is shared in the blog’s repository. If you find technical errors, please report them on the blog’s Issues page.

Thanks for reading!

To check out the C++ code that generated this, refer to the previous post. ↩
There is also an OSDev page that has links to related documents. This SO answer also has links to relevant docs. ↩