Tail vs. Body Recursion in Erlang Part 2

Recursion in Erlang Bytecode

In part 1, I talked a little about how Erlang optimizes tail recursive functions, a process generally known as Tail Call Optimization (TCO). To verify this, we can compile the functions into Erlang assembler source code and take a look. Erlang assembler source is a human-readable listing of the BEAM instructions the compiler generates, written out just before they would be assembled into a BEAM file.

The erlc compiler command has a flag (-S) to compile into Erlang assembly.

$ erlc -S map.erl

I compiled my map.erl module into an Erlang assembler file, map.S.
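For reference, here is a rough sketch of what the two functions in map.erl look like. This is a reconstruction inferred from the assembly listings in this post, not the original file: the variable names are my own, and the source line positions that the listings refer to are not reproduced.

```erlang
-module(map).
-export([map_tail/3, map_body/2]).

%% Tail recursive map: results are accumulated in reverse order and
%% flipped once the input list is exhausted.
map_tail(_F, [], Acc) ->
    lists:reverse(Acc);
map_tail(F, [H | T], Acc) ->
    map_tail(F, T, [F(H) | Acc]).

%% Body recursive map: the cons cell is built only after the
%% recursive call has returned.
map_body(_F, []) ->
    [];
map_body(F, [H | T]) ->
    [F(H) | map_body(F, T)].
```

Compiling this sketch with a different OTP release may well produce a slightly different instruction sequence than the one shown in this post.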

A few notes about Erlang assembly:

The full list of opcodes can be found here: genop.tab
The CP register stands for Continuation Pointer
There are two sets of registers: ‘x’ and ‘y’.
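The ‘x’ registers are used for passing function arguments and return values (the result of a call comes back in {x,0})
The ‘y’ registers are slots in the current stack frame, used for local values that must survive a function call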

Assembly code for map_tail/3

19  {function, map_tail, 3, 4}.
20    {label,3}.
21      {line,[{location,"map.erl",11}]}.
22      {func_info,{atom,map},{atom,map_tail},3}.
23    {label,4}.
24      {test,is_nonempty_list,{f,5},[{x,1}]}.
25      {allocate,3,3}.
26      {get_list,{x,1},{x,3},{y,2}}.
27      {move,{x,0},{x,1}}.
28      {move,{x,3},{x,0}}.
29      {move,{x,2},{y,0}}.
30      {move,{x,1},{y,1}}.
31      {line,[{location,"map.erl",14}]}.
32      {call_fun,1}.
33      {test_heap,2,1}.
34      {put_list,{x,0},{y,0},{x,2}}.
35      {move,{y,2},{x,1}}.
36      {move,{y,1},{x,0}}.
37      {call_last,3,{f,4},3}.
38    {label,5}.
39      {test,is_nil,{f,3},[{x,1}]}.
40      {move,{x,2},{x,0}}.
41      {line,[{location,"map.erl",12}]}.
42      {call_ext_only,1,{extfunc,lists,reverse,1}}.

Lines 24-37 implement the main function clause, which performs the mapping. It makes the recursive call with the call_last/3 opcode. I pulled the description from the genop.tab file.

Opcode call_last/3 comments

## @spec call_last Arity Label Deallocate
## @doc Deallocate and do a tail recursive call to the function at Label.
##      Do not update the CP register.
##      Before the call deallocate Deallocate words of stack.
5: call_last/3

Assembly code for map_body/2

45  {function, map_body, 2, 7}.
46    {label,6}.
47      {line,[{location,"map.erl",17}]}.
48      {func_info,{atom,map},{atom,map_body},2}.
49    {label,7}.
50      {test,is_nonempty_list,{f,8},[{x,1}]}.
51      {allocate,2,2}.
52      {get_list,{x,1},{x,2},{y,1}}.
53      {move,{x,0},{x,1}}.
54      {move,{x,2},{x,0}}.
55      {move,{x,1},{y,0}}.
56      {line,[{location,"map.erl",18}]}.
57      {call_fun,1}.
58      {move,{x,0},{x,2}}.
59      {move,{y,1},{x,1}}.
60      {move,{y,0},{x,0}}.
61      {move,{x,2},{y,1}}.
62      {trim,1,1}.
63      {line,[{location,"map.erl",18}]}.
64      {call,2,{f,7}}.
65      {test_heap,2,1}.
66      {put_list,{y,0},{x,0},{x,0}}.
67      {deallocate,1}.
68      return.
69    {label,8}.
70      {test,is_nil,{f,6},[{x,1}]}.
71      {move,nil,{x,0}}.
72      return.

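Lines 50-68 implement the main function clause, which performs the mapping. It makes the recursive call with the call/2 opcode, and the put_list that builds the result cell only runs after that call returns, so the stack frame has to stay in place until then. I pulled the description from the genop.tab file.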

Opcode call/2 comments

## @spec call Arity Label
## @doc Call the function at Label.
##      Save the next instruction as the return address in the CP register.
4: call/2

The tail recursive implementation replaces call/2 with call_last/3. From the description, call_last/3 deallocates the stack frame before making the function call and does not update the CP register. Therefore, the tail recursive implementation does not grow the stack: each recursive call releases the current frame first, so every iteration runs in the same stack space.
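One way to see the same effect at runtime, rather than in the assembly, is to ask the process for its stack size at the bottom of the recursion. The module below is a minimal sketch of my own, not part of map.erl: the names stack_demo, tail_depth/1 and body_depth/1 are made up for illustration. It relies on erlang:process_info/2 with the stack_size item, which reports the stack size in words.

```erlang
-module(stack_demo).
-export([tail_depth/1, body_depth/1]).

%% Tail recursive countdown: nothing is left to do after the
%% recursive call. At the bottom it returns the stack size in words.
tail_depth(0) ->
    {stack_size, Words} = erlang:process_info(self(), stack_size),
    Words;
tail_depth(N) when N > 0 ->
    tail_depth(N - 1).

%% Body recursive countdown: the tuple is rebuilt after the recursive
%% call returns, so every level keeps its own stack frame alive.
%% Returns {Depth, Words}, where Words was measured at the bottom.
body_depth(0) ->
    {stack_size, Words} = erlang:process_info(self(), stack_size),
    {0, Words};
body_depth(N) when N > 0 ->
    {Depth, Words} = body_depth(N - 1),
    {Depth + 1, Words}.
```

Calling both functions with the same N from the same place, I would expect the figure from body_depth/1 to grow with N while the one from tail_depth/1 stays flat. The exact numbers depend on the OTP release and on what the caller already has on its stack.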