Table of contents

  1. Introduction
    1. Recap: Functions in C
  2. First steps towards common calling conventions
  3. Memory: the stack
    1. Understanding the stack
    2. Manipulating the stack in RISC-V
    3. Excursion: The stack grows downwards!?
  4. Using the stack for function calls
  5. Summary: Complete calling conventions
    1. Exercise 1
  6. Additional exercises
    1. Exercise 2
    2. Exercise 3
    3. Exercise 4
  7. Excursion: Tail recursion
    1. Excursion exercise

Introduction

In the last sessions, you have already gained some experience with assembly code in RISC-V and, by now, also have a basic understanding of C code. In this session, we will dive deeper into what difficulties arise when writing larger programs, how we can extend our registers by using memory, and how this helps us to write code elements that resemble functions as you know them from higher-level languages.

Recap: Functions in C

Let’s first take a look at functions in C. Below you see an example program with a helper function that calculates a simple sum.

sum.c Console
// Include I/O library functions (printf, scanf)
# include <stdio.h>

// This function allows us to calculate the sum of two numbers.
// With this, we can reuse the sum calculation at different
// locations of our code as an abstract operation.
int sum(int a, int b) {
    return a + b;
}

// Main function where the program starts
int main(void) {
    int n;
    printf("Enter a number:\n");
    scanf("%d", &n);
    printf("Result: n+2=%d\n", sum(n, 2));
    return 0;
}
$ gcc sum.c -o sum
$ ./sum
Enter a number:
5
Result: n+2=7

You already know functions from other languages than C, they are very useful to bundle common tasks or make the program simpler to think about. In essence, functions allow for abstraction via parameters and return values.

But how can we do this in assembly? There is almost no concept of a function in assembly. If languages like C are compiled to assembly code, how do they transform their functions to assembly code then? The answer is simple and slightly difficult at the same time:

  1. Functions can be implemented using common low-level primitives such as labels, jumps, and registers. Doing this is simple, and you may already have done so intuitively during the last sessions.
  2. However, making sure that you can always and deterministically reuse or call a function requires some effort. You may already have experienced that different developers use different registers to do the same thing. What if you use register t0 to pass a parameter to a function, but the other developer expects the parameter in t1? What happens if we run out of registers? These and other situations require for a clear set of (calling) conventions that we expect everyone to adhere to - even (or especially) compilers! This is the slightly difficult part of functions in assembly.

First steps towards common calling conventions

:bulb: TL;DR: The official calling conventions are here on the official RISC-V website. You can also find the short version on the RISC-V card that you will have access to in the exam.

If the above explanation did not fully click for you yet, don’t worry. Let’s go step by step and understand why conventions are useful and help us write code that can be used across programs and teams. Let’s take a look at the partial assembly code below. We already created a main and a sum label:

.globl main
.data
    a: .word 1
    b: .word 2
    number: .word 0

.text
main:
    # Load the two numbers a and b into registers

sum:
    # Add the two numbers and put them in a register for main to find.

resume:
    la t6, number
    sw a0, 0(t6)

:pencil: The instruction sw a0, 0(t6) means the following: take the memory address contained in the register t6, add 0 to it, and store the value from a0 at that memory location. You can of course substitute 0 for other integer values, adding zero is equivalent to writing sw a0, (t6).

This is useful if you want to access variables that are located next to each other in memory, and especially for iterating over arrays. Using this technique, you can iterate over a fixed-length array without having to change any registers. If the address of the start of the (integer) array is stored in t0, you can overwrite different elements of the array by changing this offset:

li t3, 5  # we want to overwrite the first three elements of the array with 5
sw t3, 0(t0)  # array[0] = 5
sw t3, 4(t0)  # array[1] = 5
sw t3, 8(t0)  # array[2] = 5

Here we change the offset by 4 bytes each time because integers take up 4 bytes in memory.

:fire: Warm-up 1: Extend the program above to sum the two numbers a and b and store it in number.

Solution

.globl main
.data
    a: .word 1
    b: .word 2
    number:  .word 0

.text
main:
    # Load the two numbers a and b into registers
    lw t0, a
    lw t1, b

sum:
    # Add the two numbers and put them in a register for main to find.
    add a0, t0, t1

resume:
    la t6, number
    sw a0, 0(t6)

Problems that arise when using functions in assembly code

Which registers did you use in the main to load a and b? What would happen to your program if the code in sum used different registers? It would no longer work. This is why the calling conventions of RISC-V assign different roles for different registers:

  • Some registers are to be used for input parameters and return values of functions
  • Callee-saved registers are to be restored by the called function at the end of its execution (if it had changed their values). In other words, the code that called the function can safely treat these registers as if their value was unchanged by the function call.
  • Caller-saved registers can be overwritten by the called function. In other words, the code that called the function should assume that these registers will be overwritten by the called function (and should be saved elsewhere if they are needed).
Register number Register name Use Note
x0 zero Always zero -
x1 - x4 ra, sp, gp, tp Various uses Partially explained below
x5 - x7 and x28-x31 t0 - t6 Temporary registers Caller must save these registers before calling a function it they need them later. Functions may at any time overwrite these registers!
x8 - x9 and x18 - x27 s0 - s11 Saved registers Callee must save these registers. Thus, you can safely use these registers in your code and any function that you call must back them up and restore them if they decide to use them too.
x10 - x17 a0 - a7 Function arguments Function arguments to called functions. Used as input parameters.
x10 and x11 a0 and a1 Return values Since the input to a function is not useful anymore on return, a0 and a1 have a dual use as registers for the return values.
addi s0, zero, 5
addi t0, zero, 5
jal fact         # function call!
add s1, t0, t0   # we don't know what is in t0, because it's caller-save!
add s1, s0, s0   # s0 is guaranteed to still contain 5, because it's callee-save!

If you take a look at the RISC-V card linked at the top of the website, you can now understand the register table.

Calling conventions solve interface issues

:fire: Warm-up 2: Change your program to adhere to the register conventions above as if the sum label was a function.

Solution

.globl main
.data
    a: .word 1
    b: .word 2
    number:  .word 0

.text
main:
    # Load the two numbers a and b into registers
	la a0, a
	la a1, b

sum:
    # Add the two numbers and put them in a register for main to find.
	add a0, a0, a1

resume:
    la t6, number
    sw a0, 0(t6)

:fire: Warm-up 3: Read the official calling conventions of RISC-V. Since we will not be making use of floating point registers (ft0 to ft11) you can skip Section 18.3. In this course we will just try to fit all parameters into the argument registers as it is explained there before utilizing the stack.

The official calling conventions talk about passing some function parameters via the stack. Let us now take a look at what happens if we run out of registers and how can we solve this by using the stack.

Memory: the stack

It is easy to think of cases where the 32 registers we have are not enough. You may already have come to a situation where this is the case, but for any larger program, we definitely need to store data in memory. Similarly, if the calling conventions only define 8 registers to pass function arguments, what happens if we want to pass more data to a function than fits into these 8 registers? You already worked with variables stored in the .data section: in a similar way, we could declare a region in memory that we use to back up or restore data from. However, defining specific variables is still not enough for cases where we do not know how many variables we will need. In this case, what we need is a data structure where we dynamically add and remove data, depending on what we need.

Understanding the stack

A stack is a simple data structure that grows in one direction and that works according to the Last-in-First-Out (LIFO) principle. The idea is as simple as a tower of books where you are only ever able to pick up the top-most book. You can place more books on top of the tower, but have to pick them up again to reach the books at the bottom. To realize a simple stack in assembly, we can just define a large memory region in the data section like this:

.globl main
.data
    stack: .space 500
.text
main:
    # ...

Now to use this stack, we would do the following actions:

  1. Load the address of the stack into a register
  2. When we want to push data on the stack:
    1. Store data at the address that the register points to
    2. Increment the pointer by the size of the data we just added, so that the register points to free space again
  3. When we want to pop data from the stack, we:
    1. Decrement the pointer by the size of the last data element
    2. Read the content of the data stored at the current stack pointer

If you are now unsure about the size of the data that you want to put or retrieve from the stack, take a look again at the calling conventions or at the RISC-V sheet, both contain a list of common data types and their size in bytes in our 32-bit RISC-V configuration.

:fire: Warm-up 4: Expand the example above with simple code that loads the address of the stack into a register, pushes two integers (4 and 5 for example) and then pops these integers again.

Solution

.globl main
.data
    stack: .space 500

.text
main:
	# load the stack pointer
	la t0, stack

	# load the integers
	li t1, 4
	li t2, 5

	# store the integers
	sw t1, 0(t0)
	sw t2, 4(t0)
	addi t0, t0, 8

	# load the integers again
	addi t0, t0, -4
	lw t2, 0(t0)
	addi t0, t0, -4
	lw t1, 0(t0)
	
	# Look at the code above:
	# First, we stored with different offsets into t0
	#  and incremented it by 8 (2x4).
	# But then when we loaded the integers, we decremented
	#  t0 by 4 twice and loaded without offset.
	# Both approaches are equivalent! You can use either 
	#  to achieve the same result!

This simple stack that you have written can already help you to overcome all challenges that we described above:

  • If we run out of registers, we can temporarily push the variables onto the stack. As long as we remember where we can find the data and how many variables we pushed after this variable, we can always find it again.
  • If we want to send complex data to a function or want to exceed the 8 registers that we can use to send function arguments, we can use such a stack and simply pass the pointer to the data on the stack.
  • One additional issue we did not discuss yet is the problem of program control flow. Where should the function jump at the end? What happens on recursion? Or if a function wants to call another function? The functions should always remember who called them and where to return to. This is handled by the return address register (ra / x1), which also needs to be backed up to the stack during nested function calls.

Manipulating the stack in RISC-V

At this point, it may not surprise you anymore to hear that RISC-V actually has a dedicated register called the stack pointer, and that the calling conventions heavily rely on the mechanisms mentioned above:

  • The stack pointer (sp / x2) is already set up for you to point to the CPU stack. You can use it in your programs freely.
  • In RISC-V (and other architectures) the stack grows downwards. Thus, instead of increasing the pointer as you did in the warm-up above, you decrement it. See the excursion below for more details.
  • To push data (e.g. t0) to the stack: first use addi sp, sp, −4 (decrement the stack pointer to allocate space on the stack); then use sw t0, 0(sp) to write t0 to the stack. The stack pointer should always point to the last pushed value (not to the empty space below it).
  • To pop data from the stack (e.g. in t0): first load the data at the top of the stack using lw t0, 0(sp); then update the stack pointer using addi sp, sp, 4.

:warning: The only deviation from the calling convention we make in this course is related to the following: “the stack pointer is always kept 16-byte aligned”. To keep the code examples simpler, we only align the stack pointer to 32-bit words (4 bytes) in this course (but of course, when writing real programs, you should keep this in mind). You are allowed to use either convention on the exam, as long as you use it consistently.

:fire: Warm-up 5: Change your code from the last warm-up to use the sp register and the provided stack.

Solution

.globl main

# We use the program stack which is already set up for us.
# Thus, we don't need the data variable we called stack anymore.
# We can just use the stack pointer.
.text
main:
    # load the stack pointer
    # sp already points to the stack! Nothing to do.

    # load the integers
    li t1, 4
    li t2, 5

    # store the integers
    # WARNING! We are using the stack pointer here!
    # This means that we need to make space on the stack
    #  BEFORE we store the integers.
    addi sp, sp, -8 # Stack grows downwards
    sw t1, 0(sp)
    sw t2, 4(sp)

    # load the integers again
    lw t1, 0(sp)
    addi sp, sp, 4 # Stack grows downwards and we move it after we loaded the integers
    lw t2, 0(sp)
    addi sp, sp, 4

Excursion: The stack grows downwards!?

:warning: The stack controlled by the stack pointer in RISC-V grows downwards! It is still a stack, but instead of adding a variable on top of the older one, you place the variable below the older one in memory. This means that when your last element on the stack is at address 0x100, the next item of size 4 bytes will be at address 0x100 - 0x004 = 0x0FC (and not at 0x104).

If you want to understand more about this, read through the excursion below.

Excursion: Why does the stack grow downwards?

There is no technical reason for this, this is just convention. You could equally well define that the stack grows upwards in the direction that is probably more intuitive to you. However, this decision may make more sense when looking at the general, complete memory layout of RISC-V.

Since RISC-V is based on a Von Neumann architecture, the memory is used to store both instructions and data. Take a look at the memory overview below. You will see that in the memory, we need to store (from bottom to top, low address to high):

  1. Reserved memory at the very first addresses (bottom in the graphic, from address 0x00).
  2. Text section, i.e., instructions that are kept in memory (your program).
  3. Data section where all static data is kept.
  4. Heap data for dynamic memory. You will learn more about this in the next session.
  5. Stack.

Both the stack and the heap can grow during the execution of programs. Since we may not always know how large both can get during execution, it is simpler to just have them share a big block of memory and let them grow towards each other. Then, we only need to make sure they never cross each other, but until that happens, both can grow and shrink however they want.

Memory Layout in RISC-V

:bulb: Why does the stack grow downwards? Because it became a convention at some point that the stack grows downwards and the heap grows upwards. There are some reasons that make it a smart choice, but nowadays it is just a convention in RISC-V.

Using the stack for function calls

Calling a function means multiple things at once:

  1. We jump to a new address (or label) and expect the code there to execute.
  2. We may want to provide some input to the called function.
  3. We may expect some return values from the called function.
  4. The function is expected to jump back (i.e., return), to the code that originally called it.
  5. Any registers that are marked as caller-save must be saved before the function call and restored afterwards (if we want to preserve their value).
  6. Any registers that are marked as callee-save must remain untouched by the called function.

We have already discussed which registers are callee- and caller-saved. However, we did not discuss how these registers should be saved on the stack or in what order we should save them. The first thing to do when calling a function is to save the caller-saved registers on the stack. Since this procedure logically belongs to the caller function, it does not matter in which order you save the registers as long as you make sure that you restore them in the same order after returning from the function.

The next step to take when calling a function is to push on the stack the arguments that do not fit into registers. Before jumping to the function, we fill the ra register with the return address: the address of the instruction that we want to execute when returning from the function. Most often, this will be the instruction after the jump to the function. This is why there is a special instruction in RISC-V, called the jal (jump and link) instruction, that first places the address of the next instruction into the ra register and then jumps to the given address.

Summary: Complete calling conventions

The complete list of things to do when calling a function is as follows:

  1. Push caller-saved registers that you need to reuse to the stack (e.g. if you only reuse t0 and t1 you need to save them but you don’t need to save t2-7).
  2. Place all function parameters that fit into registers in a0 to a7.
  3. Push remaining function parameters onto the stack.
  4. Call the function either via jal or via another instruction as long as you make sure to store the return address to ra.

In the called function, do the following:

  1. Back up the return address register ra on the stack (if we’re planning to overwrite it either directly or by calling another function with jal from within the function).
  2. Back up other callee-saved registers that will be used by this function (e.g. if we do not touch s0-s7, we can skip saving them).
  3. Perform function tasks.
  4. Place function return in a0 and a1.
  5. Restore callee-saved registers and ra.
  6. Return to parent function via ret (or simply jump to ra).

With this complete calling conventions list, you should now understand the call stack diagram, which you can also find on the RISC-V card:

Call stack diagram

Below, you can find a series of images that walk you through an example program and the usage of the stack (click or slide through the images).

Show stack example code
.data
    x: .word 1
    y: .word 2
    z: .space 4
.globl main
.text
foo:
    add a0, a0, a1
    ret

bar:
    addi sp, sp, -4
    sw   ra, 0(sp)
    jal  foo
    lw   ra, 0(sp)
    addi sp, sp, 4
    ret

main:
    lw  a0, x
    lw  a1, y
    jal bar
    sw  a0, z, t0

Exercise 1

Convert the following code to RISC-V assembly using RISC-V calling conventions. Assume that main does not return like common functions. Why is this assumption necessary right now?

int x = 10;
int y = 20;
int z;

int double_it(int a) {
    return a + a;
}

int sum(int a, int b) {
    return a + double_it(b);
}

int main(void) {
    z = sum(x, y);
}

Solution

.data
    x: .word 0x10
    y: .word 0x20
    z: .space 4

.text

.globl main
doubleIt:
    # No need to execute the entire prologue/epilogue
    # For such simple functions
    add a0, a0, a0
    ret
sum:
    addi sp, sp, -8 # Make space for 2 words on the stack
    sw ra, 4(sp)    # Save the return address
    sw a0, 0(sp)    # Save the first argument a0 because of doubleIt call

    mv a0, a1       # b (currently in a1) is our input. Move it to a0 for doubleIt.
    jal doubleIt

    lw t0, 0(sp)    # Restore the saved argument (a) to t0
    add a0, t0, a0  # Add a (t0) to the result of doubleIt (a0)

    lw ra, 4(sp)    # Restore the return address
    addi sp, sp, 8  # Restore stack pointer
    ret
main:
    lw a0, x
    lw a1, y
    jal sum
    sw a0, z, t0

Additional exercises

Exercise 2

We have compiled the function func from the code example below to 32-bit RISC-V (RV32I) using gcc. The compiled function can be found below. Translate the main function manually to RISC-V. Follow the calling conventions to pass all arguments correctly.

For this exercise, you will have to rely on the official RISC-V calling conventions. They define the size of the C types and how to pass values that do not fit in a single register!

:bulb: The nice thing about calling conventions is that you don’t have to understand the assembly code of func to be able to write the assembly code of main!

long long result;
long long i = 6;

long long func(int a, long b, long long c, long long d, long long e, long long *f) {
    return a + b + c + d + e + *f;
}

int main(void) {
    result = func(1, 2, 3, 4, 5, &i);
}

.data
    result:	.space 8
    i:		.dword 6

.text
.globl main
func:
    addi	sp,sp,-48
    sw	s0,44(sp)
    addi	s0,sp,48
    sw	a0,-20(s0)
    sw	a1,-24(s0)
    sw	a2,-32(s0)
    sw	a3,-28(s0)
    sw	a4,-40(s0)
    sw	a5,-36(s0)
    sw	a6,-48(s0)
    sw	a7,-44(s0)
    lw	a4,-20(s0)
    lw	a5,-24(s0)
    add	a5,a4,a5
    mv	t1,a5
    srai	a5,a5,0x1f
    mv	t2,a5
    lw	a2,-32(s0)
    lw	a3,-28(s0)
    add	a4,t1,a2
    mv	a1,a4
    sltu	a1,a1,t1
    add	a5,t2,a3
    add	a3,a1,a5
    mv	a5,a3
    mv	a2,a4
    mv	a3,a5
    lw	a0,-40(s0)
    lw	a1,-36(s0)
    add	a4,a2,a0
    mv	a6,a4
    sltu	a6,a6,a2
    add	a5,a3,a1
    add	a3,a6,a5
    mv	a5,a3
    mv	a2,a4
    mv	a3,a5
    lw	a0,-48(s0)
    lw	a1,-44(s0)
    add	a4,a2,a0
    mv	a6,a4
    sltu	a6,a6,a2
    add	a5,a3,a1
    add	a3,a6,a5
    mv	a5,a3
    mv	a2,a4
    mv	a3,a5
    lw	a5,0(s0)
    lw	a0,0(a5)
    lw	a1,4(a5)
    add	a4,a2,a0
    mv	a6,a4
    sltu	a6,a6,a2
    add	a5,a3,a1
    add	a3,a6,a5
    mv	a5,a3
    mv	a0,a4
    mv	a1,a5
    lw	s0,44(sp)
    addi	sp,sp,48
    ret

main:
    # TODO

Solution

.data
    result: .space 8
    i:      .dword 6

.text
.globl main
func:
    addi	sp,sp,-48
    sw	s0,44(sp)
    addi	s0,sp,48
    sw	a0,-20(s0)
    sw	a1,-24(s0)
    sw	a2,-32(s0)
    sw	a3,-28(s0)
    sw	a4,-40(s0)
    sw	a5,-36(s0)
    sw	a6,-48(s0)
    sw	a7,-44(s0)
    lw	a4,-20(s0)
    lw	a5,-24(s0)
    add	a5,a4,a5
    mv	t1,a5
    srai	a5,a5,0x1f
    mv	t2,a5
    lw	a2,-32(s0)
    lw	a3,-28(s0)
    add	a4,t1,a2
    mv	a1,a4
    sltu	a1,a1,t1
    add	a5,t2,a3
    add	a3,a1,a5
    mv	a5,a3
    mv	a2,a4
    mv	a3,a5
    lw	a0,-40(s0)
    lw	a1,-36(s0)
    add	a4,a2,a0
    mv	a6,a4
    sltu	a6,a6,a2
    add	a5,a3,a1
    add	a3,a6,a5
    mv	a5,a3
    mv	a2,a4
    mv	a3,a5
    lw	a0,-48(s0)
    lw	a1,-44(s0)
    add	a4,a2,a0
    mv	a6,a4
    sltu	a6,a6,a2
    add	a5,a3,a1
    add	a3,a6,a5
    mv	a5,a3
    mv	a2,a4
    mv	a3,a5
    lw	a5,0(s0)
    lw	a0,0(a5)
    lw	a1,4(a5)
    add	a4,a2,a0
    mv	a6,a4
    sltu	a6,a6,a2
    add	a5,a3,a1
    add	a3,a6,a5
    mv	a5,a3
    mv	a0,a4
    mv	a1,a5
    lw	s0,44(sp)
    addi	sp,sp,48
    ret

main:
    li a0, 1
    li a1, 2

    # c = a3 (high order bits) | a2 (low order bits)
    li a2, 3
    li a3, 0

    # d = a5 | a4
    li a4, 4
    li a5, 0

    # e = a6 | a5
    li a6, 5
    li a7, 0

    # t0 = &i
    la t0, i

    # push t0 (&i)
    addi sp, sp, -4
    sw   t0, 0(sp)

    # result = func(....)
    jal func
    sw  a0, result, t0

    # pop
    addi sp, sp, 4

Exercise 3

Fix the function sum_fixme in the below program. Only add code at the designated TODO points, don’t modify the existing code. Use the stack to make sure caller-saved registers are saved by the caller and callee-saved registers are saved by the callee. Note that sum_fixme acts both as a caller and a callee at different times. Your solution is correct if the execution terminates with

  • no errors;
  • the value 3 in a0;
  • the value 0xdeadbeef in s0.
.globl main
.text
callersaveregisterwipe: # Don't modify this function!
    mv t0, ra
    li ra, 0
    li t1, 0
    li t2, 0
    li t3, 0
    li t4, 0
    li t5, 0
    li t6, 0
    li a0, 0
    li a1, 0
    li a2, 0
    li a3, 0
    li a4, 0
    li a5, 0
    li a6, 0
    li a7, 0
    jr t0

sum_fixme:

    # TODO

    jal callersaveregisterwipe # Don't modify this line

    # TODO

    add s0, a0, a1  # Don't modify this line
    mv  a0, s0      # Don't modify this line

    # TODO

    ret # Don't modify this line

main: # Don't modify this function!
    li a0, 1
    li a1, 2
    li s0, 0xdeadbeef
    jal sum_fixme
    # Correct execution should terminate 1) Without errors, 2) with the value 3 in a0 AND 3) with the value 0xdeadbeef in s0

Solution

.globl main
.text
callersaveregisterwipe: #Don't modify this function!
    mv t0, ra
    li ra, 0
    li t1, 0
    li t2, 0
    li t3, 0
    li t4, 0
    li t5, 0
    li t6, 0
    li a0, 0
    li a1, 0
    li a2, 0
    li a3, 0
    li a4, 0
    li a5, 0
    li a6, 0
    li a7, 0
    jr t0

sum_fixme:

    addi sp, sp, -12 #Reserve 3 words on the stack

    sw   ra, 0(sp)   #Store caller-save registers on stack
    sw   a0, 4(sp)
    sw   a1, 8(sp)

    jal callersaveregisterwipe  #Don't modify this line

    lw   ra, 0(sp)   #Restore caller-save registers from stack
    lw   a0, 4(sp)
    lw   a1, 8(sp)

    sw   s0, 0(sp)   #Store callee-save register on stack so it can be used

    add  s0, a0, a1  #Don't modify this line
    mv   a0, s0      #Don't modify this line

    lw   s0, 0(sp)   #Restore callee-save register from stack
    addi sp, sp, 12  #Restore (callee-save) stack pointer before returning

    ret              #Don't modify this line

main:                #Don't modify this function!
    li a0, 1
    li a1, 2
    li s0, 0xdeadbeef
    jal sum_fixme
    #Correct execution should terminate 1) Without errors, 2) with the value 3 in a0 AND 3) with the value 0xdeadbeef in s0

Exercise 4

Consider the following recursive function which calculates n!.

unsigned int fact(unsigned int n) {
    if (n < 2) return 1;
    return n*fact(n-1);
}
  1. Convert this function to RISC-V.
  2. Consider the call fact(3). What is the state of stack when it reaches its maximum size (at the deepest level of recursion)?
  3. In exercise 4 of the first session you implemented an iterative factorial function. Compare both factorial implementations in terms of memory usage. Which implementation do you prefer?

Solution

.globl main

fact:
    addi sp, sp, -8   # Make space for 2 words on the stack
    sw   ra, 4(sp)    # Store the (caller-save) return address
    sw   a0, 0(sp)    # Store the (caller-save) register a0

    # if (n < 2)
    slti t0, a0, 2    # Set t0 to 1 if a0 (n) < 2
    beqz t0, _recurse # Branch to _recurse if a0 > 1 (because then t0 would be 0 now)

    # return 1;
    li   a0, 1        # Return 1 if a0 <= 1
    addi sp, sp, 8    # Free 2 words of space on stack
    ret

_recurse:
    #fact(n-1)
    addi a0, a0, -1   # Decrement $a0
    jal  fact         # Execute fact

    mv   t0, a0       # Move fact(n-1) result to t0
    lw   a0, 0(sp)    # Restore initial a0 (old n value)
    lw   ra, 4(sp)    # Restore initial ra
    addi sp, sp, 8    # Free 2 words of space on stack

    #n*fact(n-1);
    mul  a0, a0, t0   # a0 = a0 (n) * t0 (fact(n-1))
    ret               # return n * fact(n-1)

main:
    li  a0, 5         # Input will be 5
    jal fact          # a0 = fact(5);


Excursion: Tail recursion

A tail call occurs whenever the last instruction of a subroutine (before the return) calls a different subroutine. Compilers can take advantage of tail calls to reduce memory usage. This is because for tail calls, no additional stack frame needs to be entered. Instead, we can simply overwrite the function parameters, jump to the function and execute from there by reusing the original function stack frame. This is possible since we do not expect to be returned to and instead refer to our original caller that is on our stack frame. Thus, when the (tail-) called function returns, it will not return to us but directly to the original code that called us.

The benefit of tail calls is that they are very light on stack usage. While non-tail recursion adds a stack frame for each recursion depth, tail recursion only uses a single stack frame for any recursion depth.

:bulb: The call fact(n-1) in the previous exercise is not a tail call. Why not?

Solution

The multiplication must be performed after the recursive function returns. Thus, the recursive function call is not the last instruction in the function.

Excursion exercise

We have converted the factorial program to use tail recursion. Translate this program to RISC-V. Try to avoid using the call stack during the fact_tail implementation. Why is this possible?

unsigned int fact_tail(unsigned int n, unsigned int result) {
    if (n <= 1) return result;
    return fact_tail(n - 1, n * result);
}

unsigned int fact(unsigned int n) {
    return fact_tail(n, 1);
}

int main(void) {
    int n = 5;
    int r;
    r = fact(n);
}

Solution

.globl main

fact_tail:
    # if (n < 2)
    slti	t0, a0, 2       # Set t0 to 1 if a0 (n) < 2
    beqz	t0, _recurse    # Branch to _recurse if a0 > 1 (because then t0 would be 0 now)
    
    # return result;
    mv a0, a1
    ret
    
_recurse:
    #fact_tail(n - 1, n * result);
    mul 	a1, a0, a1  # a1 = n * result
    addi	a0, a0, -1  # a0 = n - 1
    j	fact_tail       # a0 = fact_tail(n - 1, n * result)
        # Note: NO JAL here -> ra is not overwritten
        # We don't need to use jal since there is no code to be executed after the tail call!
        # Thus no need to back-up ra on the stack at all.
    
fact:
    li 	a1, 1
    addi sp, sp, -4
    sw	ra, 0(sp)
    jal 	fact_tail
    lw	ra, 0(sp)
    addi	sp, sp, 4
    ret
    
main:
    li	a0, 5   # Input will be 5
    jal	fact    # a0 = fact(5);