Table of contents
- Introduction
- First steps towards common calling conventions
- Memory: the stack
- Using the stack for function calls
- Summary: Complete calling conventions
- Additional exercises
- Excursion: Tail recursion
Introduction
In the last sessions, you have already gained some experience with assembly code in RISC-V and, by now, also have a basic understanding of C code. In this session, we will dive deeper into what difficulties arise when writing larger programs, how we can extend our registers by using memory, and how this helps us to write code elements that resemble functions as you know them from higher-level languages.
Recap: Functions in C
Let’s first take a look at functions in C. Below you see an example program with a helper function that calculates a simple sum.
sum.c | Console |
---|---|
You already know functions from other languages than C, they are very useful to bundle common tasks or make the program simpler to think about. In essence, functions allow for abstraction via parameters and return values.
But how can we do this in assembly? There is almost no concept of a function in assembly. If languages like C are compiled to assembly code, how do they transform their functions to assembly code then? The answer is simple and slightly difficult at the same time:
- Functions can be implemented using common low-level primitives such as labels, jumps, and registers. Doing this is simple, and you may already have done so intuitively during the last sessions.
-
However, making sure that you can always and deterministically reuse or call a function requires some effort. You may already have experienced that different developers use different registers to do the same thing. What if you use register
t0
to pass a parameter to a function, but the other developer expects the parameter int1
? What happens if we run out of registers? These and other situations require for a clear set of (calling) conventions that we expect everyone to adhere to - even (or especially) compilers! This is the slightly difficult part of functions in assembly.
First steps towards common calling conventions
TL;DR: The official calling conventions are here on the official RISC-V website. You can also find the short version on the RISC-V card that you will have access to in the exam.
If the above explanation did not fully click for you yet, don’t worry. Let’s go step by step and understand why conventions are useful and help us write code that can be used across programs and teams. Let’s take a look at the partial assembly code below. We already created a main
and a sum
label:
.globl main
.data
a: .word 1
b: .word 2
number: .word 0
.text
main:
# Load the two numbers a and b into registers
sum:
# Add the two numbers and put them in a register for main to find.
resume:
la t6, number
sw a0, 0(t6)
The instruction
sw a0, 0(t6)
means the following: take the memory address contained in the registert6
, add0
to it, and store the value froma0
at that memory location. You can of course substitute0
for other integer values, adding zero is equivalent to writingsw a0, (t6)
.This is useful if you want to access variables that are located next to each other in memory, and especially for iterating over arrays. Using this technique, you can iterate over a fixed-length array without having to change any registers. If the address of the start of the (integer) array is stored in
t0
, you can overwrite different elements of the array by changing this offset:li t3, 5 # we want to overwrite the first three elements of the array with 5 sw t3, 0(t0) # array[0] = 5 sw t3, 4(t0) # array[1] = 5 sw t3, 8(t0) # array[2] = 5
Here we change the offset by 4 bytes each time because integers take up 4 bytes in memory.
Warm-up 1: Extend the program above to sum the two numbers
a
andb
and store it innumber
.
Solution
.globl main
.data
a: .word 1
b: .word 2
number: .word 0
.text
main:
# Load the two numbers a and b into registers
lw t0, a
lw t1, b
sum:
# Add the two numbers and put them in a register for main to find.
add a0, t0, t1
resume:
la t6, number
sw a0, 0(t6)
Which registers did you use in the main to load a
and b
? What would happen to your program if the code in sum
used different registers? It would no longer work. This is why the calling conventions of RISC-V assign different roles for different registers:
- Some registers are to be used for input parameters and return values of functions
- Callee-saved registers are to be restored by the called function at the end of its execution (if it had changed their values). In other words, the code that called the function can safely treat these registers as if their value was unchanged by the function call.
- Caller-saved registers can be overwritten by the called function. In other words, the code that called the function should assume that these registers will be overwritten by the called function (and should be saved elsewhere if they are needed).
Register number | Register name | Use | Note |
---|---|---|---|
x0 | zero | Always zero | - |
x1 - x4 | ra, sp, gp, tp | Various uses | Partially explained below |
x5 - x7 and x28-x31
| t0 - t6 | Temporary registers | Caller must save these registers before calling a function it they need them later. Functions may at any time overwrite these registers! |
x8 - x9 and x18 - x27
| s0 - s11 | Saved registers | Callee must save these registers. Thus, you can safely use these registers in your code and any function that you call must back them up and restore them if they decide to use them too. |
x10 - x17 | a0 - a7 | Function arguments | Function arguments to called functions. Used as input parameters. |
x10 and x11
|
a0 and a1
| Return values | Since the input to a function is not useful anymore on return, a0 and a1 have a dual use as registers for the return values. |
addi s0, zero, 5
addi t0, zero, 5
jal fact # function call!
add s1, t0, t0 # we don't know what is in t0, because it's caller-save!
add s1, s0, s0 # s0 is guaranteed to still contain 5, because it's callee-save!
If you take a look at the RISC-V card linked at the top of the website, you can now understand the register table.
Warm-up 2: Change your program to adhere to the register conventions above as if the
sum
label was a function.
Solution
.globl main
.data
a: .word 1
b: .word 2
number: .word 0
.text
main:
# Load the two numbers a and b into registers
la a0, a
la a1, b
sum:
# Add the two numbers and put them in a register for main to find.
add a0, a0, a1
resume:
la t6, number
sw a0, 0(t6)
Warm-up 3: Read the official calling conventions of RISC-V. Since we will not be making use of floating point registers (
ft0
toft11
) you can skip Section 18.3. In this course we will just try to fit all parameters into the argument registers as it is explained there before utilizing the stack.
The official calling conventions talk about passing some function parameters via the stack. Let us now take a look at what happens if we run out of registers and how can we solve this by using the stack.
Memory: the stack
It is easy to think of cases where the 32 registers we have are not enough. You may already have come to a situation where this is the case, but for any larger program, we definitely need to store data in memory. Similarly, if the calling conventions only define 8 registers to pass function arguments, what happens if we want to pass more data to a function than fits into these 8 registers? You already worked with variables stored in the .data
section: in a similar way, we could declare a region in memory that we use to back up or restore data from. However, defining specific variables is still not enough for cases where we do not know how many variables we will need. In this case, what we need is a data structure where we dynamically add and remove data, depending on what we need.
Understanding the stack
A stack is a simple data structure that grows in one direction and that works according to the Last-in-First-Out (LIFO) principle. The idea is as simple as a tower of books where you are only ever able to pick up the top-most book. You can place more books on top of the tower, but have to pick them up again to reach the books at the bottom. To realize a simple stack in assembly, we can just define a large memory region in the data section like this:
.globl main
.data
stack: .space 500
.text
main:
# ...
Now to use this stack, we would do the following actions:
- Load the address of the stack into a register
- When we want to push data on the stack:
- Store data at the address that the register points to
- Increment the pointer by the size of the data we just added, so that the register points to free space again
- When we want to pop data from the stack, we:
- Decrement the pointer by the size of the last data element
- Read the content of the data stored at the current stack pointer
If you are now unsure about the size of the data that you want to put or retrieve from the stack, take a look again at the calling conventions or at the RISC-V sheet, both contain a list of common data types and their size in bytes in our 32-bit RISC-V configuration.
Warm-up 4: Expand the example above with simple code that loads the address of the stack into a register, pushes two integers (4 and 5 for example) and then pops these integers again.
Solution
.globl main
.data
stack: .space 500
.text
main:
# load the stack pointer
la t0, stack
# load the integers
li t1, 4
li t2, 5
# store the integers
sw t1, 0(t0)
sw t2, 4(t0)
addi t0, t0, 8
# load the integers again
addi t0, t0, -4
lw t2, 0(t0)
addi t0, t0, -4
lw t1, 0(t0)
# Look at the code above:
# First, we stored with different offsets into t0
# and incremented it by 8 (2x4).
# But then when we loaded the integers, we decremented
# t0 by 4 twice and loaded without offset.
# Both approaches are equivalent! You can use either
# to achieve the same result!
This simple stack that you have written can already help you to overcome all challenges that we described above:
- If we run out of registers, we can temporarily push the variables onto the stack. As long as we remember where we can find the data and how many variables we pushed after this variable, we can always find it again.
- If we want to send complex data to a function or want to exceed the 8 registers that we can use to send function arguments, we can use such a stack and simply pass the pointer to the data on the stack.
- One additional issue we did not discuss yet is the problem of program control flow. Where should the function jump at the end? What happens on recursion? Or if a function wants to call another function? The functions should always remember who called them and where to return to. This is handled by the return address register (
ra
/x1
), which also needs to be backed up to the stack during nested function calls.
Manipulating the stack in RISC-V
At this point, it may not surprise you anymore to hear that RISC-V actually has a dedicated register called the stack pointer, and that the calling conventions heavily rely on the mechanisms mentioned above:
- The stack pointer (
sp
/x2
) is already set up for you to point to the CPU stack. You can use it in your programs freely. - In RISC-V (and other architectures) the stack grows downwards. Thus, instead of increasing the pointer as you did in the warm-up above, you decrement it. See the excursion below for more details.
- To push data (e.g.
t0
) to the stack: first useaddi sp, sp, −4
(decrement the stack pointer to allocate space on the stack); then usesw t0, 0(sp)
to writet0
to the stack. The stack pointer should always point to the last pushed value (not to the empty space below it). - To pop data from the stack (e.g. in
t0
): first load the data at the top of the stack usinglw t0, 0(sp)
; then update the stack pointer usingaddi sp, sp, 4
.
The only deviation from the calling convention we make in this course is related to the following: “the stack pointer is always kept 16-byte aligned”. To keep the code examples simpler, we only align the stack pointer to 32-bit words (4 bytes) in this course (but of course, when writing real programs, you should keep this in mind). You are allowed to use either convention on the exam, as long as you use it consistently.
Warm-up 5: Change your code from the last warm-up to use the
sp
register and the provided stack.
Solution
.globl main
# We use the program stack which is already set up for us.
# Thus, we don't need the data variable we called stack anymore.
# We can just use the stack pointer.
.text
main:
# load the stack pointer
# sp already points to the stack! Nothing to do.
# load the integers
li t1, 4
li t2, 5
# store the integers
# WARNING! We are using the stack pointer here!
# This means that we need to make space on the stack
# BEFORE we store the integers.
addi sp, sp, -8 # Stack grows downwards
sw t1, 0(sp)
sw t2, 4(sp)
# load the integers again
lw t1, 0(sp)
addi sp, sp, 4 # Stack grows downwards and we move it after we loaded the integers
lw t2, 0(sp)
addi sp, sp, 4
Excursion: The stack grows downwards!?
The stack controlled by the stack pointer in RISC-V grows downwards! It is still a stack, but instead of adding a variable on top of the older one, you place the variable below the older one in memory. This means that when your last element on the stack is at address
0x100
, the next item of size 4 bytes will be at address0x100 - 0x004 = 0x0FC
(and not at0x104
).
If you want to understand more about this, read through the excursion below.
Excursion: Why does the stack grow downwards?
There is no technical reason for this, this is just convention. You could equally well define that the stack grows upwards in the direction that is probably more intuitive to you. However, this decision may make more sense when looking at the general, complete memory layout of RISC-V.
Since RISC-V is based on a Von Neumann architecture, the memory is used to store both instructions and data. Take a look at the memory overview below. You will see that in the memory, we need to store (from bottom to top, low address to high):
- Reserved memory at the very first addresses (bottom in the graphic, from address 0x00).
- Text section, i.e., instructions that are kept in memory (your program).
- Data section where all static data is kept.
- Heap data for dynamic memory. You will learn more about this in the next session.
- Stack.
Both the stack and the heap can grow during the execution of programs. Since we may not always know how large both can get during execution, it is simpler to just have them share a big block of memory and let them grow towards each other. Then, we only need to make sure they never cross each other, but until that happens, both can grow and shrink however they want.
Why does the stack grow downwards? Because it became a convention at some point that the stack grows downwards and the heap grows upwards. There are some reasons that make it a smart choice, but nowadays it is just a convention in RISC-V.
Using the stack for function calls
Calling a function means multiple things at once:
- We jump to a new address (or label) and expect the code there to execute.
- We may want to provide some input to the called function.
- We may expect some return values from the called function.
- The function is expected to jump back (i.e., return), to the code that originally called it.
- Any registers that are marked as caller-save must be saved before the function call and restored afterwards (if we want to preserve their value).
- Any registers that are marked as callee-save must remain untouched by the called function.
We have already discussed which registers are callee- and caller-saved. However, we did not discuss how these registers should be saved on the stack or in what order we should save them. The first thing to do when calling a function is to save the caller-saved registers on the stack. Since this procedure logically belongs to the caller function, it does not matter in which order you save the registers as long as you make sure that you restore them in the same order after returning from the function.
The next step to take when calling a function is to push on the stack the arguments that do not fit into registers. Before jumping to the function, we fill the ra
register with the return address: the address of the instruction that we want to execute when returning from the function. Most often, this will be the instruction after the jump to the function. This is why there is a special instruction in RISC-V, called the jal
(jump and link) instruction, that first places the address of the next instruction into the ra
register and then jumps to the given address.
Summary: Complete calling conventions
The complete list of things to do when calling a function is as follows:
- Push caller-saved registers that you need to reuse to the stack (e.g. if you only reuse
t0
andt1
you need to save them but you don’t need to savet2-7
).- Place all function parameters that fit into registers in
a0
toa7
.- Push remaining function parameters onto the stack.
- Call the function either via
jal
or via another instruction as long as you make sure to store the return address tora
.In the called function, do the following:
- Back up the return address register
ra
on the stack (if we’re planning to overwrite it either directly or by calling another function withjal
from within the function).- Back up other callee-saved registers that will be used by this function (e.g. if we do not touch
s0-s7
, we can skip saving them).- Perform function tasks.
- Place function return in
a0
anda1
.- Restore callee-saved registers and
ra
.- Return to parent function via
ret
(or simply jump tora
).
With this complete calling conventions list, you should now understand the call stack diagram, which you can also find on the RISC-V card:
Below, you can find a series of images that walk you through an example program and the usage of the stack (click or slide through the images).
Show stack example code
.data
x: .word 1
y: .word 2
z: .space 4
.globl main
.text
foo:
add a0, a0, a1
ret
bar:
addi sp, sp, -4
sw ra, 0(sp)
jal foo
lw ra, 0(sp)
addi sp, sp, 4
ret
main:
lw a0, x
lw a1, y
jal bar
sw a0, z, t0
Exercise 1
Convert the following code to RISC-V assembly using RISC-V calling conventions. Assume that main
does not return like common functions. Why is this assumption necessary right now?
int x = 10;
int y = 20;
int z;
int double_it(int a) {
return a + a;
}
int sum(int a, int b) {
return a + double_it(b);
}
int main(void) {
z = sum(x, y);
}
Solution
.data
x: .word 0x10
y: .word 0x20
z: .space 4
.text
.globl main
doubleIt:
# No need to execute the entire prologue/epilogue
# For such simple functions
add a0, a0, a0
ret
sum:
addi sp, sp, -8 # Make space for 2 words on the stack
sw ra, 4(sp) # Save the return address
sw a0, 0(sp) # Save the first argument a0 because of doubleIt call
mv a0, a1 # b (currently in a1) is our input. Move it to a0 for doubleIt.
jal doubleIt
lw t0, 0(sp) # Restore the saved argument (a) to t0
add a0, t0, a0 # Add a (t0) to the result of doubleIt (a0)
lw ra, 4(sp) # Restore the return address
addi sp, sp, 8 # Restore stack pointer
ret
main:
lw a0, x
lw a1, y
jal sum
sw a0, z, t0
Additional exercises
Exercise 2
We have compiled the function func
from the code example below to 32-bit RISC-V (RV32I) using gcc. The compiled function can be found below. Translate the main function manually to RISC-V. Follow the calling conventions to pass all arguments correctly.
For this exercise, you will have to rely on the official RISC-V calling conventions. They define the size of the C types and how to pass values that do not fit in a single register!
The nice thing about calling conventions is that you don’t have to understand the assembly code of
func
to be able to write the assembly code ofmain
!
long long result;
long long i = 6;
long long func(int a, long b, long long c, long long d, long long e, long long *f) {
return a + b + c + d + e + *f;
}
int main(void) {
result = func(1, 2, 3, 4, 5, &i);
}
.data
result: .space 8
i: .dword 6
.text
.globl main
func:
addi sp,sp,-48
sw s0,44(sp)
addi s0,sp,48
sw a0,-20(s0)
sw a1,-24(s0)
sw a2,-32(s0)
sw a3,-28(s0)
sw a4,-40(s0)
sw a5,-36(s0)
sw a6,-48(s0)
sw a7,-44(s0)
lw a4,-20(s0)
lw a5,-24(s0)
add a5,a4,a5
mv t1,a5
srai a5,a5,0x1f
mv t2,a5
lw a2,-32(s0)
lw a3,-28(s0)
add a4,t1,a2
mv a1,a4
sltu a1,a1,t1
add a5,t2,a3
add a3,a1,a5
mv a5,a3
mv a2,a4
mv a3,a5
lw a0,-40(s0)
lw a1,-36(s0)
add a4,a2,a0
mv a6,a4
sltu a6,a6,a2
add a5,a3,a1
add a3,a6,a5
mv a5,a3
mv a2,a4
mv a3,a5
lw a0,-48(s0)
lw a1,-44(s0)
add a4,a2,a0
mv a6,a4
sltu a6,a6,a2
add a5,a3,a1
add a3,a6,a5
mv a5,a3
mv a2,a4
mv a3,a5
lw a5,0(s0)
lw a0,0(a5)
lw a1,4(a5)
add a4,a2,a0
mv a6,a4
sltu a6,a6,a2
add a5,a3,a1
add a3,a6,a5
mv a5,a3
mv a0,a4
mv a1,a5
lw s0,44(sp)
addi sp,sp,48
ret
main:
# TODO
Solution
.data
result: .space 8
i: .dword 6
.text
.globl main
func:
addi sp,sp,-48
sw s0,44(sp)
addi s0,sp,48
sw a0,-20(s0)
sw a1,-24(s0)
sw a2,-32(s0)
sw a3,-28(s0)
sw a4,-40(s0)
sw a5,-36(s0)
sw a6,-48(s0)
sw a7,-44(s0)
lw a4,-20(s0)
lw a5,-24(s0)
add a5,a4,a5
mv t1,a5
srai a5,a5,0x1f
mv t2,a5
lw a2,-32(s0)
lw a3,-28(s0)
add a4,t1,a2
mv a1,a4
sltu a1,a1,t1
add a5,t2,a3
add a3,a1,a5
mv a5,a3
mv a2,a4
mv a3,a5
lw a0,-40(s0)
lw a1,-36(s0)
add a4,a2,a0
mv a6,a4
sltu a6,a6,a2
add a5,a3,a1
add a3,a6,a5
mv a5,a3
mv a2,a4
mv a3,a5
lw a0,-48(s0)
lw a1,-44(s0)
add a4,a2,a0
mv a6,a4
sltu a6,a6,a2
add a5,a3,a1
add a3,a6,a5
mv a5,a3
mv a2,a4
mv a3,a5
lw a5,0(s0)
lw a0,0(a5)
lw a1,4(a5)
add a4,a2,a0
mv a6,a4
sltu a6,a6,a2
add a5,a3,a1
add a3,a6,a5
mv a5,a3
mv a0,a4
mv a1,a5
lw s0,44(sp)
addi sp,sp,48
ret
main:
li a0, 1
li a1, 2
# c = a3 (high order bits) | a2 (low order bits)
li a2, 3
li a3, 0
# d = a5 | a4
li a4, 4
li a5, 0
# e = a6 | a5
li a6, 5
li a7, 0
# t0 = &i
la t0, i
# push t0 (&i)
addi sp, sp, -4
sw t0, 0(sp)
# result = func(....)
jal func
sw a0, result, t0
# pop
addi sp, sp, 4
Exercise 3
Fix the function sum_fixme
in the below program. Only add code at the designated TODO
points, don’t modify the existing code. Use the stack to make sure caller-saved registers are saved by the caller and callee-saved registers are saved by the callee. Note that sum_fixme
acts both as a caller and a callee at different times. Your solution is correct if the execution terminates with
- no errors;
- the value 3 in
a0
; - the value
0xdeadbeef
ins0
.
.globl main
.text
callersaveregisterwipe: # Don't modify this function!
mv t0, ra
li ra, 0
li t1, 0
li t2, 0
li t3, 0
li t4, 0
li t5, 0
li t6, 0
li a0, 0
li a1, 0
li a2, 0
li a3, 0
li a4, 0
li a5, 0
li a6, 0
li a7, 0
jr t0
sum_fixme:
# TODO
jal callersaveregisterwipe # Don't modify this line
# TODO
add s0, a0, a1 # Don't modify this line
mv a0, s0 # Don't modify this line
# TODO
ret # Don't modify this line
main: # Don't modify this function!
li a0, 1
li a1, 2
li s0, 0xdeadbeef
jal sum_fixme
# Correct execution should terminate 1) Without errors, 2) with the value 3 in a0 AND 3) with the value 0xdeadbeef in s0
Solution
.globl main
.text
callersaveregisterwipe: #Don't modify this function!
mv t0, ra
li ra, 0
li t1, 0
li t2, 0
li t3, 0
li t4, 0
li t5, 0
li t6, 0
li a0, 0
li a1, 0
li a2, 0
li a3, 0
li a4, 0
li a5, 0
li a6, 0
li a7, 0
jr t0
sum_fixme:
addi sp, sp, -12 #Reserve 3 words on the stack
sw ra, 0(sp) #Store caller-save registers on stack
sw a0, 4(sp)
sw a1, 8(sp)
jal callersaveregisterwipe #Don't modify this line
lw ra, 0(sp) #Restore caller-save registers from stack
lw a0, 4(sp)
lw a1, 8(sp)
sw s0, 0(sp) #Store callee-save register on stack so it can be used
add s0, a0, a1 #Don't modify this line
mv a0, s0 #Don't modify this line
lw s0, 0(sp) #Restore callee-save register from stack
addi sp, sp, 12 #Restore (callee-save) stack pointer before returning
ret #Don't modify this line
main: #Don't modify this function!
li a0, 1
li a1, 2
li s0, 0xdeadbeef
jal sum_fixme
#Correct execution should terminate 1) Without errors, 2) with the value 3 in a0 AND 3) with the value 0xdeadbeef in s0
Exercise 4
Consider the following recursive function which calculates n!
.
unsigned int fact(unsigned int n) {
if (n < 2) return 1;
return n*fact(n-1);
}
- Convert this function to RISC-V.
- Consider the call
fact(3)
. What is the state of stack when it reaches its maximum size (at the deepest level of recursion)? - In exercise 4 of the first session you implemented an iterative factorial function. Compare both factorial implementations in terms of memory usage. Which implementation do you prefer?
Solution
.globl main
fact:
addi sp, sp, -8 # Make space for 2 words on the stack
sw ra, 4(sp) # Store the (caller-save) return address
sw a0, 0(sp) # Store the (caller-save) register a0
# if (n < 2)
slti t0, a0, 2 # Set t0 to 1 if a0 (n) < 2
beqz t0, _recurse # Branch to _recurse if a0 > 1 (because then t0 would be 0 now)
# return 1;
li a0, 1 # Return 1 if a0 <= 1
addi sp, sp, 8 # Free 2 words of space on stack
ret
_recurse:
#fact(n-1)
addi a0, a0, -1 # Decrement $a0
jal fact # Execute fact
mv t0, a0 # Move fact(n-1) result to t0
lw a0, 0(sp) # Restore initial a0 (old n value)
lw ra, 4(sp) # Restore initial ra
addi sp, sp, 8 # Free 2 words of space on stack
#n*fact(n-1);
mul a0, a0, t0 # a0 = a0 (n) * t0 (fact(n-1))
ret # return n * fact(n-1)
main:
li a0, 5 # Input will be 5
jal fact # a0 = fact(5);
Excursion: Tail recursion
A tail call occurs whenever the last instruction of a subroutine (before the return) calls a different subroutine. Compilers can take advantage of tail calls to reduce memory usage. This is because for tail calls, no additional stack frame needs to be entered. Instead, we can simply overwrite the function parameters, jump to the function and execute from there by reusing the original function stack frame. This is possible since we do not expect to be returned to and instead refer to our original caller that is on our stack frame. Thus, when the (tail-) called function returns, it will not return to us but directly to the original code that called us.
The benefit of tail calls is that they are very light on stack usage. While non-tail recursion adds a stack frame for each recursion depth, tail recursion only uses a single stack frame for any recursion depth.
The call
fact(n-1)
in the previous exercise is not a tail call. Why not?
Solution
The multiplication must be performed after the recursive function returns. Thus, the recursive function call is not the last instruction in the function.
Excursion exercise
We have converted the factorial program to use tail recursion. Translate this program to RISC-V. Try to avoid using the call stack during the fact_tail
implementation. Why is this possible?
unsigned int fact_tail(unsigned int n, unsigned int result) {
if (n <= 1) return result;
return fact_tail(n - 1, n * result);
}
unsigned int fact(unsigned int n) {
return fact_tail(n, 1);
}
int main(void) {
int n = 5;
int r;
r = fact(n);
}
Solution
.globl main
fact_tail:
# if (n < 2)
slti t0, a0, 2 # Set t0 to 1 if a0 (n) < 2
beqz t0, _recurse # Branch to _recurse if a0 > 1 (because then t0 would be 0 now)
# return result;
mv a0, a1
ret
_recurse:
#fact_tail(n - 1, n * result);
mul a1, a0, a1 # a1 = n * result
addi a0, a0, -1 # a0 = n - 1
j fact_tail # a0 = fact_tail(n - 1, n * result)
# Note: NO JAL here -> ra is not overwritten
# We don't need to use jal since there is no code to be executed after the tail call!
# Thus no need to back-up ra on the stack at all.
fact:
li a1, 1
addi sp, sp, -4
sw ra, 0(sp)
jal fact_tail
lw ra, 0(sp)
addi sp, sp, 4
ret
main:
li a0, 5 # Input will be 5
jal fact # a0 = fact(5);