Table of contents
Introduction
After the more theory-oriented session 1, this time we will dive into more complicated exercises (after covering a few more concepts). Of course, if anything is unclear, you can check back with the previous exercise, ask your teaching assistant, or post in the Toledo forums!
Let’s start off by practicing everything you’ve learned about RISC-V assembly in the previous session.
Exercise 1
Take the following C code:
switch (operation) {
case 0: result = a + b; break;
case 1: result = a - b; break;
case 2: result = a + 5; break;
case 3: result = b + 5; break;
}
Convert this code into RISC-V assembly! Assume that the variables operation
, a
, and b
are integers stored in memory (the data section). You can store the result in the a0
register.
Tip: If at first you’re scared of the
switch
statement, don’t be! Think about how you would convert it into consecutiveif
statements in C.
Solution
.data
operation: .word 3 # change this between 0-3
a: .word 5
b: .word 1
.globl main
.text
main:
lw t0, operation # t0 = operation
lw t1, a # t1 = a
lw t2, b # t2 = b
mv t3, zero # t3: used to compare to t0
beq t0, t3, case0 # if t0 == t3: go to case 0
addi t3, t3, 1 # t3 += 1
beq t0, t3, case1
addi t3, t3, 1
beq t0, t3, case2
addi t3, t3, 1
beq t0, t3, case3
j end # if none of the cases matched
# don't do anything (jump over cases)
case0:
add a0, t1, t2
j end # jump to the end after each individual case
case1:
sub a0, t1, t2
j end
case2:
addi a0, t1, 5
j end
case3:
addi a0, t2, 5
j end
end:
Functions in C
In this session, we will briefly introduce functions in C, but the detailed description of how to translate them into assembly will be the topic of the third session.
You probably remember that every C program must contain the int main(void)
function, this is where the execution will start. Remember that the first int
means that the function will return an integer value, while (void)
shows that the function does not take any arguments.
You can use the same syntax to define your own functions in your file:
int difference(int, int);
int main(void) {
int x = 5;
int y = 13;
int d = difference(x, y);
// ...
}
int difference(int a, int b) {
if (a >= b) {
return a - b;
} else {
return b - a;
}
}
Function declarations
What is the role of the first line in the example above?
int difference(int, int);
The compiler moves linearly in the file, it needs to already know the signature (return type, parameters) of functions when using them (in main
or in other functions). For this reason, it’s sometimes necessary to include the function declaration at the top of the file.
In the example above, it would have also been fine to just move the function definition above main
, in which case you don’t need the declaration on the first line anymore. (But think of an example where two functions call each other, in that case one of them has to go above the other!)
When you include a header file, such as
#include <stdio.h>
, that file also contains a list of function declarations, just like this one! This is how your compiler knows the list of functions implemented by the standard library.
Exercise 2
Write a C program that computes the factorial of every number between 1 and 10. Avoid duplicating code (use loops and functions where applicable).
Solution
#include <stdio.h>
int fact(int n) {
if (n < 2) {
return 1;
} else {
return n * fact(n - 1);
}
}
int main(void) {
for (int i = 1; i <= 10; ++i) {
printf("The factorial of %d is %d\n", i, fact(i));
}
return 0;
}
Structs
We often want to group different variables into one object. For example if our program handles user data, and we want to store the name and age of each user, it’s a lot more convenient to group these two variables into one object.
This is possible in C using structs.
struct person {
int age;
char *name;
};
int main(void) {
struct person John;
John.age = 42;
John.name = "John";
// ...
}
You can refer to the variables inside the struct with the .
operator. These values are going to be placed in consecutive memory by the compiler, but sometimes the compiler leaves some empty space between the different fields (e.g., for optimization reasons). This is called padding.
If you have a pointer to a struct, you can still refer to the fields of the pointed object:
void birthday(struct person *sp) {
(*sp).age += 1;
}
This notation is a bit cumbersome, so you can replace it with the ->
operator:
void birthday(struct person *sp) {
sp->age += 1;
}
Think about this function! Would it work as intended if we passed the struct itself as the parameter, not a pointer to it? Try it out!
Solution
Whenever we call a function, the parameters passed to that function are copied. If we pass the struct itself, the function will operate on a copy of it. We can update the fields of this copy, but those changes will not be reflected in the original struct. This is why we need to pass a pointer (= create a copy of the pointer), because by pointing to the location of the original struct, we allow the function to update it.Exercise 3
Define a struct containing a float, double, long, integer pointer, and a single char. Print the size of the struct. Create a new instance of this struct and print the addresses of each field. Draw the memory layout chosen by the compiler. Did the compiler introduce padding? If so, where and how much?
Solution
#include <stdio.h>
struct st {
float f;
double d;
long l;
int *p;
char c;
};
int main(void) {
printf("The size of the struct is %lu bytes\n", sizeof(struct st));
struct st obj;
printf("f: %p\nd: %p\nl: %p\np: %p\nc: %p\n", &obj.f, &obj.d, &obj.l, &obj.p, &obj.c);
return 0;
}
The memory layout might be different on your computer, but here is an example execution:
The size of the struct is 40 bytes
f: 0x7ffe33ed6b20
d: 0x7ffe33ed6b28
l: 0x7ffe33ed6b30
p: 0x7ffe33ed6b38
c: 0x7ffe33ed6b40
To know whether padding was introduced, we need to know the real sizes of the data types. There’s one thing we can immediately notice: the size of a char is always 1 byte, and here it is the last element of the struct, placed at byte 0x..40
. But the struct has a size of 40 bytes, so the last byte that belongs to it in hexadecimal is 0x..20 + 0x27 (39 in decimal) = 0x..47
, meaning there are 7 unused bytes after the character.
Fixed-length arrays
We have already seen integer variables in both C and assembly. How can we create an array of many integers? Let’s say, for example, that we want to store how many classes we have on each day. We could of course just declare a variable for each day (int monday, tuesday, ...;
), but this would get out of hand quickly if we want to perform an operation on all variables.
We can instead create an array with a fixed number of elements:
int classes[7] = {4, 3, 1, 4, 2, 0, 0};
You might be wondering: isn’t it redundant to first declare that the array will contain 7 elements, then list 7 elements? You’re right, you can omit the number from between the square brackets:
int pair[] = {32, 64};
.You can also omit elements from the initializer list on the right side (if you first specify the length of the array):
int classes_two[7] = {4, 3, 1, 4, 2};
. This array will be the same as theclasses
array declared above. If you specify fewer elements in the list than the number on the left, the remaining elements will be initialized to zero.
Representation in memory
Arrays are a collection of homogeneous elements, and they are stored contiguously in memory. The array from above would have the following (partial) representation in memory:
Notice two things: first, in this example, int
values take up 4 bytes in memory: the first integer is stored in bytes 0x100-0x103
, the second in 0x104-0x107
, and so on. Second, this example uses big-endian notation for demonstration purposes, while the x86 and RISC-V architectures use little-endian representation.
When we create an array, the variable (classes
) itself will point to the first element of the array. It has some special properties, but for the purposes of this course we can think of it as a regular pointer. This means for example that we can copy it to another pointer or pass it to a function expecting a pointer argument:
int classes[7] = {4, 3, 1, 4, 2, 0, 0};
int *copy = classes; // Note: only the pointer is copied, not the entire array!
At the end of this example, classes
and copy
will refer to the same memory location, namely the beginning of the array, which contains the value 4.
Elements of the array
So now we know that the classes
variable points to the first element of the array. That means that if we want to read or write that value, we can just write *classes
, as we’ve done with other pointers.
But how can we access the other elements of the array? Since we know that they are placed next to each other in memory, we can simply add an offset to the starting pointer to access further elements! If we want to get the third element of the array, we can write *(classes + 2)
, because we know that we have to move 2 places to the right in memory compared to the first element.
Hold on! How come we increment the pointer by 1 for each element if the memory addresses increase by 4 for each
int
? Ifint
s take up 4 bytes in memory, we should also increment the pointer by 4 to get to the nextint
, right?Well, in C, when you increment an
int
pointer byN
, it will increment byN * sizeof(int)
so that it automatically points to the nextint
in memory. The same is true for other pointer types.Keep this in mind though for when you’re doing arithmetic with pointers in assembly, as there you will have to take care of this yourself!
So for example, if we want to change the value stored for Thursday, we can just write:
*(classes + 3) = 5;
This notation is a bit verbose, so C allows us to shorten it using the index notation:
classes[3] = 5;
Of course, you can also use a variable as an index for an array. If you want to iterate over all the elements of the array, this is a very handy way of doing it (keep in mind that the first element is at index 0
!):
for (int i = 0; i < 7; ++i) {
printf("%d ", classes[i]);
}
Where do arrays end?
With regular arrays, it’s problematic to detect how long an array is. If all the information we have available is the starting address, how can we tell where the last value of the array is?
In the example above, we explicitly iterated over 7 values, because we know that the week has 7 days. But what happens if we want to write a function that operates on arrays of varying sizes?
A possible option is to designate a special value as a termination indicator; if our arrays can only contain positive numbers, we can use a special 0 or negative number as the last value of the array. This way, we can just iterate over the values of the array until we encounter that special value. This, however, only works if the possible values are limited. If our array can contain any integer values, what can we do?
Exercise 4
Write a C program that prints the array 1, 2, 3, 4, 5
on a single line separated by spaces, together with the addresses of the elements. Ask the user for an integer, multiply all elements of the original array with this integer and print the array again. Now, define a function to print the array to avoid duplicating your code. Don’t hardcode the size of the array in the function: your function should work with various array sizes!
Hint: Don’t be discouraged if your solution looks ugly, you can ask the teaching assistant whether it’s correct! :)
Solution
#include <stdio.h>
void print_array(int *array, int size) {
for (int i = 0; i < size; ++i) {
printf("%d ", array[i]);
}
printf("\n");
}
int main(void) {
int array[] = {1, 2, 3, 4, 5};
int array_size = 5;
print_array(array, array_size);
int mul;
printf("The number to multiply with: ");
scanf("%d", &mul);
for (int i = 0; i < array_size; ++i) {
array[i] *= mul;
}
print_array(array, array_size);
return 0;
}
Strings
How can strings be represented in C? If you think about it, strings are just arrays of characters, that’s also how C handles them. One advantage of characters in the ASCII encoding (used by C) is that they have a limited set of valid values, so we will be able to use the approach outlined above for indicating the end of strings.
In C, we always use a null byte \0
to indicate the end of strings. All the functions in the standard library that operate on strings will expect to see a null byte at the end of the strings they receive as parameters.
That means that if we want to create a string containing the word hello
, we need to write the following:
char hello[] = {'h', 'e', 'l', 'l', 'o', '\0'};
Notice that this means that we use one extra byte to store the string, compared to the number of characters!
C also has special syntax for strings (that we have seen before) to make it easier to work with them.
char hello[] = "hello";
This will allocate the same array as the example before, complete with the terminating null byte.
Exercise 5
Write a C program that asks the user for a string and outputs the length of the string. You can use the function fgets
with the parameter stdin
to read a whole string from the console.
First, write a version using the function unsigned long strlen(char *)
declared in the header string.h
. Then create a second version where strlen
is not used. Note that the last character of the string will be the line feed (hex 0x0a
).
Solution
#include <stdio.h>
#include <string.h>
unsigned int mylen(char *s) {
unsigned int i;
for (i = 0; s[i] != '\0'; ++i) {
}
return i;
}
int main(void) {
char str[100]; // we need to have an arbitrary limit (100) for the length
printf("Enter a string: ");
fgets(str, 100, stdin); // and make sure we don't read more characters than that
printf("Length: %lu %u\n", strlen(str), mylen(str));
return 0;
}
Exercise 6
Modify the program above so that it prints the hexadecimal representation of each character in the string in order. Verify the output using an ASCII table. Hint: use the format string %02x
.
Solution
We add one extra line in our function:
unsigned int mylen(char *s) {
unsigned int i;
for (i = 0; s[i] != '\0'; ++i) {
printf("%02x\n", s[i]);
}
return i;
}
Arrays in assembly
In RISC-V assembly, there is no strong notion of arrays. But since we know that arrays are just consecutive values in memory, we can implement them the same way in assembly as they work in C. You can use comma-separated values to reserve consecutive words in memory:
.data
array: .word 1, 2, 3, 4
Exercise 7
Write a RISC-V program that multiplies all numbers in an array with a constant number without using the mul
instruction.
Solution
In this program, we move in reverse order of the elements to avoid explicitly needing to take care of which element we’re dealing with.
.data
array: .word 1, 2, 3, 4, 5
num: .word 0 # the number to multiply with
.globl main
.text
main:
la a0, array # a0 = array (address)
addi t0, zero, 20 # t0 = 20 (5 * 4, the size of the array)
iterate:
addi t0, t0, -4 # t0 = t0 - 4 (1 word = 4 bytes)
add t1, t0, a0 # t1 = t0 + a0 (current element's address)
lw t2, (t1) # t2 = *t1
lw t6, num # t6 = the number to multiply with
mv t4, zero # t4 = 0
multloop:
beqz t6, multend # while (t6 != 0) {
add t4, t4, t2 # t4 = t4 + t2
addi t6, t6, -1 # t6 -= 1
j multloop # }
multend:
sw t4, (t1) # *t1 = t4
bnez t0, iterate # if (t0 != 0) { goto mult; }
Strings in assembly
For strings, we again have special syntax:
.data
str: .string "hello"
This string notation behaves the same way as in C, complete with the terminating zero byte.
Exercise 8
Translate the C program calculating the length of a string without strlen
from exercise 5 to RISC-V. The string can be provided in the data section. The resulting length can be stored in register a0
.
Solution
.data
string: .asciz "hello world"
.text
la t0, string # t0 = address of the string
addi a0, zero, -1 # a0 = character count, initially -1
count:
lbu t1, (t0) # t1: current character
addi a0, a0, 1 # increase character count
addi t0, t0, 1 # increase pointer
bnez t1, count # if current character is not \0, repeat
end:
Exercise 9
Write a RISC-V program that searches for a given zero-terminated substring in a string (e.g., "loss"
in "blossom"
) and returns 1 if it is present, 0 if it isn’t. Define the strings in the data section and place the result in register a0
. First, write a solution assuming that the characters of the string are 32-bit words (use .word
instead of .string
). What changes if the characters are bytes (using .string
)?
Solution
A possible solution for comparing strings:
.data
string: .asciz "hello world"
substring: .asciz "helo"
.globl main
.text
main:
la t0, string
la t6, substring
li a0, 0
resetsub:
mv t1, t6 # reset the substring (t1) to its start
loop:
lbu t2, (t0) # t2: next character of the string
lbu t3, (t1) # t3: next character of the substring
bnez t3, nonzerosub
li a0, 1 # we have reached the end of the substring: success!
j end
nonzerosub:
bne t2, t3, noneq # if the two characters are not equal
bne t2, zero, nonendstr # if we have not reached the end of the string
li a0, 1 # we have reached the end of the string: success!
j end
nonendstr:
addi t1, t1, 1 # move both strings forward one character
addi t0, t0, 1
j loop
noneq:
bne t1, t6, resetsub # if we are not on the first character of the substring, reset
beq t2, zero, end # if we are at the end of the string, quit
addi t0, t0, 1 # otherwise, move forward in the string
j loop
end:
If we compare words instead of characters in a string, we need to be careful with two things:
- Using
lw
instead oflbu
to load individual words (instead of characters) - Increasing the array pointers by 4 (one word is 4 bytes long)