Assignment 3: Implement a String Formatting Library


Written by Pat Hanrahan; edited by Isabel Bush

Due: Tuesday, February 6, 2018 at 6:00 PM

Goals

For this assignment we will be studying modules that are similar to the ones found in the C standard library. Many of the C standard libraries are surprisingly large. Because space is often at a premium when we program embedded computers, it’s desirable to have simpler versions of these functions that implement only the required features. Having printf is particularly valuable, since printing is the most basic form of debugging.

The core of this assignment involves the strings and printf modules. You will add routines for string-handing and formatted printing to your growing library of Raspberry Pi functions. Not only will these modules be supremely useful to you going forward, you will learn a lot as you implement them, including:

This can be a difficult assignment, so start early and practice good development and debugging techniques, just like you learned in lab. Don’t scrimp on testing, it’s your lifeline to health and happiness. You can do it!

Get started

First make sure your directory structure matches the layout detailed in our guide.

Navigate to your copy of the cs107e.github.io repository and do a git pull to be sure you have the latest files.

Now clone your assign3 repository branch as you have done in previous assignments. Make sure you clone it in the same directory as your copy of the cs107e.github.io repository (the cs107e_home directory if you’re following our directory structure guide).

Assuming you intend to use the code that you wrote last week, also copy your gpio.c and timer.c files from assign2 into the assign3 directory. Read the Makefile in the assign3 starter project for more information on how to reuse modules written for previous assignments.

The starter project contains the modules printf.c and strings.c (mostly-empty to start) and the application programs main.c and test.c which are used to exercise the functions in the modules.

You will edit the strings.c and printf.c files to implement the required functions. You will also edit test.c to add testing code to validate your implementation. The main.c contains a sample program that exercises the modules. Please avoid modifying main.c in this and future assignments so that we can be sure that your assignment solution conforms to the interface we expect.

Basic section

Strings module

Any programming language worth its salt is going to support a string data type and operations. C is no exception, but its string type is quite bare bones. A C-string is simply a pointer to a contiguous sequence of characters terminated by a null char. You can access individual characters using pointer or array operations and that’s about it. In order to do anything more useful with a C-string, such as find its length, make a copy, or compare one to another, you need to add functions for these operations.

Because implementing printf is made much more tractable if you start with reasonable facilities for string manipulation, you are going to first implement the strings module in the file strings.c. The strings.h header file declares a set of string functions that will be useful to you now (in implementing printf) as well as in the future. The functions you are to write are:

Refer to our header file for the documentation on the required behavior of each function. Although our function interfaces are modeled after those in the standard C library, we have made some simplifications, so please carefully read our header file to ensure you are implementing only and exactly what is expected!

The strings exercise at the end of Lab 3 examined the functions strlen and strcpy. You are welcome to cannibalize from that code as you implement these additional string operations.

Read our advice on testing! Trying to implement printf on top of a module that is not reliable and well-tested will make the task so much more difficult.

Printf module

With a set of robust string operations now in your toolkit, you’re ready to go on to tackle the printf module.

The functions in this module produce formatted output. Each function accepts the same type of format strings with the same conversions, but differ slightly in how they are called or where the output is written.

You’re code will be written in printf.c, and the header file printf.h contains the documentation. The functions you will implement are:

We recommend you start by implementing the helper functions that convert integers to strings, then move on to snprintf, vsnprintf, and finish with printf. Be sure to read the following sections before starting your implementation, as we detail some important requirements as well as provide advice on how to design your solution.

1. Integer to string conversion

 int unsigned_to_base(char *buf, int bufsize, unsigned int val, int base, int min_width);
 int signed_to_base(char *buf, int bufsize, int val, int base, int min_width);

These helper functions handle the task of converting a number to its string representation.

The caller passes the arguments buf and bufsize. buf is a character array where the string is to be written and bufsize is the size of that buffer. You can assume that bufsize will always be nonnegative. An important constraint is to respect the size of the buffer. Take care to not write past the end!

The val argument is the integer value to be converted to a string. In the signed function, if the value is negative, you will need to include a leading minus sign in the string.

The base argument controls whether the string is written in decimal (base 10) or hexadecimal (base 16). Do not copy/paste to create two nearly identical conversion functions, one for each base! Instead identify how to unify the code to flexibly allow for either base. Similarly although there is both a signed and unsigned conversion function, you should also not duplicate code between the two. Instead consider how you can call the unsigned function from the signed function.

The min_width is the minimum number of characters the caller wants in the output string. If the formatted string is less than min_width, pad the string with zeros on the left so that it has length min_width. If the formatted string is already at least as long as min_width, ignore the min_width argument. The minus sign should precede the zero padding and is included in the min_width. You can assume min_width is always nonnegative.

You should null-terminate what you write into buf. A valid C string must be null-terminated; there is no other way for the caller to know where the output ends.

bufsize is a hard upper limit on how much space is available to store the output string; if bufsize is too small to fit your output, even if min_width says you should go past it, you must cut your output off and make buf[bufsize - 1] a null terminator. (Otherwise, you would be trashing some other memory past the end of the buffer.) Note that bufsize can be zero, and if so, you should not write anything to the buffer, not even a null terminator.

These functions should return the count of characters written to buf if there is room to fit them all. If not, it should return the count of characters that would have been written if there were room. The null terminator is not included in the count.

For example, if you call signed_to_base(buf, 5, -9999, 10, 6) then it should write 5 bytes into buf: "-099\0" ('\0' being the null terminator) and return 6.

Test as you go! Remember that it is easier to debug small steps than large changes, so make sure these functions are working before you move on. There are many different cases to cover, don’t be shy about adding a lot of tests into test.c.

2. Implement snprintf

int snprintf(char *buf, int bufsize, const char *format, ... );

The printf family of functions use what are called “formatting strings” to allow you to combine together different kinds of values into a single string. Look at the C++ reference for some examples of using printf. By default printf prints to your terminal and snprintf instead “prints” to a string. For more information about printf or other C functions, you can type man snprintf to bring up its man page, but bear in mind that your implementation supports a more limited set of options than the full-featured standard library version.

The five formatting codes that you are to support are:

 %c   single character
 %s   string
 %d   signed decimal integer
 %x   unsigned hexadecimal integer
 %p   pointer

The basic operation of snprintf is to traverse the format string, copy literal characters unchanged to the output string, and process each formatting code by doing the appropriate conversion of the argument to string form. No actual processing is need to “convert” characters and strings, so those are straightforward. To convert integers to string, you have your handy unsigned_to_base and signed_to_base helper functions. And lastly pointers are just a special kind of integer.

The formatting codes for integers (%d and %x) allow an optional width, such as %03d or %05x. The width specifies the minimum number of characters in the converted string. The width is a nonnegative decimal number and must have a leading 0.

For example, "%014x" outputs its unsigned integer argument as a hexadecimal string of at least 14 characters (padded with zeros on the left as necessary). You do not need to support width for the other formatting codes.

The %p format outputs an address as a width-8 hexadecimal string prefixed with 0x, e.g. 0x20200004. Rather than write an entirely new function to convert an address, take advantage of the fact that an address is merely an unsigned int and you already have a helper to convert unsigned to string. (hint!)

The snprintf function takes a variable number of arguments, one argument for each formatting code in the format string. To access those additional arguments, you use C’s <stdarg.h.> interface. Read more about Variadic functions further down in this writeup.

The return value from snprintf is the number of characters written if they fit in the buffer or the number of characters that would have been written if there were room. Again, bufsize is the buffer size that you’re allowed to write to, and you should null-terminate your output.

For simplicity, you can assume that you will never output a string of length greater than 1024 characters (regardless of the size of the destination buffer). If you allocate a temporary buffer of the maximum size and write the full string into it, you can then copy what you need into the final destination as a way to neatly handle the truncation case.

Incremental steps are the key to success. First, implement a version of snprintf that can handle string constants and test it with simple examples such as snprintf(buf, BUF_SIZE, "Hello, World!"). Then add the handling for the formatting codes one by one, starting with single characters, then strings, then on to integers and finally pointers. Check out the function test_snprintf in test.c.

Congratulations! Once you have a working snprintf, you are totally in the homestretch.

3. Refactor into vsnprintf

int vsnprintf(char *buf, int bufsize, const char *format, va_list args);

The printf function needs much of the same code you already wrote for snprintf. However since snprintf takes a variable number of arguments, you cannot call it directly from printf. Thus you will need to create a shared helper function vsnprintf (that takes a va_list parameter), which you can then call from both snprintf and printf. Refactoring means moving most of your snprintf code into vsnprintf and then changing snprintf to call vsnprintf. Once you have completed this refactor, make sure you are still passing all of your snprintf tests.

4. Implement printf

Finishing off printf is now a piece of cake. All the function needs to do is declare a stack array of the maximum output length (1024), call vsnprintf to write the formatted string into the array, and call uart_putchar to output each character.

int printf(char *format, ... );

By now, you should be familiar with testing your code step by step. Hopefully you have been writing mini-tests all along to verify your functions.

This given tests in main.c and test.c only tests basic printf functionality, so make sure you should add your own tests as well. You should include at least one test for each type of printf argument (%c, %s, %p, %d, %x). For example, such tests might look like

printf("%d = -5\n", -5);
printf("%p = 0x20200008", FSEL2);
printf("%04x = 00ab\n", 0xab);

Also make sure you add tests for unusual cases. What happens if you try printf("")? What about printf("\n")?

Extension: disassembler

The assignment extension is to put your shiny new printf to work in writing a program that can disassemble itself. The binary-encoded instructions for the currently executing program are stored in memory starting at address 0x8000. Each instruction is encoded as an unsigned int. Here is a diagram of the bitwise breakdown for a data processing instruction:

data processing instruction

Reading from left to right, the upper four cond bits indicate conditional execution, the next three bits are 000 indicate this is a data processing instruction with an immediate operand 2, the four opcode bits determine which kind of operation (add, subtract, etc), the S bit determines whether the flags are set, and so on.

In class, Pat had you play the role of assembler by translating an instruction such as add r3, r4, r5 into e0843005. The reverse process is a disassembler which picks apart the bits of the encoded instruction e0843005 to print out add r3, r4, r5. The extension is to automate this disassembly process and produce output like that shown below:

...
0x00008074: e0 43 30 05    add r3, r4, r5
0x00008078: eb 00 01 c9    bl 87a4
...

You could use your bit-masking superpowers to pick apart an encoded instruction but a simpler way is to define a C bitfield. Open the file disassemble.c given in the starter code to see an example.

Your extension should be capable of decoding the most common variants of the data processing and branch instructions. The ARM instruction set has a remarkably regular encoding, so you can catch a good chunk of all instructions with just a few cases. If you want to get fancier, try decoding load/store and load/store multiple (i.e. push and pop).

Refer to the ARM ISA documentation for details on the instruction encodings. Don’t worry about making special cases for oddballs. For any instructions you don’t decode, print the encoded value and skip to the next.

Print the sequence of disassembled instructions starting from address 0x8000 and continuing for 100 instructions or so. Some of the data you encounter may not be instructions at all (e.g. data values can be intermixed into the instructions). Don’t worry about those, just decode the first 100 4-byte values as though each was an instruction.

Compare your output to the contents of disassemble.bin.list (this is the disassembly produced by objdump) to see how good a job your disassembler is doing. You just wrote program that dissects itself from the inside – how awesome is that!

We will offer more credit on this extension as it is a bit harder than the last extension, and a really great implementation will be appropriately awarded!

Variadic functions

printf and snprintf are functions that take a variable number of arguments. C has standardized the way to access variadic functions using the stdarg.h interface. Here is an example.

#include <stdarg.h>
#include <stdio.h>

int sum(int n, ...)
{
    int result = 0;
    va_list ap;

    va_start(ap, n);
    for (int i= 0; i < n; i++)
        result += va_arg(ap, int);
    va_end(ap);

    return result;
}

int main(void)
{
    printf("%d\n", sum(4, 100, 0, 5, 2) );
    printf("%d\n", sum(2, -1, 1));
    return 0;
}

The function sum has one fixed argument n, followed by some number of additional arguments, indicated by the ... in its parameter list. The value of n communicates how many of those arguments there are. For example, the call sum(4, 100, 0, 5, 2) passes 4 as the first argument to indicate that there are 4 subsequent arguments.

In general, variable-argument functions still require at least one fixed argument. It should be possible to determine the number of arguments from the first argument; n in this case above, or the format string in the case of printf.

The implementation of the sum function demonstrates how to use stdarg.h.

The variable ap is declared of type va_list. This variable is initialized using va_start, which is passed the last named argument. In this case, we tell va_start that n is the last argument before the variable arguments begin.

Then we loop over the n arguments, fetching each argument using va_arg(ap, type). In the sum example, the variable arguments are all of int type, but the type can be different per-argument by changing what type is used in va_arg. When we are done processing all of the variable arguments, we call va_end(ap) to clean up after ourselves.

Note that because of obscure rules in the C standard about “default argument promotions”, you cannot ask va_arg for the char type. Instead you must ask it for an int and then cast the int to a char.

For additional information about stdarg, read the Wikipedia page on stdarg.h.

Testing, testing testing!

Students who struggled in the past generally wrote too much code before testing it. Try to implement this assignment in little increments, making a small, testable improvement each time. If it doesn’t work, you’ll at least know exactly where to find the mistake.

We strongly recommend that you follow a “test as you go” strategy! As you write each function, immediately turn your attention to testing it.

We provide a few tests to get you started in test.c. Passing the given tests is a good first step, but these simple tests are far from sufficient. Part of your job is to review test.c to understand what is provided and brainstorm what additional tests are needed to thoroughly vet your functions. Edit test.c to add those test cases and verify that your implementation passes all those tests before moving on to the next function.

Never delete a test! Sometimes a test that you passed earlier will regress due to later changes. If you have removed the test case, you may not realize the bug has resurfaced. Just keep accumulating tests in test.c as you think of new cases to try and then each run of your test can validate you are passing your entire suite.

One unfortunate circularity with trying to test a printf implementation is the lack of a working printf to help you debug. Here are a couple of strategies that don’t involve printf that you may want to consider:

Alternatively, the libpi.a that we have provided you comes with an implementation of printf! You can use this printf to debug your implementation by changing the name of all the functions in printf.c (e.g. adding a my_ prefix, such as my_printf). NOTE: you should also update your testing code to call your renamed functions (e.g. my_printf), and use the normal printf in places where you’d like to debug. For example, in test.c you might call my_printf to test your code, and within your my_printf implementation, you might call printf to read out internal values for debugging. WARNING: Our implementation of printf will use your implementation of the functions in strings.c. You should be certain that your implementations are well tested for this path to work.

Other advice

Submit and automated checks

Submit the finished version of your assignment by making a git “pull request”.

The automated checks here, as always, make sure that we can run your C code and test and grade it properly, including swapping your tests for ours.

CI will automatically check that:

Again, if CI fails on your final submission, we will automatically deduct 1 point from your basic grade.