Assignment 3: String Formatting


Assignment written by Pat Hanrahan and Julie Zelenski

Due: Tuesday, October 22 at 5:00 pm

printf cartoon

Goals

Libraries are critical in software development. They enable you to build on existing abstractions that are well-designed and well-tested. Many of the C standard libraries are large and complex. Because we copy our entire program over a slow line to the bootloader each time, we want a smaller, simpler library of only the essentials. The curated subset in our library contains features selected for their utility and relevance to our particular needs.

A library for outputting formatted text is particularly powerful, since printing program state is a valuable form of debugging. The C function to output formatted text is printf. The standard version of printf has extensive features and a large code size. You will implement a pared-down printf that provides only the core functionality. Your printf is layered on a strings module that you will also write. Completing this assignment will give you two new modules to add to your growing Mango Pi library.

In addition to the benefits you will reap from having this functionality, by implementing these modules you will learn:

  • how to decompose a complex programming problem into smaller and more manageable pieces,
  • how strings and characters are represented and manipulated in C,
  • how to convert C data types (numbers, pointers) to a string representation,
  • how to use the UART peripheral on the Pi to communicate with your laptop

Best practices

This is a difficult assignment, and will require a thoughtful plan and good development habits. Here are our strong recommendations on how to approach:

  1. Mindset. Your skills with memory-wrangling and C pointers will grow leaps and bounds in completing this assignment. When you're done, you'll understand these topics well, but you will work hard for it and you're likely to encounter tough bugs along the way. Bringing a growth mindset is one of your best assets. Rather than let that tricky bug get you down, instead focus on the improved understanding you will gain as you work it out. If you start to feel you are losing your mojo, reach out to us for help and encouragement.
  2. Start early. This gives you time to think things through, to come to office hours, to re-work if your first approach isn't working out as you hoped, to pause and appreciate all that you are learning. In contrast, getting a late start means working non-stop under the unpleasant stress of a looming deadline. Some bugs take a few minutes to resolve, others take hours, and you never know which it will be. When you're dealing with a doozy that has you has stumped, one of the best gifts you can have given yourself is time: for a break, a walk, a nap, a meal,… (and a visit to office hours!).
  3. Follow good development and debugging practice, just like you learned in lab. Now is the time to build up your mad gdb skills. Take to heart our recommended Strategies for Success. This will help you complete the assignment more efficiently and with less strife.
  4. Test as you go. Spending 10 minutes upfront to write a test can save you hours of debugging time later.
  5. Commit often. If you modify your code and break it you can easily go back to a working version.

We have many quarters of experience helping students succeed on this assignment, and we know it is within your ability! But please, please, please follow our recommendations so you complete successfully and also have an enjoyable journey.

Get starter files

Change to your local mycode repo and pull in the assignment starter code:

$ cd ~/cs107e_home/mycode
$ git checkout dev
$ git pull code-mirror assign3-starter

In the assign3 directory, you will find these files:

  • strings.c, printf.c: library modules
  • test_strings_printf.c: test program with your unit tests for strings and printf
  • print_pinout.c: sample application that uses printf to display the Pi's pinout. Do not edit this program, just use it as-is.
  • Makefile: rules to build pinout application (make run) and unit test program (make test). Also a make debug target that runs the unit test program under gdb.
  • README.md: edit this text file to communicate with us about your submission

The make run target builds and runs the sample application print_pinout.bin. You will likely only use this target at the very end as a final test of your completed work. The make test target builds and run the test program test_strings_printf.bin. This test program is where you will add all of your unit tests. You will make heavy use of this target. Use the target make debug to run the test_strings_printf.elf program under gdb in simulation mode.

Use of MY_MODULE_SOURCES in Makefile Each week as you move forward, you have the option of building on your own code from the previous assignments. Open Makefile in assign3 and read the comment which explains MY_MODULE_SOURCES. The default setting will list only the library modules for the current assignment. If you edit MY_MODULE_SOURCES to add the library modules you completed previously (e.g. gpio.c and timer.c), your programs will now build using your code for those modules instead of the reference. Using your previous modules as you move forward will further test your code and give you a jumpstart on earning the full system bonus awarded to a final assignment that uses all of your modules and none of the reference. If you encounter a problem using a previous module, you can remove it from MY_MODULE_SOURCES to instead use the reference version until you have a chance to resolve the underlying issue.

Core functionality

In this section of the writeup, we review the module specifications. We recommend you skim this section to get an overview but don't get mired in the minutiae just yet. Before writing any code, read the section Strategies for success for a road map and advice on how to proceed. When ready to implement a particular function, come back to this section to review the nitty-gritty details.

Strings module

Pretty much every programming language supports a string data type and operations. The C language's string type is very bare bones: a pointer to a contiguous sequence of characters terminated by a null-terminator (zero, or '\0'). You can access individual characters using pointer or array operations and that's about it. To do more useful things with a C-string, such as find its length, make a copy, or compare two strings, you'll need additional functions. Use the command man string to list the functions from the standard C strings module. There are a lot of functions in the standard library!

You will implement your own strings module. Your module will not include such a large set of operations, just a few key essentials chosen for their specific usefulness to us.

Start by reviewing the header file (available as $CS107E/include/strings.h or browse strings.h here). The functions exported by the strings module are:

  • memcpy and strlen (both provided to you pre-written)
  • memset
  • strcmp
  • strlcat
  • strtonum

Although these function interfaces are modeled after similarly-named functions in the standard C library, we have made some simplifications, so please read our header file carefully to ensure you are implementing only and exactly what is expected.

The choice of these particular six functions may appear eclectic, but each was selected for its utility in implementing or testing printf. As you implement each string function, consider how to properly use it and what it will be useful for. In particular, strlcat may seem oddly-structured at first glance, but its specific functionality turns out to be an ideal match for certain tasks within printf.

Printf module

The functions in the printf module construct formatted strings to be written to a terminal or a file.

The three public functions you are to implement in printf.c are:

  • printf
  • snprintf
  • vsnprintf
  • This private helper function is expected to be a part of your implementation:
    • num_to_string

Review the header file (available as $CS107E/include/printf.h or browse printf.h here) for documentation of the public functions. The features required in your version are simplified from the standard C library version, so please read our header file carefully to ensure you are implementing only and exactly what is expected.

The printf module is not really so much about output; the work is almost entirely string manipulation. The fundamental task is to process the input format string and its embedded formatting codes and expand into a fully fleshed-out output string.

Number to string conversion

void num_to_string(unsigned long num, int base, char *outstr);

const char *hex_string(unsigned long val);
const char *decimal_string(long val);

You are to implement the helper function num_to_string. This function converts a number to a string representation in the specific base and writes the characters to outstr as a null-terminated string.

As an example, the call num_to_string(209, 10, outstr) converts the number 209 in base 10 to the string representation "209", or more precisely, writes to outstr an array of four ASCII characters ending with a null-terminator: 2 0 9 \0. For hexadecimal base 16, num_to_string(209, 16, outstr) writes the output string "d1", the array of characters: d 1 \0.

The first argument num is the number to be converted. The base argument indicates whether the output string should be in decimal (base 10) or hexadecimal (base 16). No other bases are supported. The outstr argument is the address of the array where the output string is to be written. Your function can assume the outstr array is always large enough to store all digits of num including space for the null-terminator.

The additional conversion convenience functions hex_string and decimal_string build on your num_to_string function. The convenience functions are provided to you pre-written. You should use this code as-is unchanged. Read the provided comments in the starter code for info on how to properly use these functions.

Snprintf and family

int snprintf(char *buf, size_t bufsize, const char *format, ... );
int vsnprintf(char *buf, size_t bufsize, const char *format, va_list args);
int printf(char *format, ... );

These three functions of the printf family each accept the same type of input strings and formatting codes, but differ slightly in how they are called or where the output is written.

The "formatting codes" allow combining different types of values into a single output string. Review this C reference for sample uses of printf. The default printf writes the formatted output to your terminal; the snprintf variant writes the formatted output to a string buffer. In the final arrangement, the workhorse vnprintf will underly both printf and snprintf.

Full documentation for any standard C function is available in its man page, e.g. man snprintf. Bear in mind that your implementation supports a more limited set of options than the full-featured standard library version. Refer to our printf.h header file to know exactly what your version is required to support.

Any ordinary characters in the input string are copied unchanged to the output string. Where the input string contains formatting codes, these are placeholders for values to be inserted in the output string. For each formatting code, the requested conversion is applied to the associated argument and then written to the output string.

Your implementation must handle these formatting codes for converting arguments:

 %c   single character
 %s   string
 %d   signed decimal integer (%ld long decimal)
 %x   unsigned hexadecimal integer (%lx long hex)
 %p   pointer
 %%   directly output a percent sign (no formatting)

For formatting codes %c and %s, no processing is needed to "convert" characters and strings, the character or string argument is copied as-is to the output string.

For the integer formatting codes %d and %x, the argument is of type int; for %ld and %lx, the argument is type long which is just a wider integer (64-bit). The numeric value is converted to string representation using the decimal_string or hex_string convenience function and copied to the output string.

Formatting codes support an optional field width. The field width enforces a minimum number of characters in the output for this conversion. Space characters are inserted as necessary to pad up to the minimum width. For formatting codes that output in hexadecimal, the padding character is a '0' instead of space. In all cases, padding chars are inserted on the left.

Some examples:

  • "%3c" char argument, output is space-padded width-3
  • "%12s" string argument, output is space-padded width-12
  • "%8x" unsigned int argument, output is zero-padded width-8
  • "%7ld" long argument, output is space-padded width-7

The %p format is a variant of %lx used for pointers. A pointer uses a default conversion format of zero-padded width-8 hexadecimal prefixed with 0x, e.g. 0x02000040. If a field width is specified, e.g. %16p, it overrides the default width of 8.

The snprintfand printf functions take a variable number of arguments, one argument for each formatting code in the format string. To access those additional arguments, you use C's <stdarg.h> interface. Read more about Variadic functions below.

The characters of the converted output are written to buf, truncated if necessary to fit in bufsize. The return value is the number of characters (not including the null-terminator) in the fully expanded output (i.e. number of characters that would have been written).

Some examples:

  • snprintf(buf, 20, "%3s", "hello") writes 6 chars to buf h e l l o \0 and returns 5
    • Explanation: string, no padding needed for field width 3, fits in bufsize 20, no truncation
  • snprintf(buf, 20, "%2c", 'M') writes 3 chars to buf   M \0 and returns 2
    • Explanation: char, inserts 1 space char to pad to field width 2, fits in bufsize 20, no truncation
  • snprintf(buf, 20, "%4x", 27) writes 5 chars to buf 0 0 1 b \0 and returns 4.
    • Explanation: hex, inserts 2 zero chars to pad to field width 4, fits in bufsize 20, no truncation
  • snprintf(buf, 5, "%7d", -9999) writes 5 chars to buf     - 9 \0 and returns 7.
    • Explanation: decimal, inserts 2 space chars to pad to field width 7, does not fit in bufsize 5, truncates to first 4 that fit, returns count that would have been written if not truncated
  • snprintf(buf, 0, "%p", ptr) writes nothing to buf and returns 10.
    • Explanation: pointer, default output form 0xXXXXXXXX, no chars fit in bufsize 0, writing 0xXXXXXXXX would have been 10 chars

bufsize and memory corruption: here be dragons! One of the most critical requirements for snprintf is that it must always respect bufsize. bufsize communicates the hard upper limit on how much space is available to store the output string, but there is no guarantee that the entirety of the converted output will fit within bufsize. In all cases bufsize wins: not writing past the end of buf and not corrupting memory is more important than writing out the string requested by the arguments. If bufsize is too small to fit all of the output, even if the minimum field width says you should go past it, you must truncate the output and store a null-terminator in buf[bufsize - 1]. Finally, bufsize can be zero: if so, you should not write anything to buf, not even a null-terminator.

Variadic functions

printf and snprintf are functions that take a variable number of arguments. C provides the stdarg.h mechanism to support variadic functions. Below is an example:

#include <stdarg.h>
#include <stdio.h>

int sum(int n, ...) { // one fixed argument, followed by other variable arguments
    int result = 0;
    va_list ap; // declare va_list

    va_start(ap, n); // init va_list, read arguments following argument named n
    for (int i = 0; i < n; i++) {
       int arg = va_arg(ap, int); // access value of next argument, type is `int`
       result += arg;
    }
    va_end(ap);   // clean up

    return result;
}

int main(void) {
    printf("%d\n", sum(3, 51, 19, 32));
    printf("%d\n", sum(2, 7, -7));
    return 0;
}

The parameter list int sum(int n, ...) has one fixed argument n and an ellipsis which indicates that it can optionally followed by any number of additional arguments. For example, the call sum(3, 51, 19, 32) contains one fixed argument, 3, and three additional arguments: 51, 19, and 32. In a call to sum, the fixed argument is the count of additional arguments to follow.

Unlike fixed arguments, variable arguments do not have names, so you need a mechanism to access their values within the sum function. The stdarg.h header defines the va_list and its operation for this purpose. You declare a va_list, initialize it via va_start, iterate over the variable arguments using va_arg, and clean up with va_end.

To initialize a va_list call va_startpassing the name of the last fixed argument. This configures the va_list to start reading at the first argument that follows that named argument. In the above example, we tell va_start that n is the last argument. Note that we literally pass the name n. The type or value of the fixed argument doesn't matter – the fact that n happens to be an int is irrelevant, nor does it use the value of n, va_start is just using the name to locate where to start reading the arguments that follow.

Within the loop, the value of each argument is accessed using va_arg(ap, type) where type indicates the type of the argument being accessed. In the sum example, the variable arguments are all int type, but the type can be different per-argument by changing the type passed to va_arg. When we are done processing all of the variable arguments, we call va_end(ap) to clean up after ourselves.

One nit to be aware of is that you cannot ask va_arg for an argument of char type. Instead you must ask for the value as an int type. (This is due to obscure rules in C standard about "default argument promotions")

For additional information about stdarg, read the Wikipedia page on stdarg.h or this tutoriall https://www.tutorialspoint.com/c_standard_library/c_macro_va_start.htm

Strategies for success

Having read up to here, you may feel a bit overwhelmed by all that lays before you. It is a big job, but it will be much more tractable if you break it down into manageable tasks and tackle one at a time. Developing an appropriate decomposition and identifying a good path to follow from start to finish is not always obvious when you just starting out, so read on for our guidance on strategies that we have found to work well.

Order of attack and strategic hints

1. strings module

Definitely start here. Each string function is a small task that can be implemented and tested independently. As you write each function, brainstorm test cases that cover a range of uses. Add those tests into test_strings_printf.c. It will be useful to test both under gdb and on the Pi. Use make test to build and run the test program on the Pi and use make debug to run under gdb. If it works on one and not the other, this typically means there is a latent bug, one whose effect differs by context. (review differences due to simulation) Don't move on to the next function until the current one is running correctly in all situations.

  • Confirm you have a clear understanding of how a C-string is represented: char* is memory address, at address is array of ASCII characters ending with a null-terminator (zero character). Many string-handling bugs arise from forgetting or mishandling the null-terminator. Be sure you know why it's necessary and what can happen when it is missing or misplaced.
  • strcmp is the simplest of the bunch and makes a great starting point for practice with string handling. The strcmp function is critical to all your future testing as you will later depend on it to confirm the result of other string operations. What are some test cases you can use now to ensure strcmp will be robust and reliable when you need it?
  • We provide the implementation of memcpy, the generic function that copies a sequence of bytes from source location to destination. The interface is written in terms of void* which allows the function to accept any type of pointer and copy any type of pointee data. Because it is not valid to deference a void* or use in pointer arithmetic, it uses a typed pointer char* to access the data as an array of raw bytes. Review our given code and confirm you understand how it works. You're now ready to write memset, the generic function that writes a repeated byte in a region of memory.
  • Tackling strlcat at this point allows it to layer on the already-implemented strlen and memcpy.
  • Working through strtonum will reinforce your understanding of the difference between the ASCII character '5' and the integer value 5 and what is needed to convert from string form to integer value. You will later implement that same conversion in the opposite direction.
  • The second argument to strtonum is of type char **. This double pointer may give you a double take! In this case, the argument is serving as an output parameter whose value is being modified "by reference". C does not have a language equivalent to the C++ reference parameter, but we can use pointers to create a manual version. The code below demonstrates using an int * parameter as a mechanism to modify an int variable. The char ** parameter of strtonum is similarly being used to modify a variable of type char *.
    void no_change(int val) {
          val += 100;
    }
    
    void change(int *ptr) {
          *ptr += 100;
    }
    
    void main() {
         int num = 4;
         no_change(num);
         change(&num);
    }
    
  • We cannot over-emphasize the importance of testing as you go. Attempting to implement printf on top of an unreliable/untested strings library is an arduous task, as you must debug all of the code simultaneously and untangle complex interactions. In contrast, implementing printf on top of a robust strings library is much more straightforward. Because of your thorough testing, you can be confident that the strings library does its job correctly and focus your attention on debugging only the new code being added.

Having finished the strings module and thoroughly tested it a great first milestone! You now have a collection of very useful string functions and have important lessons under your belt, such as understanding the relationship between pointers and arrays, being aware of need to take care with the null-terminator, and so on.

2. Converting numbers to string

The num_to_string function is a small but mighty operation that convert a number to a string representation (the inverse of the conversion performed by strtonum).

  • Start with conversion to base 10.
  • Translating a number to a digit character decomposes into a nice helper of its own.
  • Pro-tip: if you process the digits in reverse order (e.g. from the least significant digit to the most), your logic will be cleaner. Declare a temporary buffer to store the digits as you go and later invert the order when copying to the outstr.
  • Don't forget to null-terminate!
  • When ready to add support for base 16, don't copy/paste from base 10 to create a second nearly identical conversion! Repeated code means more code to write, debug, test, and maintain; a lose-lose all around. Instead, identify how to unify into a single implementation that flexibly allows for either base.
  • TESTING!
    • This helper does the heavy lifting of the number formatting for printf, so be sure to test very thoroughly.
    • Be sure that your strcmp is rock-solid. Most of the unit tests for conversion/snprintf test a function by writing its output string into a buffer, followed by an assert that uses strcmp to confirm the contents written to the buffer match the expected. In order for these tests to be valid, you must have a fully working strcmp!
    • Your initial test cases will be direct calls to num_to_string. We also recommend tests on the provided hex_string and decimal_string functions as well. These functions depend on your num_to_string, and if it is working properly, it is expected they will be fully functional as well, but rather than assume, use tests to be sure!

3. Implement snprintf

You are now ready to tackle snprintf. The most important advice is do not try to implement the whole gamut of snprintf functionality in one go! This would be an overwhelming task and will quickly lead to a mess of complicated code that is hard to debug or get right. The way to tame the complexity is to advance in small steps, continually testing as you go.

  • Start by implementing a version of snprintf with no support for formatting codes that outputs only ordinary characters.
    • Test on simple example: snprintf(buf, bufsize, "Hello, World!").
    • Test cases use strcmp to confirm that buf contains the expected output string.
  • Add support for the simplest formatting codes first: %% %c %s.
    • Test on a single codes first snprintf(buf, bufsize, "%c", 'M'), work up to multiple mixed codes snprintf(buf, bufsize, "LS%cU" = 100%% %s!", 'J', "fresh").
  • It's best to begin thinking about truncation now and the imperative to respect bufsize.
    • When you are appending characters to the output, you must only copy the characters that fit. Think back to the strings module and the strlcat function that you implemented. That size-bounded concatenation operation is just perfect here! Using it each time you need to append to output gives a tidy and reliable way to enforce truncation and ensure proper null-termination.
    • Add tests that produce more output than fits into a small-size buffer and confirm proper truncation at bufsize.
  • The number formatting codes %d, %x, %ld, %lx are fairly straightforward due to the fully-tested code you already wrote converts a number into a hex or decimal string. Enjoy that victory lap!
  • The last big hurdle is to add handling of the field width.
    • A good first task is to add code that inserts pad chars on left up to minimum field width when appending a conversion. Hexadecimal outputs use '0' as the pad char, all others use ' ' (space).
    • Temporarily hard-code so that every conversion uses a fixed field width of 12, say, and test to confirm that appropriate pad chars are inserted. Also include tests to confirm that padding plays nicely with truncation and properly respects bufsize.
    • Next up: obtain field width from formatting code and apply to the individual conversion. To parse a number out of the format string, recall that handy function you wrote in strings module to convert a string of digits to a number (hint!).
    • The field width applies to formatting codes %c %s %d %ld %x %lx %p, padding with '0' for hexademical and ' ' for all others, all while respecting bufsize. This sounds it would create a lot of combinations to test, but if you work to gather the common code into a unified path, it will significantly cut down on the testing burden.
  • The final formatting code pointer %p is a just a remix of things you already have conquered: unsigned long value in hexadecimal zero-padded to width-8 and prefixed with "0x".

Achieving a working snprintf is the big hill to get over in this assignment. Once you have that, all that remains is re-factoring and layering. You are in the homestretch!

4. Refactor into vsnprintf

int vsnprintf(char *buf, size_t bufsize, const char *format, va_list args);

The printf function needs the same functionality as snprintf. However since snprintf takes a variable number of arguments, you cannot call it directly from printf. You must create a shared helper function vsnprintf (that takes a va_list parameter), which you can then call from both snprintf and printf. Refactoring means moving most of your snprintf code into vsnprintf and then changing snprintf to call vsnprintf. Once you have completed this refactor, confirm you are still passing all of your previous tests.

5. Implement printf

Adding printf is a piece of cake. It declares a stack array of a large length (our versios uses 1024 as the max output length), calls vsnprintf to fill that array with the formatted output string, and hands the string over to uart_putstring. Having thoroughly tested snprintf/vsnprintf, you will not likely need many new tests for printf, since it is built on the same substrate that you have already confirmed correct.

printf uses the uart peripheral which must be initialized with a call to uart_init. This init call should be done once and only once. The starter code already calls uart_init (at start of main) and you do not need to change it.

It is time 🕰 for a serious Happy Dance 🙌 and an epic 🎼 that celebrates your amazing feats! You did it! 🏆 We hope you will enjoy the fruit 🍎 of your labors 💪 for a long time to come!

Testing advice

Students who struggled in the past generally wrote too much code before testing it. Instead you want to approach the work by dividing in tiny increments, making a small, testable improvement each time. If the latest changes don't work, you'll know exactly where to look for the mistake. This strategy will save you a lot of time and heartache. To quote Dawson Engler, Systems Programmer Extraordinaire:

Engler’s theorem of epsilon-steps Given a working system W and a change c, as c → ε the time T it takes to figure out why W + c doesn’t work goes to 0 (T → 0).

After taking each epsilon-step in your code, immediately turn your attention to testing and debugging it. What test case can you add to test_strings_printf.c to confirm that the code you just added is working correctly? It may require multiple test cases to get at different parts of the behavior. Add those tests now and don't move on until you pass them all.

Never delete a test! Sometimes a later change will cause a test that was previously passing to backslide. If you have removed or commented out the test, you won't realize the bug has resurfaced. Instead, accumulate all tests in test_strings_printf.c and keep them active. Every subsequent run will re-validate against the entire test suite and alert you to any regression.

Review the function specifications in the strings.h and printf.h header files and be sure that your test cases have full coverage of the specified behavior, including any edge cases or required error handling. If your own testing gets there ahead of the autograder, you can find and fix your bugs before submitting to its rigorous scrutiny.

In test_strings_printf.c, we want to see a comprehensive test suite that exercises each function in the strings module and all formatting options mix-and-match for printf and variants. There is a lot of ground to cover! Grading will include an evaluation of the effectiveness of your tests, along with our feedback to help you to develop and refine this critical skill.

Our specifications make some simplifying assumptions relative to the standard library. Your functions need only handle calls that are valid according to our assumptions: e.g. exactly and only these formatting codes, base is always be 10 or 16, the width specified must begin with a zero, the format string is well-formed and so on. You do not have to detect/handle/reject calls that violate these assumptions. We will not test on such inputs and your tests do not need to consider these cases.

Debugging advice

One unfortunate circularity with trying to test printf is the lack of a working printf to help you debug. Here are a couple of strategies you may want to consider:

  • Use the debugger! Run under gdb in simulation mode and use gdb commands to step and print variables to observe your program's operation. We strongly encourage you to invest in building up your gdb chops now – this investment will really pay off! Stay mindful of the differences between the simulator and the actual Pi. (Review gdb exercises of Lab 3 for a refresher). The make debug target from the Makefile runs gdb on the test_strings_printf.elf.
     $ make debug
     riscv64-unknown-elf-gdb -q --command=CS107E/other/gdbsim.commands test_strings_printf.elf
     Reading symbols from test_strings_printf.elf...
     Auto-loading commands from CS107E/other/gdbsim.commands...
    
  • Liberal use of assert() tests. For example, you can test the output written by snprintf matches the expected output by asserting the two strings strcmp as equal. Note that the version of assert from assign3 forward calls uart_putstring to print out details (i.e. line number, failed expression), so you are no longer limited to interpreting smoke signals from the blue LED.

  • Compare to reference. The string and printf functions are a part of the C standard library, available in any C compiler (non-bare-metal). If you are not sure of the expected behavior for a particular call, try it on your local compiler, or do a quick test in this handy online C environment (Rextester).

Extension: disassembler

Congratulations on your printf success! The blood, sweat, and tears that you put into it will pay huge dividends, super-charging all your future debugging and providing a foundation on which you can build many cool tools. If you have additional bandwdith to keep going from here, there is a super-neat extension that we hope you will explore!

The extension is to add a new custom formatting code to your shiny new printf that converts a binary-encoded instruction into human-readable assembly. This effectively adds a disassemble operation to printf - wow! Here is a diagram of the bitwise breakdown for the four base instruction encodings taken from the RISC-V ISA Manual:

instruction encodings

Let's break down the R-type, used for three-register ALU instructions. Each register is encoded as its 5-bit numeric index (0 to 31). An R-type instruction encodes 3 registers: one for the destination and two source registers. The opcode bits identify the class of instruction and the funct7 and funct3 bits identify the specific ALU operation variant (add, sub, shift, etc).

To build blink or larons, we called upon the assembler to translate an assembly instruction such as add a3,a4,a5 into its binary-encoded machine instruction 00f706b3. The reverse process is a disassembler which picks apart the bits of the encoded instruction 00f706b3 to produce the output add a3,a4,a5.

Use the custom formatting code %pI which expects a corresponding argument of "pointer to instruction". There is no "instruction" type in C; use a pointee type of unsigned int to read a binary-encoded instruction from memory. The instructions for the currently executing program are stored in memory as well. If you print the instruction at the program start address and move upwards in memory, you can disassemble the entire program!

Here is a sample use of %pI:

unsigned int add = 0x00f706b3;              // manual binary-encoded instruction
unsigned int *first = (unsigned int*)main;  // address of instruction in memory for function main()

printf("Encoded instruction %08x disassembles to %pI\n", add, &add);
printf("Encoded instruction %08x disassembles to %pI\n", *first, first);

The output of the above code is:

Encoded instruction 00f706b3 disassembles to add a3,a4,a5
Encoded instruction fe010113 disassembles to addi sp,sp,-16

You could use your bit-masking superpowers to pick apart an encoded instruction but a simpler way is to define a C bitfield. In the starter file printf.c we included some code that demonstrates sample use of a bitfield to get you started.

To learn more about the instruction encoding, refer to the RISC-V ISA Manual. This neat online encoder/decoder is fun way to learn more!

Start by decoding the common R-type and I-type instructions. The RISC-V ISA encoding is remarkably regular, so you can catch a good chunk of all instructions with just a few cases. Decoding S and B-type instructions will teach you much about how relative addressing works. Don't worry about making special cases for oddballs. For any instructions you don't decode, simply print the encoded value. As much as possible, try to match the output given by the standard disassembly tools (i.e. riscv64-unknown-elf-objdump -d or gdb disassemble command).

There is a unit test in the test_strings_printf.c that demonstrates sample disassemble use. To see how good a job your disassembler is doing, compare your output to the result from gdb's tools. In gdb, you can disassemble the single instruction at an address with the x/i command or dump a sequence of instructions using the disassemble command:

    unsigned int add = 0x00f706b3;
(gdb) x/i &add
   0x4fffffac: add a3,a4,a5
(gdb) disassemble main
   0x0000000040000890 <+0>:  add sp,sp,-16
   0x0000000040000894 <+4>:  sd  ra,8(sp)
   0x0000000040000898 <+8>:  sd  s0,0(sp)
   0x000000004000089c <+12>: add s0,sp,16
   0x00000000400008a0 <+16>: jal 0x400026c4 <uart_init>

To submit the extension for grading, tag your completed code assign3-extension. In your README.md, tell us about your disassembler and which instruction types it can handle. Depending on how you far you take it, we are open to awarding additional bonus credit on this extension; a great implementation deserves to be generously rewarded!🎖

You just wrote a program that dissects itself from the inside – what is a crazy-awesome-meta achievement! 💪

Submitting

The deliverables for assign3-submit are:

  • implementations of the strings.c and printf.c library modules
  • comprehensive tests for both modules in test_strings_printf.c
  • README.md (possibly empty)

Submit your finished code by commit, tag assign3-submit, push to remote, and ensure you have an open pull request. The steps to follow are given in the git workflow guide.

As time permits, you are encouraged to revisit code from previous assignments and submit bug fixes for any issues that are eligible for revision. To submit bug fixes, commit your changes, tag assign2-retest (assignN where N is the assignment you are resubmitting) and push. This tag signals that we should re-run the automated tests and update your issue lists to show new successes. Unit tests on library modules are eligible for resubmit, manual tests and extensions are not eligible.

Grading

To grade this assignment, we will:

  • Verify that your project builds correctly, with no warnings
  • Run automated tests on your strings and printf modules
  • Go over the unit tests you added to test_strings_printf.c and evaluate them for thoughtfulness and completeness in coverage.
  • Review your code and provide feedback on your design and style choices.

Our highest priority tests will focus on the essential functionality of your library modules:

  • strings
    • correct behavior according to spec
    • all strings properly null-terminated, buffer size respected
  • printf
    • correct behavior according to spec
    • format codes %c, %s, %d, %x, %p
    • all strings properly null-terminated, buffer size respected

The additional tests of lower priority will examine less critical features, edge cases, and robustness. Make sure you thoroughly tested on a wide variety of scenarios!