As a fun bonus exercise, let's take a look at what you can do when you're done writing correct code: optimization. With this, your code can start moving wickedly fast – seriously.
Change directory to lab7/speed
and review the source in the speed.c
file.
The program has the redraw
function from the interrupts exercise, along with some timer scaffolding to count the ticks during the function's execution. The purpose of redraw
is to draw every pixel in the screen in the same color.
The given version works correctly, but is naive to a fault. Build and run the code as given to get a baseline tick count. It is possible to gain more than a 1000x speedup over the redraw0 function!
Take a stepwise approach, so you can measure the effect of a given modification in isolation:
-
Duplicate the previous function and tweak it. For example, make a copy of
redraw0
and rename itredraw1
. Make a small change to the code inredraw1
that you think will impact performance. -
Make a rough prediction about the expected effect on runtime.
-
Build and run the new version to see whether the observed timing matches your intuition. Where the results surprise you, try to figure out why the effect is different than expected. Poke around in the generated assembly or ask us questions.
Repeat this process, each time advancing from the best version so far and making another small change.
Below are some suggested avenues to explore:
- Hoisting an unnecessarily repeated operation outside the loop is always good idea. Those calls to
gl_get_width
andgl_get_height
on each loop iteration have gotta go! - There are a lot of calls to
gl_draw_pixel
. Each is incurring the overhead of a function call and the (redundant) bounds checking within the call. If you bypassgl
, get the draw buffer fromfb
and write each pixel directly to the framebuffer, it will have a significant effect. - Does changing the order the pixels are accessed make a difference, i.e. instead of looping row by column, what if you loop column by row?
- What about looping over the pixels as a 1-d array instead of nested loop in 2-d?
- Here's something that can be tried without changing the code: edit the Makefile to enable various levels of compiler optimization and
rebuild and re-run. What difference do you observe between
-O0
,-Og
,-O2
and-Ofast
? - Think about where the function spends time. Recall that every instruction
contribute a cost: are there ways to change the function so that it does the
same work with fewer instructions? Take a look at the assembly in
speed.list
to see where the effort is going. It takes just a few instructions to write a pixel but each loop iteration adds the overhead cost to increment, compare, branch. How could you change the loop to issue fewer of these overhead instructions and focus more on doing the meaty work of writing pixels?
How big of an improvement were you able to make overall? Where did you get the biggest bank for your buck?