Debugging My LC-3 VM The Hard Way™
LC-3 VM Posts
If you're joining this series at this post, check out the others here:
Twenty-Forty-Not Great
With my LC-3 VM mostly complete, I started to test it by using the 2048 application provided in Write Your Own VM1 as the VM's target. Unfortunately, I knew right away that there was a bug somewhere, because all I got was a single message:
Control the game using WASD keys.
I was a little surprised, considering I had pretty good test coverage! Good test coverage doesn't mean good test parameters, as I came to find out.
Mapping Instruction Addresses
The first thing I wanted to do was take a look at what the program expected to happen. Inspecting the 2048 instructions2, everything seemed pretty understandable, especially the beginning. The initial output at least told me that part of the TRAP instructions were working. Next was to figure out what was different... To start, I wanted to ensure the instructions were encoded correctly by the assembler. Looking at the .obj file, here's the first batch of instructions (each instruction is 2 bytes):
Address | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 0A | 0B | 0C | 0D | 0E | 0F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0x00000000 | 30 | 00 | 2C | 17 | EA | 18 | E0 | 82 | F0 | 22 | E0 | 28 | 4A | 7D | 02 | 04 |
0x00000010 | B0 | 22 | 22 | 22 | 24 | 22 | 74 | 40 | 48 | 9C | 49 | DD | 48 | A9 | 20 | 0A |
0x00000020 | 02 | 01 | 0F | FB | 49 | D8 | E0 | 63 | F0 | 22 | E0 | 3C | 4A | 6D | 03 | F4 |
0x00000030 | F0 | 25 | 40 | 00 | 00 | 00 | 00 | 01 | 00 | 07 | 00 | 08 | 00 | 0F | 00 | 01 |
The first two bytes tell us our origin address in LC-3 memory, in this case 0x3000 (LC-3 is big-endian). Next, I needed to fire up my trusty debugger and inspect memory while executing the 2048 application.
gdb --args target/lc3 2048.obj
With GDB started, I could set a breakpoint just before executing the first LC-3 instruction and inspect memory[0x3000] with p /x memory[0x3000]. At this point, I could see that we did in fact load the application into LC-3 memory at the correct address.
Next in my thought process was to see what state things were in after the application wrote to the console, since this was the last output I saw. Checking the assembly file, I expected this to be an LEA followed by a JSR. We know that these instructions are 4 instructions after our origin of 0x3000 and that each instruction takes 2 bytes. We can translate this into an address in our object file with this equation:

file_offset = (address - ORIGIN) * bytes_per_instruction + origin_offset

Substituting in 2 bytes per instruction, ORIGIN as 0x3000, and 2 as the origin offset:

file_offset = (address - 0x3000) * 2 + 2

With this equation, we can quickly verify that the bytes for the 4th instruction (address 0x3004) are at 0x0A in our object file. After checking memory in GDB, I verified this was correct as well.
Lack Of STR Power
At this point, I was a little stumped, but I pressed on into the function, PROMPT3. This function immediately begins by storing some register values onto the stack for safe-keeping. These instructions use STR to save the contents of R0, R1, and R7 onto the stack. The stack pointer is held in R6. After executing these, though, I noticed something strange. R6 contained the updated stack pointer, decremented by 3, but the stack memory did not actually contain the values saved from the registers! After completing the 3 STR instructions, I expected something like this:
Address | Value |
---|---|
SP + 2 | R0 |
SP + 1 | R1 |
SP + 0 | R7 |
Instead, this memory was still 0x0000, as if it hadn't been touched since start. So for some reason I couldn't push values onto the stack? That seems weird, but it definitely explains the application hanging shortly after running. This train of thought pointed me to the ultimate culprit.
My next instinct was to confirm that the STR instructions contained the correct operands. I did not find any issues with the encoded bytes. At that point, I knew it had to be my op_str function, which is responsible for extracting the fields from the instruction bytes and performing the store. Here's the code before I fixed the bug:
void op_str(uint16_t instruction) {
    eRegister source = (instruction >> 9) & 0x3;
    eRegister base = (instruction >> 6) & 0x3;
    int16_t offset = instruction & 0x3F;
    offset = sign_extend_16(offset, 6);
    memory_write(registers[base] + offset, registers[source]);
}
It seemed odd that I could have screwed up this shift and mask... and yeah, there it was. The lines to pull out source and base had the wrong bitmask. I'm guessing my brain used 0x3 instead of 0x7 because I wanted to mask the lowest 3 bits -> 3... Anyway, relieved that the bug wasn't too nasty, I now had a different problem: how the heck was this not caught by my tests?
The Missing Bits
Reviewing my tests, I could see the issue immediately:
TEST(OpStr, PosOffset) {
    uint16_t test_val = 0x55;
    eRegister source = Register_R0;
    eRegister base = Register_R1;
    uint16_t offset = 0x10 & 0x1FF;
    uint16_t base_val = 0x100;
    uint16_t instruction = ((OP_STR << 12) | (source << 9) | (base << 6) | (offset));
    registers[source] = test_val;
    registers[base] = base_val;
    op_str(instruction);
    UNSIGNED_LONGS_EQUAL(test_val, memory_read(base_val + offset));
}
I was using R1 as my base register, but encoding R1 only requires the lowest bit of the field to be set, not the MSB. By masking with 0x3 instead of 0x7, I was dropping the MSB of the field. The funny part is that I had considered this scenario in the first place, which was why I didn't always use R0 for my fields in testing. But clearly R1 isn't enough either; to fully exercise my instruction decoding, I needed test data that set the MSB as well.
Afterwards, I felt a bit mixed about my design here. I had considered the exact problem I just solved, but it turns out things were a bit trickier. I'm definitely kicking myself for not defining some macros for the register bit masking, because those could be used throughout my code. That'll be put on the to-do list. Another question I had: could a tool have caught this?
Fuzzers and Fuzzy AI?
My first idea would have been to use a fuzzer. These are test libraries specially designed to generate patterned or arbitrary input for your tests and iterate through it quickly, ensuring tests pass and fail as expected. I think this would have caught the issue, because I can conceive of a pattern where the fuzzer generates instructions containing every combination of base register and source register. In the future, I hope to gain more experience with tools like this!
My second idea was to see how AI does at generating a test case, and whether it could have caught the issue. I wanted to try out Claude for the first time, and I'm somewhat impressed with the result. It did not get the edge case the first time around, and in fact had decoding bugs of its own because it forgot to mask only the 3 required bits. I had to provide further guidance, but it did eventually write a test case that would catch the issue. I'd say this session with an LLM puts another entry in the "good, but I still have to know what I'm doing" category. A junior engineer would not have caught this using only an LLM. You can find the output artifacts here. The chat is a bit too verbose to embed here, so it's not included.
What's Next?
With the bug fixed, I retried running my VM and unfortunately hit some serious console/display bugs. It appears that my VM is printing garbage to the console, but I do eventually get a portion of an initial 2048 board. Baby steps, but I don't want to debug like this again. My next inspiration is to work on a GDB Python extension that provides panes with info on the LC-3 VM itself. I think this would be a fun way to get more experience with Python tooling, and it would give me an easier debugging method going forward.
Write Your Own VM by Justin Meiners & Ryan Pendleton↩