I recently finished building Ben Eater’s 8-bit Breadboard Computer, but near the end I realized that much of the fun was in the design and the learning, while the act of actually cutting wire was an exercise in tedium.
I wasn’t quite ready to put the 8-bit computer behind me, but I didn’t want to
cut or bend wire anymore.
I then emulated the 8-bit CPU on the original Game Boy, building it from scratch in assembly. That was a fun experience, as I got to learn more about the Game Boy hardware as well as about writing assembly code.
I’ve always wanted to mess around with an FPGA, but two things have stood in my way:
Expensive: Many FPGA dev boards cost $100+ and come with features I don’t need.
Bad Tools: The tools are often complex, proprietary, and lacking Linux support.
Then I discovered the TinyFPGA BX, which checks all of the boxes:
$38
Open-source toolchain with Linux support
Small and simple
Can be placed onto a breadboard
I managed to buy one from a reseller, soldered some pins on it, and put it on a
breadboard.
FPGA SAP-1
The SAP-1 is a good introductory FPGA project because it’s pretty simple to
implement and it can fit on a small FPGA.
All that’s needed is the TinyFPGA BX plugged into a breadboard, with wires
connected to a 7-segment display for showing the output. All of the counters,
registers, memory, and LED outputs that make the breadboard versions so
interesting to look at are completely hidden. The only thing visible to the
outside world is what’s shown on the 7-segment display.
FPGA vs Software
Learning how to write code (if you can call it that) for an FPGA is a very
different experience from writing code for a CPU. Here, the Verilog written for
the FPGA describes the logic that makes up a CPU, whereas normal programs run on
a CPU. There are a lot of gotchas that can confuse and mislead you if you go
into it thinking it’s just another programming language.
The biggest mental block is serial vs. parallel. In software, code (usually)
executes from top to bottom in a linear fashion, one statement after the next
(ignoring loops and such). On an FPGA, everything executes concurrently,
because you’re describing circuits made of many wires that all carry signals
at the same time.
It took a bit of time to shift my mind toward this paradigm because I’m so
used to writing code that runs on a CPU, where I order statements so that if B
depends on A, A comes before B. In Verilog, the order of continuous
assignments and always blocks doesn’t matter, because together they describe
hardware that is evaluated as a whole.
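A minimal sketch of this, using a hypothetical `order_demo` module: the continuous assignment to `y` reads `x` one line before `x` is assigned, yet it describes exactly the same wiring as the reverse order. The two nonblocking (`<=`) assignments swap `p` and `q` on every clock edge no matter which line comes first, because every right-hand side is sampled before any register updates. (Order does still matter for blocking `=` assignments inside a single `always` block, as in the `bin_to_bcd` loop later on.)

```verilog
// Hypothetical example module illustrating order-independence in Verilog.
module order_demo (
    input  wire       clk,
    input  wire [3:0] in,
    output wire [3:0] y,
    output reg  [3:0] p,
    output reg  [3:0] q
);
    wire [3:0] x;

    // y depends on x, which is assigned on the next line; both are just wires.
    assign y = x + 1;
    assign x = in;

    // Starting values so the swap is visible in simulation.
    initial begin
        p = 4'd1;
        q = 4'd2;
    end

    // Nonblocking assignments read all right-hand sides before any register
    // updates, so p and q swap on every rising edge regardless of line order.
    always @(posedge clk) begin
        q <= p;
        p <= q;
    end
endmodule
```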
I’m definitely not an expert at this point, but I feel a little more comfortable
with the HDL way of doing things. I’d like feedback from anyone who knows this
stuff well.
Describing the CPU
I was surprised at how few lines of Verilog it took to describe the SAP-1. Something that took hours to wire up on the
breadboard, or fifteen minutes to write in Game Boy assembly, was just a line or two.
With Verilog you describe the logical behavior of a circuit and rely on the FPGA tools to synthesize your design into
the gates on the FPGA that will do what you want, similar to how you describe a program in C and trust the compiler to
generate the assembly for you.
Bus
The bus is a simple continuous assignment: its value changes immediately based on whichever output control signal (if
any) is currently asserted. That’s why it uses an assign statement (combinational logic) rather than a clocked always
block (sequential logic).
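The bus code itself isn’t shown here, so this is a minimal sketch of what such a continuous assignment could look like. It assumes the bus is driven by the five output signals used in the control logic (CO, RO, IO, AO, EO); the `sap1_bus` wrapper and its port names (`pc`, `ram_out`, `ir_low`, `a_reg`, `alu_out`) are hypothetical, chosen only to make the example self-contained.

```verilog
// Hypothetical wrapper around the SAP-1 bus multiplexer.
module sap1_bus (
    input  wire       ctrl_co,  // Counter Out
    input  wire       ctrl_ro,  // RAM Out
    input  wire       ctrl_io,  // Instruction Register Out
    input  wire       ctrl_ao,  // A Register Out
    input  wire       ctrl_eo,  // Sum Out
    input  wire [3:0] pc,       // program counter
    input  wire [7:0] ram_out,  // memory word at the current address
    input  wire [3:0] ir_low,   // operand (low nibble) of the instruction
    input  wire [7:0] a_reg,
    input  wire [7:0] alu_out,  // low 8 bits of the ALU result
    output wire [7:0] bus
);
    // Only one output signal should be asserted at a time; if none is,
    // the bus reads as zero instead of floating.
    assign bus = ctrl_co ? {4'b0, pc}     :
                 ctrl_ro ? ram_out        :
                 ctrl_io ? {4'b0, ir_low} :
                 ctrl_ao ? a_reg          :
                 ctrl_eo ? alu_out        :
                 8'b0;
endmodule
```

A mux with a zero default stands in for the breadboard’s tri-state outputs; FPGA fabric generally has no internal tri-state buffers, so a mux is the usual way to share a bus.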
The program counter is updated synchronously on the rising edge of the clock, or asynchronously by the reset signal
(tied to a button). On reset, the counter goes to zero. Otherwise, if the CE control signal is high, the counter
increments, and if the JP control signal is high, the PC is loaded with the value on the bus.
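A minimal sketch of that program counter as a standalone module. The module and port names are hypothetical, and it assumes a 4-bit PC, since the program below uses 16 memory locations addressed by a 4-bit operand.

```verilog
// Hypothetical standalone version of the SAP-1 program counter.
module program_counter (
    input  wire       clk,
    input  wire       reset,    // asynchronous, active-high (button)
    input  wire       ctrl_ce,  // Counter Enable: increment
    input  wire       ctrl_jp,  // Jump: load from the bus
    input  wire [7:0] bus,
    output reg  [3:0] pc
);
    always @(posedge clk or posedge reset) begin
        if (reset)
            pc <= 0;
        else if (ctrl_ce)
            pc <= pc + 1;      // wraps from 15 back to 0
        else if (ctrl_jp)
            pc <= bus[3:0];    // only the low nibble addresses the 16 bytes
    end
endmodule
```

In the control logic, CE is asserted in stage 1 and JP only in stage 2, so the two never overlap and the priority between them doesn’t matter in practice.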
The Instruction Step Counter is very similar to the Program Counter but with some extra logic. While the PC can
increment until it overflows and goes back to zero, the instruction step counter needs an explicit check for when it
reaches the maximum stage of 5.
There is also the matter of halting. On the breadboard, the HLT signal disables the clock output, but that isn’t
possible here, so instead I put the instruction step counter into an infinite loop at stage 6, which it can never
exit.
```verilog
reg [2:0] stage;

always @(posedge clk or posedge reset) begin
    if (reset)
        stage <= 0;
    else if (stage == 5 || ctrl_jp)
        stage <= 0;
    else if (ctrl_ht || stage == 6)
        // For a halt, put it into a stage it can never get out of
        stage <= 6;
    else
        stage <= stage + 1;
end
```
Instruction Register
The Instruction Register loads the value on the bus when the II control signal is asserted, and is cleared on reset.
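A minimal sketch of that register, assuming the same asynchronous reset and load pattern used by the A and B registers in the ALU code. The module and port names are hypothetical; the opcode/operand split matches the `ir[7:4]` checks in the control logic.

```verilog
// Hypothetical standalone instruction register.
module instruction_register (
    input  wire       clk,
    input  wire       reset,
    input  wire       ctrl_ii,  // Instruction Register In
    input  wire [7:0] bus,
    output reg  [7:0] ir        // ir[7:4] = opcode, ir[3:0] = operand
);
    always @(posedge clk or posedge reset) begin
        if (reset)
            ir <= 0;
        else if (ctrl_ii)
            ir <= bus;
    end
endmodule
```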
Memory is described easily enough as an array of registers, which is fine for this project, but
a dedicated RAM chip would probably be a better idea for larger projects.
I found hooking up all of the memory on the breadboard to be one of the bigger pain points, so
being able to do it in a few simple statements was both a relief and a disappointment. Just slightly
too easy.
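A minimal sketch of memory as an array of registers, together with the memory address register (MAR) that selects a word. The module and port names are hypothetical, and the 16-byte size is assumed from the program below.

```verilog
// Hypothetical standalone memory block: 16 bytes plus the memory address register.
module sap1_memory (
    input  wire       clk,
    input  wire       reset,
    input  wire       ctrl_mi,  // Memory Address Register In
    input  wire       ctrl_ri,  // RAM In
    input  wire [7:0] bus,
    output wire [7:0] ram_out   // word at the current address (for RO)
);
    reg [7:0] mem [0:15];       // 16 x 8-bit array of registers
    reg [3:0] mar;

    always @(posedge clk or posedge reset) begin
        if (reset)
            mar <= 0;
        else if (ctrl_mi)
            mar <= bus[3:0];
    end

    // Writes happen on the clock edge; reads are combinational.
    always @(posedge clk) begin
        if (ctrl_ri)
            mem[mar] <= bus;
    end

    assign ram_out = mem[mar];
endmodule
```

Because the read is combinational, synthesis will typically build this from logic cells rather than the FPGA’s block RAM, which fits the array-of-registers approach for a memory this small.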
The ALU has the most complicated behavior because that’s where most of the fun happens: addition, subtraction, and
flags. The B register’s output needs its own special logic so that the SU control signal can negate it using two’s
complement.
```verilog
reg [7:0] a_reg;
reg [7:0] b_reg;
wire [7:0] b_reg_out;
wire [8:0] alu;
wire flag_z, flag_c;

always @(posedge clk or posedge reset) begin
    if (reset)
        a_reg <= 0;
    else if (ctrl_ai)
        a_reg <= bus;
end

always @(posedge clk or posedge reset) begin
    if (reset)
        b_reg <= 0;
    else if (ctrl_bi)
        b_reg <= bus;
end

// Zero flag is set if ALU is zero
assign flag_z = (alu[7:0] == 0) ? 1 : 0;

// Use twos-complement for subtraction
assign b_reg_out = ctrl_su ? ~b_reg + 1 : b_reg;

// Carry flag is set if there's an overflow into bit 8 of the ALU
assign flag_c = alu[8];

assign alu = a_reg + b_reg_out;
```
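As a quick check of the two’s-complement logic, here is 5 − 3 worked through the 9-bit `alu` sum, with 3 − 5 for contrast. The carry out into bit 8 is set when the subtraction doesn’t need to borrow, and clear when it does.

$$
\begin{aligned}
5 - 3:&\quad 00000101_2 + (\lnot 00000011_2 + 1) = 00000101_2 + 11111101_2 = 1\,00000010_2 \\
&\quad \Rightarrow \texttt{alu[7:0]} = 2,\quad \texttt{alu[8]} = 1 \\
3 - 5:&\quad 00000011_2 + (\lnot 00000101_2 + 1) = 00000011_2 + 11111011_2 = 0\,11111110_2 \\
&\quad \Rightarrow \texttt{alu[7:0]} = 254\ (-2 \text{ as signed}),\quad \texttt{alu[8]} = 0
\end{aligned}
$$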
Flags Register
The Flags register stores the C and Z flags from the ALU when the FI control signal is asserted, and is cleared on reset.
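A minimal sketch of the flags register. The control logic indexes it as `flags[FLAG_C]` and `flags[FLAG_Z]`, but the actual bit positions aren’t shown, so the values of `FLAG_C` and `FLAG_Z` below are assumptions, as are the module and port names.

```verilog
// Hypothetical standalone flags register.
module flags_register (
    input  wire       clk,
    input  wire       reset,
    input  wire       ctrl_fi,  // Flags Register In
    input  wire       flag_c,   // carry out of the ALU (alu[8])
    input  wire       flag_z,   // ALU result is zero
    output reg  [1:0] flags
);
    // Assumed bit positions; the control logic only refers to them by name.
    localparam FLAG_C = 1;
    localparam FLAG_Z = 0;

    always @(posedge clk or posedge reset) begin
        if (reset)
            flags <= 0;
        else if (ctrl_fi) begin
            flags[FLAG_C] <= flag_c;
            flags[FLAG_Z] <= flag_z;
        end
    end
endmodule
```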
Finally, the control signals aren’t complicated; they just take up a lot of code because there are sixteen of them and
each depends on a different combination of stage and opcode. They change on the falling edge of the clock so that
they’re settled before any module acts on them at the next rising edge.
```verilog
parameter OP_NOP = 4'b0000;
parameter OP_LDA = 4'b0001;
parameter OP_ADD = 4'b0010;
parameter OP_SUB = 4'b0011;
parameter OP_STA = 4'b0100;
parameter OP_LDI = 4'b0101;
parameter OP_JMP = 4'b0110;
parameter OP_JC  = 4'b0111;
parameter OP_JZ  = 4'b1000;
parameter OP_OUT = 4'b1110;
parameter OP_HLT = 4'b1111;

// Halt
reg ctrl_ht;
always @(negedge clk) begin
    if (ir[7:4] == OP_HLT && stage == 2) ctrl_ht <= 1;
    else ctrl_ht <= 0;
end

// Memory Address Register In
reg ctrl_mi;
always @(negedge clk) begin
    if (stage == 0) ctrl_mi <= 1;
    else if (ir[7:4] == OP_LDA && stage == 2) ctrl_mi <= 1;
    else if (ir[7:4] == OP_ADD && stage == 2) ctrl_mi <= 1;
    else if (ir[7:4] == OP_SUB && stage == 2) ctrl_mi <= 1;
    else if (ir[7:4] == OP_STA && stage == 2) ctrl_mi <= 1;
    else ctrl_mi <= 0;
end

// RAM In
reg ctrl_ri;
always @(negedge clk) begin
    if (ir[7:4] == OP_STA && stage == 3) ctrl_ri <= 1;
    else ctrl_ri <= 0;
end

// RAM Out
reg ctrl_ro;
always @(negedge clk) begin
    if (stage == 1) ctrl_ro <= 1;
    else if (ir[7:4] == OP_LDA && stage == 3) ctrl_ro <= 1;
    else if (ir[7:4] == OP_ADD && stage == 3) ctrl_ro <= 1;
    else if (ir[7:4] == OP_SUB && stage == 3) ctrl_ro <= 1;
    else ctrl_ro <= 0;
end

// Instruction Register Out
reg ctrl_io;
always @(negedge clk) begin
    if (ir[7:4] == OP_LDA && stage == 2) ctrl_io <= 1;
    else if (ir[7:4] == OP_LDI && stage == 2) ctrl_io <= 1;
    else if (ir[7:4] == OP_ADD && stage == 2) ctrl_io <= 1;
    else if (ir[7:4] == OP_SUB && stage == 2) ctrl_io <= 1;
    else if (ir[7:4] == OP_STA && stage == 2) ctrl_io <= 1;
    else if (ir[7:4] == OP_JMP && stage == 2) ctrl_io <= 1;
    else if (ir[7:4] == OP_JC && stage == 2) ctrl_io <= 1;
    else if (ir[7:4] == OP_JZ && stage == 2) ctrl_io <= 1;
    else ctrl_io <= 0;
end

// Instruction Register In
reg ctrl_ii;
always @(negedge clk) begin
    if (stage == 1) ctrl_ii <= 1;
    else ctrl_ii <= 0;
end

// A Register In
reg ctrl_ai;
always @(negedge clk) begin
    if (ir[7:4] == OP_LDI && stage == 2) ctrl_ai <= 1;
    else if (ir[7:4] == OP_LDA && stage == 3) ctrl_ai <= 1;
    else if (ir[7:4] == OP_ADD && stage == 4) ctrl_ai <= 1;
    else if (ir[7:4] == OP_SUB && stage == 4) ctrl_ai <= 1;
    else ctrl_ai <= 0;
end

// A Register Out
reg ctrl_ao;
always @(negedge clk) begin
    if (ir[7:4] == OP_STA && stage == 3) ctrl_ao <= 1;
    else if (ir[7:4] == OP_OUT && stage == 2) ctrl_ao <= 1;
    else ctrl_ao <= 0;
end

// Sum Out
reg ctrl_eo;
always @(negedge clk) begin
    if (ir[7:4] == OP_ADD && stage == 4) ctrl_eo <= 1;
    else if (ir[7:4] == OP_SUB && stage == 4) ctrl_eo <= 1;
    else ctrl_eo <= 0;
end

// Subtract
reg ctrl_su;
always @(negedge clk) begin
    if (ir[7:4] == OP_SUB && stage == 4) ctrl_su <= 1;
    else ctrl_su <= 0;
end

// B Register In
reg ctrl_bi;
always @(negedge clk) begin
    if (ir[7:4] == OP_ADD && stage == 3) ctrl_bi <= 1;
    else if (ir[7:4] == OP_SUB && stage == 3) ctrl_bi <= 1;
    else ctrl_bi <= 0;
end

// Output Register In
reg ctrl_oi;
always @(negedge clk) begin
    if (ir[7:4] == OP_OUT && stage == 2) ctrl_oi <= 1;
    else ctrl_oi <= 0;
end

// Counter Enable
reg ctrl_ce;
always @(negedge clk) begin
    if (stage == 1) ctrl_ce <= 1;
    else ctrl_ce <= 0;
end

// Counter Out
reg ctrl_co;
always @(negedge clk) begin
    // Always in Stage 0
    if (stage == 0) ctrl_co <= 1;
    else ctrl_co <= 0;
end

// Jump
reg ctrl_jp;
always @(negedge clk) begin
    if (ir[7:4] == OP_JMP && stage == 2) ctrl_jp <= 1;
    else if (ir[7:4] == OP_JC && stage == 2 && flags[FLAG_C] == 1) ctrl_jp <= 1;
    else if (ir[7:4] == OP_JZ && stage == 2 && flags[FLAG_Z] == 1) ctrl_jp <= 1;
    else ctrl_jp <= 0;
end

// Flags Register In
reg ctrl_fi;
always @(negedge clk) begin
    if (ir[7:4] == OP_ADD && stage == 4) ctrl_fi <= 1;
    else if (ir[7:4] == OP_SUB && stage == 4) ctrl_fi <= 1;
    else ctrl_fi <= 0;
end
```
The Program
A program can be loaded into memory using an initial block, which sets the values once at startup.
The following is the program from the video titled “Conditional jump
instructions”, which
counts from 0 to 255 and back down to 0, repeatedly.
```verilog
initial begin
    mem[0]  = {OP_OUT, 4'b0};
    mem[1]  = {OP_ADD, 4'hF};
    mem[2]  = {OP_JC,  4'h4};
    mem[3]  = {OP_JMP, 4'h0};
    mem[4]  = {OP_SUB, 4'hF};
    mem[5]  = {OP_OUT, 4'h0};
    mem[6]  = {OP_JZ,  4'h0};
    mem[7]  = {OP_JMP, 4'h4};
    mem[8]  = {OP_NOP, 4'h0};
    mem[9]  = {OP_NOP, 4'h0};
    mem[10] = {OP_NOP, 4'h0};
    mem[11] = {OP_NOP, 4'h0};
    mem[12] = {OP_NOP, 4'h0};
    mem[13] = {OP_NOP, 4'h0};
    mem[14] = {OP_NOP, 4'h0};
    mem[15] = {8'h01}; // DATA = 1
end
```
Hooking It Up
The entire interface of the CPU can be defined in a file called cpu.v as a
module named cpu, with a reset and a clock as inputs and an eight-bit output.
The FPGA has physical pins that need to be assigned to those inputs and
outputs, so that goes in a file called top.v, which instantiates the cpu
module and connects it to real pins.
It also creates a clock divider by incrementing a counter on every clock tick
and using one of its higher bits as the CPU clock (the TinyFPGA BX has a 16 MHz
clock, so if we used it directly, the output would change too fast to see).
PIN_13 is attached to a button for reset. CLK is an alias for the pin connected
to the onboard crystal.
After hooking up eight pins to eight LEDs, we can run the program and view the output.
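A minimal sketch of top.v under a few stated assumptions: the cpu module’s ports are named `clk`, `reset`, and `out`; the reset button drives PIN_13 high when pressed; and the eight LEDs sit on PIN_1 through PIN_8, which are placeholders for whichever pins you actually wire up (they also need matching entries in the board’s pin constraints file). Bit n of a free-running counter toggles at 16 MHz / 2^(n+1), so bit 21 gives a CPU clock of roughly 3.8 Hz.

```verilog
// Hypothetical sketch of top.v: clock divider + CPU instance, output on LEDs.
module top (
    input  CLK,                         // 16 MHz onboard clock
    input  PIN_13,                      // reset button (assumed active-high)
    output PIN_1, PIN_2, PIN_3, PIN_4,  // placeholder LED pins
    output PIN_5, PIN_6, PIN_7, PIN_8
);
    // Free-running counter used as a clock divider.
    reg [21:0] counter = 0;
    always @(posedge CLK)
        counter <= counter + 1;

    // 16 MHz / 2^22 is about 3.8 Hz, slow enough to watch.
    wire cpu_clk = counter[21];

    // Port names of cpu are assumed, not taken from the real cpu.v.
    wire [7:0] out;
    cpu cpu_inst (.clk(cpu_clk), .reset(PIN_13), .out(out));

    // One LED per output bit, PIN_1 = least significant bit.
    assign {PIN_8, PIN_7, PIN_6, PIN_5, PIN_4, PIN_3, PIN_2, PIN_1} = out;
endmodule
```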
7-Segment Display
But LEDs are a bit boring. It would be better with a 7-segment display like it
was on the breadboard.
At first I tried to do it the way I did on the Game Boy, with successive
division, but that produced so much logic that it wouldn’t fit on the
TinyFPGA BX: an example of thinking like software instead of hardware.
Eventually I came across an algorithm called Double Dabble, which needs only
shifts and additions, both easy to do in Verilog without resulting in a
horrifying amount of logic. For each input bit, any BCD digit greater than 4
has 3 added to it, and then the whole BCD register shifts left by one, pulling
in the next input bit, most significant bit first.
The algorithm can be implemented with a for loop (often best avoided in
Verilog, but it works well enough here because synthesis unrolls it into a
fixed chain of comparisons and additions).
```verilog
module bin_to_bcd(
    input wire [7:0] bin,
    output reg [11:0] bcd
);
    integer i;

    always @(bin) begin
        bcd = 0;
        for (i = 0; i < 8; i = i + 1) begin
            if (bcd[3:0] > 4) bcd[3:0] = bcd[3:0] + 3;
            if (bcd[7:4] > 4) bcd[7:4] = bcd[7:4] + 3;
            if (bcd[11:8] > 4) bcd[11:8] = bcd[11:8] + 3;
            // Concatenate acts as a shift
            bcd = {bcd[10:0], bin[7-i]};
        end
    end
endmodule
```
The display itself is driven by turning specific segments on or off according to the value of each BCD digit.
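A minimal sketch of that per-digit lookup, assuming a common-cathode display (a segment lights when its pin is high) with the segments packed as {g, f, e, d, c, b, a}; a common-anode display would need the outputs inverted. The `bcd_to_7seg` module name is hypothetical.

```verilog
// Hypothetical BCD-to-7-segment decoder.
// seg = {g, f, e, d, c, b, a}, active-high (common-cathode display assumed).
module bcd_to_7seg (
    input  wire [3:0] digit,
    output reg  [6:0] seg
);
    always @(*) begin
        case (digit)
            4'd0: seg = 7'b0111111;
            4'd1: seg = 7'b0000110;
            4'd2: seg = 7'b1011011;
            4'd3: seg = 7'b1001111;
            4'd4: seg = 7'b1100110;
            4'd5: seg = 7'b1101101;
            4'd6: seg = 7'b1111101;
            4'd7: seg = 7'b0000111;
            4'd8: seg = 7'b1111111;
            4'd9: seg = 7'b1101111;
            default: seg = 7'b0000000;  // blank for non-BCD values
        endcase
    end
endmodule
```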
Our output register is 8 bits wide, though, so we need three digits to display
0-255. Rather than giving each digit its own set of output pins, we can
multiplex them (like on the breadboard version) by lighting each digit in
turn fast enough that the human eye doesn’t notice the flicker.
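A minimal sketch of the multiplexing, assuming the 12-bit `bcd` value from `bin_to_bcd` and a digit-enable output where a high bit turns on one digit (real wiring is often active-low through a transistor, so the polarity is an assumption). The `digit_mux` module and the 16-bit refresh counter are also assumptions: at 16 MHz, 2^16 cycles is about 4.1 ms per digit, so all three digits refresh roughly 80 times per second.

```verilog
// Hypothetical digit multiplexer for three 7-segment digits.
module digit_mux (
    input  wire        clk,       // 16 MHz board clock
    input  wire [11:0] bcd,       // {hundreds, tens, ones} from bin_to_bcd
    output reg  [3:0]  digit,     // BCD digit to send to a segment decoder
    output reg  [2:0]  digit_en   // one-hot digit enable (active-high assumed)
);
    reg [15:0] refresh = 0;  // 2^16 cycles at 16 MHz is about 4.1 ms per digit
    reg [1:0]  sel = 0;      // which digit is currently lit: 0, 1, or 2

    // Advance to the next digit each time the refresh counter wraps.
    always @(posedge clk) begin
        refresh <= refresh + 1;
        if (refresh == 16'hFFFF)
            sel <= (sel == 2'd2) ? 2'd0 : sel + 2'd1;
    end

    // Pick the nibble and enable line for the selected digit.
    always @(*) begin
        case (sel)
            2'd0:    begin digit = bcd[3:0];  digit_en = 3'b001; end  // ones
            2'd1:    begin digit = bcd[7:4];  digit_en = 3'b010; end  // tens
            default: begin digit = bcd[11:8]; digit_en = 3'b100; end  // hundreds
        endcase
    end
endmodule
```

The selected `digit` would feed a segment decoder whose seven outputs are shared by all three digits; only the enable line changes as the digits take turns.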
We could also go one step further and use a shift register so that we wouldn’t
need seven pins to drive seven LEDs, but I didn’t bother because I had plenty
of pins to spare.