AUSTIN MORLAN

Oct 11, 2021

Building an 8-Bit CPU on an FPGA


Introduction


The Breadboard Computer

I recently finished building Ben Eater’s 8-bit Breadboard Computer. Near the end I realized that much of the fun was in the design and the learning, while the act of actually cutting wire was an exercise in tedium.

I wasn’t quite ready to put the 8-bit computer behind me, but I didn’t want to cut or bend wire anymore.

The Game Boy

I then emulated the 8-bit CPU on the original Game Boy, building it from scratch in assembly. That was a fun experience, as I got to learn more about the Game Boy hardware as well as about writing assembly code.

The FPGA

I’ve always wanted to mess around with an FPGA but two things have stood in my way:

I then discovered the TinyFPGA BX which checks all of the boxes:

I managed to buy one from a reseller, soldered some pins on it, and put it on a breadboard.

FPGA SAP-1


The SAP-1 is a good introductory FPGA project because it’s pretty simple to implement and it can fit on a small FPGA.

All that’s needed is the TinyFPGA BX plugged into a breadboard with wires connected to a 7-segment display for showing the output. All of the counters, registers, memory, and LED outputs that make the breadboard version so interesting to look at are completely hidden. The only thing visible to the outside world is what’s shown on the 7-segment display.

FPGA vs Software


Learning how to write code (if you can call it that) for an FPGA is a very different experience from writing code for a CPU. Here, the Verilog written for the FPGA describes logic that creates a CPU, whereas a normal program actually runs on a CPU. There are a lot of gotchas that can confuse and mislead you if you go into it thinking it’s just another programming language.

The biggest mental block is serial vs parallel. Software code (usually) executes from top to bottom in a linear fashion, one statement after the next (ignoring loops and such). With an FPGA, everything is executing concurrently because you’re describing circuits full of wires.

It took a bit of time to shift my mind towards this paradigm because I’m so used to writing code that runs on a CPU. I’m used to ordering statements so that if B depends on A, A is defined before B. But with Verilog the order of concurrent statements (assign statements and always blocks) doesn’t matter because everything is evaluated as a whole.
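
To make the ordering point concrete, here is a minimal sketch. The module and signal names are made up for illustration and are not part of the SAP-1 design. The first assign reads sum before the statement that drives it, which would be a use-before-definition bug in most software languages but is fine here because both lines describe wires that exist at the same time.

```verilog
// Hypothetical demo module; names are illustrative only.
module order_demo(
	input wire[3:0] a,
	input wire[3:0] b,
	output wire[4:0] sum,
	output wire sum_is_zero
	);

// Reads 'sum' before the line that drives it. Both assigns describe
// hardware that exists simultaneously, so textual order is irrelevant.
assign sum_is_zero = (sum == 0);
assign sum = a + b;

endmodule
```

Swapping the two assign lines should synthesize to exactly the same circuit.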

I’m definitely not an expert at this point but I feel a little more comfortable with the HDL-way of doing things. I’d like feedback from anyone who knows this stuff well.

Describing the CPU


I was surprised to see that I was able to describe the SAP-1 in relatively few lines of Verilog. Something that took hours to connect on the breadboard, or fifteen minutes to write in Game Boy assembly, was just a line or two.

With Verilog you’re describing the logical behavior of a circuit and relying on the FPGA tools to synthesize your design into the gates on the FPGA that will do what you want. It’s similar to how you describe a program in C and trust the compiler to generate assembly for you.

Bus

The bus is a simple continuous assignment whose value changes immediately based on whichever control signal (if any) is currently asserted. If more than one were asserted at once, the first match in the chain would win. Thus it uses an assign statement (combinational) rather than a clocked always block (synchronous).

wire[7:0] bus;
assign bus =
	ctrl_co ? pc :
	ctrl_ro ? mem[mar] :
	ctrl_io ? ir[3:0] :
	ctrl_ao ? a_reg :
	ctrl_eo ? alu :
	8'b0;

Program Counter

The program counter is updated synchronously on the rising edge of the clock or asynchronously by the reset signal (tied to a button). On reset, the counter goes to zero. If the CE control signal is high at the clock edge, the counter increments; otherwise, if the JP control signal is high, the PC is loaded with the low four bits of the bus.

reg[3:0] pc;
always @(posedge clk or posedge reset) begin
	if (reset)
		pc <= 0;
	else if (ctrl_ce)
		pc <= pc + 1;
	else if (ctrl_jp)
		pc <= bus[3:0];
end

Instruction Step Counter

The Instruction Step Counter is very similar to the Program Counter but with some extra logic. While the PC can increment until it overflows and goes back to zero, the instruction step counter needs an explicit check for when it reaches the maximum stage of 5.

There is also the matter of halting. On the breadboard the HLT signal disables the clock output but that isn’t possible here, so instead I have it put the instruction step counter into Stage 6, a state it can never leave.

reg[2:0] stage;
always @(posedge clk or posedge reset) begin
	if (reset)
		stage <= 0;
	else if (stage == 5 || ctrl_jp)
		stage <= 0;
	else if (ctrl_ht || stage == 6)
		// For a halt, put it into a stage it can never get out of
		stage <= 6;
	else
		stage <= stage + 1;
end
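
One way to convince yourself that the halt state really is sticky is to simulate the counter on its own. The sketch below copies the block above into a hypothetical testbench (the name stage_tb, the clock period, and the stimulus timing are all arbitrary choices of mine) and pulses ctrl_ht by hand instead of driving it from the control logic.

```verilog
// Hypothetical simulation-only testbench; not part of the synthesized design.
module stage_tb;

reg clk = 0;
reg reset = 1;
reg ctrl_jp = 0;
reg ctrl_ht = 0;

// Copy of the instruction step counter from above
reg[2:0] stage;
always @(posedge clk or posedge reset) begin
	if (reset)
		stage <= 0;
	else if (stage == 5 || ctrl_jp)
		stage <= 0;
	else if (ctrl_ht || stage == 6)
		stage <= 6;
	else
		stage <= stage + 1;
end

// Free-running simulation clock
always #5 clk = ~clk;

// Print the stage shortly after each rising edge has taken effect
always @(posedge clk)
	#1 $display("stage=%0d ctrl_ht=%b", stage, ctrl_ht);

initial begin
	#12 reset = 0;              // release reset between clock edges
	repeat (8) @(negedge clk);  // let the counter run through 0..5 and wrap
	ctrl_ht = 1;                // pretend a HLT was decoded
	@(negedge clk);
	ctrl_ht = 0;                // the control signal drops again...
	repeat (4) @(negedge clk);  // ...but the stage should stay at 6
	$finish;
end

endmodule
```

With a simulator such as Icarus Verilog, the printed stage should count up, wrap from 5 back to 0, and then sit at 6 once ctrl_ht has been pulsed, even after ctrl_ht returns to 0.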

Instruction Register

The Instruction Register latches the value on the bus when the II control signal is asserted, and is cleared on reset.

reg[7:0] ir;
always @(posedge clk or posedge reset) begin
	if (reset)
		ir <= 0;
	else if (ctrl_ii)
		ir <= bus;
end

Memory Address Register

The MAR latches the low four bits of the bus when the MI signal is asserted, and is cleared on reset.

reg[3:0] mar;
always @(posedge clk or posedge reset) begin
	if (reset)
		mar <= 0;
	else if (ctrl_mi)
		mar <= bus[3:0];
end

Memory

Memory is described easily enough as an array of registers, which is fine for this project, but a dedicated RAM chip would probably be a better idea for larger projects.

I found hooking up all of the memory on the breadboard to be one of the bigger pain points so being able to do it in a few simple statements was both a relief and a disappointment. Just slightly too easy.

// Sixteen 8-bit words; an explicit range is accepted by plain Verilog tools
reg[7:0] mem[0:15];
always @(posedge clk) begin
	if (ctrl_ri)
		mem[mar] <= bus;
end

Output Register

The Output Register latches the value on the bus when the OI control signal is asserted, and is cleared on reset.

always @(posedge clk or posedge reset) begin
	if (reset)
		out <= 0;
	else if (ctrl_oi)
		out <= bus;
end

ALU

The ALU has the most complicated behavior because that’s where most of the fun happens involving addition, subtraction, and flags. The B register needs its own special logic so that the SU control signal can turn its output into the two’s complement for subtraction.

reg[7:0] a_reg;
reg[7:0] b_reg;
wire[7:0] b_reg_out;
wire[8:0] alu;
wire flag_z, flag_c;
always @(posedge clk or posedge reset) begin
	if (reset)
		a_reg <= 0;
	else if (ctrl_ai)
		a_reg <= bus;
end

always @(posedge clk or posedge reset) begin
	if (reset)
		b_reg <= 0;
	else if (ctrl_bi)
		b_reg <= bus;
end

// Zero flag is set if ALU is zero
assign flag_z = (alu[7:0] == 0) ? 1 : 0;

// Use twos-complement for subtraction
assign b_reg_out = ctrl_su ? ~b_reg + 1 : b_reg;

// Carry flag is set if there's an overflow into bit 8 of the ALU
assign flag_c = alu[8];

assign alu = a_reg + b_reg_out;
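
A worked example may help with the flags. The hypothetical testbench below (alu_tb is my name for it) reuses the same combinational expressions as above and evaluates 5 - 3 and 255 + 1, with the expected arithmetic spelled out in the comments.

```verilog
// Hypothetical simulation-only testbench for the combinational ALU logic.
module alu_tb;

reg[7:0] a_reg, b_reg;
reg ctrl_su;

// Same expressions as the ALU above
wire[7:0] b_reg_out = ctrl_su ? ~b_reg + 1 : b_reg;
wire[8:0] alu = a_reg + b_reg_out;
wire flag_z = (alu[7:0] == 0);
wire flag_c = alu[8];

initial begin
	// 5 - 3: ~8'h03 + 1 = 8'hFD, and 8'h05 + 8'hFD = 9'h102.
	// The low byte is 2, and bit 8 (carry) is 1.
	a_reg = 8'd5; b_reg = 8'd3; ctrl_su = 1;
	#10 $display("5 - 3:   alu=%h result=%0d c=%b z=%b",
		alu, alu[7:0], flag_c, flag_z);

	// 255 + 1: 8'hFF + 8'h01 = 9'h100.
	// The low byte is 0, so both the carry and zero flags are 1.
	a_reg = 8'd255; b_reg = 8'd1; ctrl_su = 0;
	#10 $display("255 + 1: alu=%h result=%0d c=%b z=%b",
		alu, alu[7:0], flag_c, flag_z);
end

endmodule
```

Note that the carry flag comes out set for 5 - 3 even though nothing overflowed. That is a side effect of implementing subtraction as addition of the two’s complement, and it matters when a program uses JC after a SUB.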

Flags Register

The Flags register stores the C and Z flags from the ALU when the FI control signal is asserted, and is cleared on reset.

parameter FLAG_C = 1;
parameter FLAG_Z = 0;

reg[1:0] flags;
always @(posedge clk or posedge reset) begin
	if (reset)
		flags <= 0;
	else if (ctrl_fi)
		flags <= {flag_c, flag_z};
end

Control Signals

Finally, the control signals aren’t complicated; they just take up a lot of code because there are sixteen of them and they depend on various combinations of stage and opcode. They change on the falling edge of the clock to make sure they’re set before any modules are active on the rising edge.

parameter OP_NOP = 4'b0000;
parameter OP_LDA = 4'b0001;
parameter OP_ADD = 4'b0010;
parameter OP_SUB = 4'b0011;
parameter OP_STA = 4'b0100;
parameter OP_LDI = 4'b0101;
parameter OP_JMP = 4'b0110;
parameter OP_JC  = 4'b0111;
parameter OP_JZ  = 4'b1000;
parameter OP_OUT = 4'b1110;
parameter OP_HLT = 4'b1111;


// Halt
reg ctrl_ht;
always @(negedge clk) begin
	if (ir[7:4] == OP_HLT && stage == 2)
		ctrl_ht <= 1;
	else
		ctrl_ht <= 0;
end

// Memory Address Register In
reg ctrl_mi;
always @(negedge clk) begin
	if (stage == 0)
		ctrl_mi <= 1;
	else if (ir[7:4] == OP_LDA && stage == 2)
		ctrl_mi <= 1;
	else if (ir[7:4] == OP_ADD && stage == 2)
		ctrl_mi <= 1;
	else if (ir[7:4] == OP_SUB && stage == 2)
		ctrl_mi <= 1;
	else if (ir[7:4] == OP_STA && stage == 2)
		ctrl_mi <= 1;
	else
		ctrl_mi <= 0;
end

// RAM In
reg ctrl_ri;
always @(negedge clk) begin
	if (ir[7:4] == OP_STA && stage == 3)
		ctrl_ri <= 1;
	else
		ctrl_ri <= 0;
end

// RAM Out
reg ctrl_ro;
always @(negedge clk) begin
	if (stage == 1)
		ctrl_ro <= 1;
	else if (ir[7:4] == OP_LDA && stage == 3)
		ctrl_ro <= 1;
	else if (ir[7:4] == OP_ADD && stage == 3)
		ctrl_ro <= 1;
	else if (ir[7:4] == OP_SUB && stage == 3)
		ctrl_ro <= 1;
	else
		ctrl_ro <= 0;
end

// Instruction Register Out
reg ctrl_io;
always @(negedge clk) begin
	if (ir[7:4] == OP_LDA && stage == 2)
		ctrl_io <= 1;
	else if (ir[7:4] == OP_LDI && stage == 2)
		ctrl_io <= 1;
	else if (ir[7:4] == OP_ADD && stage == 2)
		ctrl_io <= 1;
	else if (ir[7:4] == OP_SUB && stage == 2)
		ctrl_io <= 1;
	else if (ir[7:4] == OP_STA && stage == 2)
		ctrl_io <= 1;
	else if (ir[7:4] == OP_JMP && stage == 2)
		ctrl_io <= 1;
	else if (ir[7:4] == OP_JC && stage == 2)
		ctrl_io <= 1;
	else if (ir[7:4] == OP_JZ && stage == 2)
		ctrl_io <= 1;
	else
		ctrl_io <= 0;
end

// Instruction Register In
reg ctrl_ii;
always @(negedge clk) begin
	if (stage == 1)
		ctrl_ii <= 1;
	else
		ctrl_ii <= 0;
end

// A Register In
reg ctrl_ai;
always @(negedge clk) begin
	if (ir[7:4] == OP_LDI && stage == 2)
		ctrl_ai <= 1;
	else if (ir[7:4] == OP_LDA && stage == 3)
		ctrl_ai <= 1;
	else if (ir[7:4] == OP_ADD && stage == 4)
		ctrl_ai <= 1;
	else if (ir[7:4] == OP_SUB && stage == 4)
		ctrl_ai <= 1;
	else
		ctrl_ai <= 0;
end

// A Register Out
reg ctrl_ao;
always @(negedge clk) begin
	if (ir[7:4] == OP_STA && stage == 3)
		ctrl_ao <= 1;
	else if (ir[7:4] == OP_OUT && stage == 2)
		ctrl_ao <= 1;
	else
		ctrl_ao <= 0;
end

// Sum Out
reg ctrl_eo;
always @(negedge clk) begin
	if (ir[7:4] == OP_ADD && stage == 4)
		ctrl_eo <= 1;
	else if (ir[7:4] == OP_SUB && stage == 4)
		ctrl_eo <= 1;
	else
		ctrl_eo <= 0;
end

// Subtract
reg ctrl_su;
always @(negedge clk) begin
	if (ir[7:4] == OP_SUB && stage == 4)
		ctrl_su <= 1;
	else
		ctrl_su <= 0;
end

// B Register In
reg ctrl_bi;
always @(negedge clk) begin
	if (ir[7:4] == OP_ADD && stage == 3)
		ctrl_bi <= 1;
	else if (ir[7:4] == OP_SUB && stage == 3)
		ctrl_bi <= 1;
	else
		ctrl_bi <= 0;
end

// Output Register In
reg ctrl_oi;
always @(negedge clk) begin
	if (ir[7:4] == OP_OUT && stage == 2)
		ctrl_oi <= 1;
	else
		ctrl_oi <= 0;
end

// Counter Enable
reg ctrl_ce;
always @(negedge clk) begin
	if (stage == 1)
		ctrl_ce <= 1;
	else
		ctrl_ce <= 0;
end

// Counter Out
reg ctrl_co;
always @(negedge clk) begin
	// Always in Stage 0
	if (stage == 0)
		ctrl_co <= 1;
	else
		ctrl_co <= 0;
end

// Jump
reg ctrl_jp;
always @(negedge clk) begin
	if (ir[7:4] == OP_JMP && stage == 2)
		ctrl_jp <= 1;
	else if (ir[7:4] == OP_JC && stage == 2 && flags[FLAG_C] == 1)
		ctrl_jp <= 1;
	else if (ir[7:4] == OP_JZ && stage == 2 && flags[FLAG_Z] == 1)
		ctrl_jp <= 1;
	else
		ctrl_jp <= 0;
end

// Flags Register In
reg ctrl_fi;
always @(negedge clk) begin
	if (ir[7:4] == OP_ADD && stage == 4)
		ctrl_fi <= 1;
	else if (ir[7:4] == OP_SUB && stage == 4)
		ctrl_fi <= 1;
	else
		ctrl_fi <= 0;
end
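
Each block above is an if/else chain that sets a signal to 1 for certain combinations of opcode and stage. As a possible alternative (a sketch, not how the original design is written), the same condition can be expressed as a single boolean expression. The example below does this for ctrl_mi only; the module name ctrl_mi_compact and the helper wires opcode and uses_address are names I introduced for illustration.

```verilog
// Hypothetical standalone module; in the real design this logic
// would live inside cpu.v next to the other control signals.
module ctrl_mi_compact(
	input wire clk,
	input wire[2:0] stage,
	input wire[7:0] ir,
	output reg ctrl_mi
	);

localparam OP_LDA = 4'b0001;
localparam OP_ADD = 4'b0010;
localparam OP_SUB = 4'b0011;
localparam OP_STA = 4'b0100;

wire[3:0] opcode = ir[7:4];

// Instructions that load the MAR from their operand in stage 2
wire uses_address =
	(opcode == OP_LDA) || (opcode == OP_ADD) ||
	(opcode == OP_SUB) || (opcode == OP_STA);

// Asserted in stage 0 for every instruction (fetch), and in stage 2
// for the instructions listed above; deasserted otherwise.
always @(negedge clk)
	ctrl_mi <= (stage == 0) || (stage == 2 && uses_address);

endmodule
```

This should behave the same as the ctrl_mi block above. Whether it reads better is a matter of taste: the if/else form mirrors the microcode table line by line, while this form makes the shared structure between instructions more visible.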

The Program

A program can be loaded into memory using an initial block, which sets the values once at startup.

The following is the program that runs in the video titled “Conditional jump instructions” which counts from 0 to 255 and back to 0 again repeatedly.

initial begin
	mem[0]  = {OP_OUT, 4'b0};
	mem[1]  = {OP_ADD, 4'hF};
	mem[2]  = {OP_JC,  4'h4};
	mem[3]  = {OP_JMP, 4'h0};
	mem[4]  = {OP_SUB, 4'hF};
	mem[5]  = {OP_OUT, 4'h0};
	mem[6]  = {OP_JZ,  4'h0};
	mem[7]  = {OP_JMP, 4'h4};
	mem[8]  = {OP_NOP, 4'h0};
	mem[9]  = {OP_NOP, 4'h0};
	mem[10] = {OP_NOP, 4'h0};
	mem[11] = {OP_NOP, 4'h0};
	mem[12] = {OP_NOP, 4'h0};
	mem[13] = {OP_NOP, 4'h0};
	mem[14] = {OP_NOP, 4'h0};
	mem[15] = {8'h01};        // DATA = 1
end

Hooking It Up


The entire interface of the CPU can be defined in a file called cpu.v as a module named cpu with a reset and clock as inputs, and an 8-bit output.

module cpu(
	input wire clk,
	input wire reset,
	output reg[7:0] out
	);

The FPGA has physical pins that need to be assigned to those inputs and outputs, so we put that in a file called top.v which instantiates the CPU module and gives it real pins.

It also creates a clock divider by incrementing a counter every clock tick and using a specific bit as the CPU clock (the TinyFPGA BX has a 16 MHz clock, so if we used it directly the output would be too fast to see).

PIN_13 is attached to a button for reset. CLK is an alias for the pin connected to the onboard crystal.

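module top(
	input CLK,
	input PIN_13,
	output PIN_14, output PIN_15,
	output PIN_16, output PIN_17,
	output PIN_18, output PIN_19,
	output PIN_20, output PIN_21);

// Driven by the cpu instance below, so this must be a wire
wire[7:0] out;

// Free-running counter used as a clock divider
reg[23:0] clk;
always @(posedge CLK)
	clk <= clk + 1;

cpu cpu0(
	.clk(clk[15]),
	.reset(PIN_13),
	.out(out));

// Assumed pin order (bit 0 on PIN_14 up to bit 7 on PIN_21);
// adjust to match your wiring.
assign {PIN_21, PIN_20, PIN_19, PIN_18, PIN_17, PIN_16, PIN_15, PIN_14} = out;

endmodule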
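
To put numbers on the clock divider: bit n of a free-running counter toggles once every 2^n input clocks, so one full period takes 2^(n+1) input clocks. With a 16 MHz input, bit 15 therefore runs at 16,000,000 / 65,536, or roughly 244 Hz. The sketch below wraps the same idea in a small parameterized module; the names clk_div, BIT, clk_in, and clk_out are mine and do not appear in the original source.

```verilog
// Hypothetical helper module illustrating the counter-bit clock divider.
module clk_div #(
	parameter BIT = 15   // which counter bit to use as the output clock
	)(
	input wire clk_in,
	output wire clk_out
	);

// Only bits 0..BIT are needed to produce the output
reg[BIT:0] count = 0;
always @(posedge clk_in)
	count <= count + 1;

// Output frequency = input frequency / 2^(BIT+1)
//   BIT = 15 at 16 MHz -> about 244 Hz
//   BIT = 10 at 16 MHz -> 7812.5 Hz
assign clk_out = count[BIT];

endmodule
```

Using a counter bit as a clock is fine for a slow hobby design like this one. Larger designs usually keep everything on the main clock and use the divided signal as an enable instead, but that is beyond what this project needs.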

After hooking up eight pins to eight LEDs, we can run the program and view the output.

7-Segment Display


But LEDs are a bit boring. It would be better with a 7-segment display like it was on the breadboard.

At first I tried to do it the way I did on the Game Boy, with successive divides, but that produced so much logic it wouldn’t fit onto the TinyFPGA BX. It was a good example of thinking like software instead of hardware.

Eventually I came upon an algorithm called Double Dabble that involves simple shifting and adding, both easily done in Verilog without resulting in a horrifying amount of logic.

The algorithm can be implemented with a for loop (sometimes best avoided in Verilog but works well enough here).

module bin_to_bcd(
	input wire[7:0] bin,
	output reg[11:0] bcd);

integer i;

// Purely combinational: re-evaluated whenever any input changes
always @(*) begin
	bcd = 0;

	for (i = 0; i < 8; i = i+1) begin
		if (bcd[3:0] > 4)
			bcd[3:0] = bcd[3:0] + 3;

		if (bcd[7:4] > 4)
			bcd[7:4] = bcd[7:4] + 3;

		if (bcd[11:8] > 4)
			bcd[11:8] = bcd[11:8] + 3;

		// Concatenate acts as a shift
		bcd = {bcd[10:0], bin[7-i]};
	end
end

endmodule
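
Since the input is only 8 bits wide, the conversion can be checked exhaustively in simulation. The sketch below is a hypothetical testbench (double_dabble_tb and to_bcd are names I made up) that repeats the same shift-and-add-3 steps inside a function so that it is self-contained, then compares every value from 0 to 255 against the digits computed with division and modulo.

```verilog
// Hypothetical simulation-only check of the Double Dabble steps.
module double_dabble_tb;

// Same algorithm as bin_to_bcd above, wrapped in a function
function [11:0] to_bcd(input [7:0] bin);
	reg[11:0] bcd;
	integer i;
	begin
		bcd = 0;
		for (i = 0; i < 8; i = i+1) begin
			// Add 3 to any digit that is 5 or more before shifting
			if (bcd[3:0] > 4)
				bcd[3:0] = bcd[3:0] + 3;
			if (bcd[7:4] > 4)
				bcd[7:4] = bcd[7:4] + 3;
			if (bcd[11:8] > 4)
				bcd[11:8] = bcd[11:8] + 3;
			// Shift left by one, bringing in the next input bit (MSB first)
			bcd = {bcd[10:0], bin[7-i]};
		end
		to_bcd = bcd;
	end
endfunction

integer n;
integer errors;
reg[11:0] got;

initial begin
	errors = 0;
	for (n = 0; n < 256; n = n+1) begin
		got = to_bcd(n[7:0]);
		// Division is fine here because this code is never synthesized
		if (got[11:8] != n / 100 ||
		    got[7:4]  != (n / 10) % 10 ||
		    got[3:0]  != n % 10) begin
			errors = errors + 1;
			$display("mismatch: %0d -> %h", n, got);
		end
	end
	$display("checked 256 values, %0d mismatches", errors);
end

endmodule
```

If the algorithm is implemented correctly, the run should report zero mismatches. The division and modulo operators are exactly the expensive logic the hardware version avoids, but they cost nothing in a testbench.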

The actual display is driven by just turning on specific segments depending on the value of each BCD digit.

module seven_seg(
	input wire[3:0] bcd,
	output wire[6:0] segments
);

assign segments =
	//		        ABCDEFG
	(bcd == 0) ? 7'b1111110 :
	(bcd == 1) ? 7'b0110000 :
	(bcd == 2) ? 7'b1101101 :
	(bcd == 3) ? 7'b1111001 :
	(bcd == 4) ? 7'b0110011 :
	(bcd == 5) ? 7'b1011011 :
	(bcd == 6) ? 7'b1011111 :
	(bcd == 7) ? 7'b1110000 :
	(bcd == 8) ? 7'b1111111 :
	(bcd == 9) ? 7'b1110011 :
	7'b0000000;

endmodule

We have an 8-bit output register, though, so we need three digits to display 0-255. Rather than giving each digit its own set of output pins, we can multiplex them (like on the breadboard version) by displaying each digit in turn fast enough that the human eye doesn’t notice they’re flickering.

We could also go one step further and use a shift register so that we wouldn’t need seven pins to drive seven LEDs, but I didn’t bother because I had plenty of pins to spare.

reg[3:0] cathode = 4'b1110;
// Driven by the seven_seg instances below, so these must be wires
wire[6:0] seg_ones;
wire[6:0] seg_tens;
wire[6:0] seg_hundreds;
wire[11:0] bcd;

bin_to_bcd bin_to_bcd0(out, bcd);

seven_seg seven_seg_ones(
	.bcd(bcd[3:0]),
	.segments(seg_ones));

seven_seg seven_seg_tens(
	.bcd(bcd[7:4]),
	.segments(seg_tens));

seven_seg seven_seg_hundreds(
	.bcd(bcd[11:8]),
	.segments(seg_hundreds));

// Step to the next digit on each tick of the divided clock.
// The segment pins are assigned inside an always block, so they would
// need to be declared as "output reg" in the port list of top.
always @(posedge clk[10])
	case (cathode)
		4'b1110: begin
			cathode <= 4'b1011;
			{PIN_11, PIN_9, PIN_15, PIN_18, PIN_19, PIN_10, PIN_14} <= seg_hundreds;
		end
		4'b1011: begin
			cathode <= 4'b1101;
			{PIN_11, PIN_9, PIN_15, PIN_18, PIN_19, PIN_10, PIN_14} <= seg_tens;
		end
		4'b1101: begin
			cathode <= 4'b1110;
			{PIN_11, PIN_9, PIN_15, PIN_18, PIN_19, PIN_10, PIN_14} <= seg_ones;
		end
		default: begin
			// All digits off
			cathode <= 4'b1111;
		end
	endcase

assign {PIN_20, PIN_17, PIN_16, PIN_12} = cathode;

Result


Finally we have something comparable to the breadboard (but much less exciting because we can’t see a bunch of toggling LEDs).

Source Code


The source code is here.