AUSTIN MORLAN

Jan 03, 2023

Building an FPGA Computer: SAP-1



I’ve been getting into FPGAs lately. Last year I built an FPGA version of Ben Eater’s breadboard computer, but I’ve been wanting to do a more advanced project to help me gain experience with Verilog and FPGAs in general.

For his breadboard computer, Ben Eater followed the design laid out in a book called Digital Computer Electronics by Malvino and Brown. The book builds what it calls the Simple-as-Possible (SAP) Computer. It starts with the incredibly simple SAP-1, adds some features to get the SAP-2, and then adds a few more to reach the final version called SAP-3.

Ben Eater’s version in the videos was mostly a SAP-1 with a few added instructions. I recreated his version essentially bit-for-bit in my FPGA implementation so it was also basically the SAP-1, but that was my first exposure to Verilog and FPGAs so my implementation was not the best: I had everything in one file instead of discrete modules, and I didn’t simulate anything to verify correctness.

I thought it might be a good experience to rebuild the SAP-1 using the Verilog skills I’ve learned since last year, then progress to the SAP-2 and finish with the SAP-3. I also thought it might be interesting to drive a display.

I made some minor changes to the version in the book where I thought it added clarity (e.g., making logic levels always active-high, giving signals more descriptive names, and removing the entire input/output system), but for the most part everything is the same as the book.

Overview


Modules


Bus

The bus is how all of the modules send data between themselves. When one module needs to send data to another, it puts it on the bus. When one module needs to receive data from another, it reads it from the bus. All of this is coordinated by control signals asserted at specific times: a load signal makes a module read from the bus and an enable signal makes it output onto the bus.

The bus is nothing more than wires that go between every component in the computer. For this reason I defined it as a simple 8-bit wire. It’s eight bits wide because it’s an 8-bit computer. All data operations occur in units of eight bits.

The bus is used in modules in three different ways. If the module only reads from the bus, it’s marked as input. If the module only writes to the bus, it’s marked as output. If the module sometimes reads from the bus (load) and sometimes writes to the bus (enable), it’s marked as inout.

wire[7:0] bus;

Clock

A computer can’t do anything without a clock. It’s the maestro that orchestrates all of the distinct components so that they can talk together at a fixed interval in lock-step with each other. A clock oscillates between HIGH and LOW repeatedly, until the end of time (or until power is removed). Without a clock there would be chaos. The clock is the beating heart of the computer.

I could have used the pre-defined CLK pin that is connected to the FPGA’s internal 16MHz clock, but instead I created a discrete clock module in its own Verilog file so I could control it with signals.

It has an input called clk_in and an output called clk_out. The output is always a copy of the input unless hlt is asserted, in which case the output is held at zero. That’s used later as part of the HLT instruction to stop computer execution.

If a program doesn’t need to be executed indefinitely then the final instruction can be HLT to stop all further execution. The easiest way to stop a computer from doing something is to stop its clock.

module clock(
	input hlt,
	input clk_in,
	output clk_out);

assign clk_out = (hlt) ? 1'b0 : clk_in;

endmodule
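
To see the gating behave as described, here is a small testbench sketch. The names gated_clock and gated_clock_tb are my own for illustration; the gate is a copy of the clock module above, renamed so the example compiles on its own.

```verilog
// Copy of the gate above, renamed so this example stands alone.
module gated_clock(
	input hlt,
	input clk_in,
	output clk_out);

assign clk_out = (hlt) ? 1'b0 : clk_in;

endmodule

// Hypothetical testbench: toggles the input clock and asserts hlt partway through.
module gated_clock_tb();

reg clk_in = 0;
reg hlt = 0;
wire clk_out;

gated_clock dut(
	.hlt(hlt),
	.clk_in(clk_in),
	.clk_out(clk_out)
);

// Free-running input clock.
always #1 clk_in = ~clk_in;

initial begin
	$monitor("t=%0t clk_in=%b hlt=%b clk_out=%b", $time, clk_in, hlt, clk_out);
	#6 hlt = 1;   // from here on clk_out stays low while clk_in keeps toggling
	#6 $finish;
end

endmodule
```

In the printed trace, clk_out follows clk_in until hlt goes high and then stays at zero. Here hlt is asserted while the clock is low, so no pulse gets cut short.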

Program Counter

If you think of the clock as the maestro keeping the orchestra in time, and the orchestra as all of the individual components of the computer, then the Program Counter (PC) is the page of the music that everyone is playing. It always stores the address of the next instruction to be executed.

For the SAP-1, a program is just a series of bytes in memory where one byte makes up one instruction to be executed. The instructions are laid out serially and counted through starting from address 0. The SAP-1’s memory is only 16 bytes so the program counter should count from 0x0 (0) to 0xF (15).

The PC is defined as a four-bit reg to enable counting from address 0 to address 15. A reg differs from a wire in that it can store a value between clock ticks. The value in the PC needs to persist between clock pulses so it has to be defined as a reg.

On a rising clock edge, if inc is asserted, the value in the PC is incremented by one; otherwise it stays the same. If clr is asserted, the PC resets to zero immediately, without waiting for a clock edge.

Regardless of the clock, if en is asserted then the PC is placed onto the bus. Otherwise it puts a high-impedance (Z) value onto the bus, which is neither high nor low, allowing other modules to drive the bus without conflict.

It’s assumed that only one module will ever be driving the bus at any one time. To drive it from multiple modules would cause undefined behavior. The control signals shown later ensure that never happens.

The bus is 8 bits wide and the PC is 4 bits wide, so the PC goes into the lower four bits of the bus.

module pc(
	input clk,
	input clr,
	input inc,
	input en,
	output[7:0] bus
);

reg[3:0] pc = 0;

always @(posedge clk or posedge clr) begin
	if (clr) begin
		pc <= 4'b0;
	end else if (inc) begin
		pc <= pc + 1;
	end
end

// The 4-bit PC occupies the lower four bits of the bus; the upper four are zero.
assign bus = (en) ? {4'b0000, pc} : 8'bz;

endmodule
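
The one-driver-at-a-time rule is easy to try in simulation. This is a minimal sketch with two made-up drivers (the values 0A and 05 and the signal names en_a and en_b are just for illustration) sharing a bus the same way the modules in this post do.

```verilog
// Two hypothetical tri-state drivers sharing one 8-bit bus.
module bus_demo_tb();

reg en_a = 0;
reg en_b = 0;
wire[7:0] bus;

// Each driver outputs its value when enabled, otherwise high-Z.
assign bus = (en_a) ? 8'h0A : 8'bz;
assign bus = (en_b) ? 8'h05 : 8'bz;

initial begin
	#1 $display("no driver:  bus=%b", bus);  // zzzzzzzz
	en_a = 1;
	#1 $display("A drives:   bus=%h", bus);  // 0a
	en_a = 0; en_b = 1;
	#1 $display("B drives:   bus=%h", bus);  // 05
	en_a = 1;
	#1 $display("both drive: bus=%b", bus);  // 0000xxxx
end

endmodule
```

When both drivers are enabled, the simulator marks every bit where they disagree as x (unknown). On real hardware the result depends on the circuit, which is why the controller never enables two outputs at once.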

Register A

What good is a computer if it can’t store anything?

Registers are a computer’s way of storing data. Their size is normally dependent on the architecture of the computer: an 8-bit computer will have 8-bit registers, a 32-bit computer will have 32-bit registers, a 64-bit computer will have 64-bit registers, and so on. The computer is 8-bit so its registers are 8-bit as well.

Register A is the main register of the computer and many of the instructions depend upon it. Its internals look similar to some of the things seen previously: a clk, a load, an en, a bus. As expected, load sets the register’s value based on the bus and en places the register’s value onto the bus.

There is also the output called val which is set to whatever is in reg_a. This is so its value can be easily fed into the adder module for arithmetic.

module reg_a(
	input clk,
	input load,
	input en,
	inout[7:0] bus,
	output[7:0] val);

reg[7:0] reg_a = 0;

always @(posedge clk) begin
	if (load) begin
		reg_a <= bus;
	end
end

assign bus = (en) ? reg_a : 8'bz;
assign val = reg_a;

endmodule

Register B

Register B is very similar to Register A except that its value cannot be output to the bus; i.e., it has no en signal. The SAP-1 is designed so that Register A is where the main action occurs and Register B supports it.

module reg_b(
	input clk,
	input load,
	input[7:0] bus,
	output[7:0] val);

reg[7:0] reg_b = 0;

always @(posedge clk) begin
	if (load) begin
		reg_b <= bus;
	end
end

assign val = reg_b;

endmodule

Adder

Computers were originally designed to help us do a lot of math very quickly so it makes sense that this humble computer should be able to do math also. The SAP-1 can only do addition and subtraction.

The two registers (A and B) are where all of the math operations occur: A + B or A - B. The arithmetic module is called the Adder even though it also does subtraction (subtraction is just addition of a negative number after all).

If en is asserted then either addition or subtraction occurs, depending on whether sub is asserted. Otherwise the bus gets a high-Z value.

Notice the lack of a clock signal. The adder is constantly calculating either addition or subtraction based on the values in a and b. It’s only when en is asserted that the value is actually placed onto the bus, but it’s always being calculated.

module adder(
	input[7:0] a,
	input[7:0] b,
	input sub,
	input en,
	output[7:0] bus);

assign bus = 
	(en) ? 
		((sub) ? a-b : a+b) :
		8'bz;

endmodule
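
The remark that subtraction is just addition of a negative number can be checked directly. In 8-bit two’s complement, the negative of b is ~b + 1. The sketch below (the module name and test values are my own) computes the difference both ways.

```verilog
module twos_complement_tb();

reg[7:0] a = 8'd7;
reg[7:0] b = 8'd2;

wire[7:0] diff_direct = a - b;
wire[7:0] diff_added  = a + (~b + 8'd1);  // add the two's complement of b

initial begin
	#1 $display("a-b=%0d  a+(~b+1)=%0d", diff_direct, diff_added);
	// prints: a-b=5  a+(~b+1)=5
	b = 8'd9;  // the true result is -2, which wraps modulo 256
	#1 $display("a-b=%0d  a+(~b+1)=%0d", diff_direct, diff_added);
	// prints: a-b=254  a+(~b+1)=254
end

endmodule
```

Both expressions always agree because the arithmetic wraps at eight bits. The value 254 is how -2 looks when the same eight bits are read as an unsigned number.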

Memory

The SAP-1 has 16 bytes of memory which is small enough that it can be defined directly inside of the FPGA. With larger amounts of memory (possibly kilobytes or megabytes), it would be better to use an actual RAM chip as an external memory module.

There is a 4-bit register called the Memory Address Register (MAR) which is used to store a memory address. The SAP-1 takes two clock cycles to read from memory: one cycle loads an address from the bus into the MAR (using the load signal) and the second cycle uses the value in the MAR to address into ram and output that value onto the bus, if en is asserted. If en isn’t asserted then the bus is again set to high-Z.

The initial block is used to initialize the memory by loading its contents from a file which is an easy way to set the memory. The file has sixteen lines where each line represents a byte of memory.

module memory(
	input clk,
	input load,
	input en,
	inout[7:0] bus
);

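// Declared before the initial block so ram exists when it is referenced.
reg[3:0] mar = 0;
reg[7:0] ram[0:15];

// Load the sixteen bytes of memory from a file of hex values.
initial begin
	$readmemh("program.bin", ram);
end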

always @(posedge clk) begin
	if (load) begin
		mar <= bus[3:0];
	end
end

assign bus = (en) ? ram[mar] : 8'bz;

endmodule
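
Here is a sketch of the two-cycle read in isolation. To keep it runnable without a program file, the memory contents are set inline, which is an assumption for this example and not how the module above is initialized. The names memory_inline, memory_inline_tb and drive are hypothetical.

```verilog
// Same structure as the memory module above, with inline contents.
module memory_inline(
	input clk,
	input load,
	input en,
	inout[7:0] bus
);

reg[3:0] mar = 0;
reg[7:0] ram[0:15];

integer i;
initial begin
	for (i = 0; i < 16; i = i + 1) begin
		ram[i] = 8'h00;
	end
	ram[4'hD] = 8'h03;  // arbitrary test data at address $D
end

always @(posedge clk) begin
	if (load) begin
		mar <= bus[3:0];
	end
end

assign bus = (en) ? ram[mar] : 8'bz;

endmodule

module memory_inline_tb();

reg clk = 0;
reg load = 0;
reg en = 0;
reg drive = 0;  // when set, the testbench plays the role of the address source
wire[7:0] bus;

assign bus = (drive) ? 8'h0D : 8'bz;

memory_inline dut(.clk(clk), .load(load), .en(en), .bus(bus));

always #1 clk = ~clk;

initial begin
	// Cycle 1: put address $D on the bus and load it into the MAR.
	drive = 1; load = 1;
	#2;
	// Cycle 2: release the bus and let memory output ram[mar].
	drive = 0; load = 0; en = 1;
	#2 $display("bus=%h", bus);  // prints: bus=03
	$finish;
end

endmodule
```

The testbench changes its signals while the clock is low, so they are stable by the next rising edge.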

Instruction Register

The program counter contains the memory address to read the next instruction from, and the memory itself contains the instructions at that address, but there also needs to be a way to actually get the instruction from memory.

Before an instruction can be interpreted and acted upon, it needs to be loaded from memory into a module that can separate the opcode from the data. That’s the job of the Instruction Register (IR).

An instruction has two components: the upper four bits are the opcode and the lower four bits are the operand. Some instructions use an operand and some don’t, in which case it is ignored.

The clr and load signals do what they’ve done in other modules. If en is asserted then the operand (lower four bits) goes onto the bus. The opcode (upper four bits) is always assigned to instr, which is used later to interpret what operation is being performed.

module ir(
	input clk,
	input clr,
	input load,
	input en,
	inout[7:0] bus,
	output[3:0] instr);

reg[7:0] ir = 0;

always @(posedge clk or posedge clr) begin
	if (clr) begin
		ir <= 8'b0;
	end else if (load) begin
		ir <= bus;
	end
end

assign instr = ir[7:4];
// Only the operand goes onto the bus, in the lower four bits; the upper four are zero.
assign bus = (en) ? {4'b0000, ir[3:0]} : 8'bz;

endmodule
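
As a quick worked example of the split, take the byte 1E, which in this post’s encoding is ADD with operand $E. The module and wire names below are my own.

```verilog
module ir_split_tb();

reg[7:0] instruction = 8'h1E;  // ADD [$E] in the encoding used in this post

wire[3:0] opcode  = instruction[7:4];
wire[3:0] operand = instruction[3:0];
wire[7:0] on_bus  = {4'b0000, operand};  // operand zero-extended to bus width

initial begin
	#1 $display("opcode=%h operand=%h on_bus=%h", opcode, operand, on_bus);
	// prints: opcode=1 operand=e on_bus=0e
end

endmodule
```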

Controller

The controller is the most complicated part of the computer and is where all of the interesting stuff happens. It decides what the computer will do next by asserting the different control signals that have gone into each of the modules.

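Those control signals are:

hlt: stop the clock (halt execution)
pc_inc: increment the program counter
pc_en: put the program counter onto the bus
mar_load: load the MAR from the bus
mem_en: put the addressed memory byte onto the bus
ir_load: load the IR from the bus
ir_en: put the IR’s operand onto the bus
a_load: load Register A from the bus
a_en: put Register A onto the bus
b_load: load Register B from the bus
adder_sub: subtract instead of add
adder_en: put the adder’s result onto the bus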

The controller module controls the behavior of the computer by asserting those signals at different times according to different stimuli.

Instruction execution occurs in a series of stages where each stage takes one clock cycle. The SAP-1 has six stages, starting at Stage 1 and counting to Stage 6, at which point it returns back to Stage 1 again. It continues on like that forever with every tick of the clock: 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, etc.

There is a 3-bit stage register (allowing values from 0 to 7) and with each tick of the clock the stage increases by one. Once it hits 6, it goes back to 1. It changes stage on the negative clock edge so that the signals will be set up properly before the modules need them on the next positive clock edge.

instr is passed from the IR into the controller module to do different things based on what instruction is currently executing. What it does depends on the instruction and the stage of execution.

Finally the output of the controller is the twelve control signals listed above. Different stages of different instructions will assert different signals to accomplish different things.

Rather than pass the signals in individually, I pass them all in a single 12-bit value where each bit represents one of the signals. That keeps the code cleaner and makes it easier to set all the bits to zero before setting the ones that need to be asserted at that time.

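The SAP-1 has four operations that it can perform:

LDA [0000]: load A with the value at the operand address
ADD [0001]: add the value at the operand address to A
SUB [0010]: subtract the value at the operand address from A
HLT [1111]: stop execution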

The values in the brackets represent the opcode and all but HLT have an operand. LDA, for example, has the opcode 0000 and its operand is the address of the value to be loaded into A.

Every instruction has the same first three stages which fetch the next instruction from memory based on the current value in the PC.

After the first three stages, the actions performed during the next three differ depending on the instruction, and some of the instructions do nothing at all.

module controller(
	input clk,
	input[3:0] instr,
	output reg[11:0] ctrl_word);

localparam SIG_HLT       = 11;
localparam SIG_PC_INC    = 10;
localparam SIG_PC_EN     = 9;
localparam SIG_MEM_LOAD  = 8;
localparam SIG_MEM_EN    = 7;
localparam SIG_IR_LOAD   = 6;
localparam SIG_IR_EN     = 5;
localparam SIG_A_LOAD    = 4;
localparam SIG_A_EN      = 3;
localparam SIG_B_LOAD    = 2;
localparam SIG_ADDER_SUB = 1;
localparam SIG_ADDER_EN  = 0;

localparam OP_LDA = 4'b0000;
localparam OP_ADD = 4'b0001;
localparam OP_SUB = 4'b0010;
localparam OP_HLT = 4'b1111;

reg[2:0] stage = 0;

always @(negedge clk) begin
	if (stage == 6) begin
		stage <= 1;
	end else begin
		stage <= stage + 1;
	end
end

always @(*) begin
	case (stage)
		1: begin
			ctrl_word = 12'b0;
			ctrl_word[SIG_PC_EN] = 1;
			ctrl_word[SIG_MEM_LOAD] = 1;
		end
		2: begin
			ctrl_word = 12'b0;
			ctrl_word[SIG_PC_INC] = 1;
		end
		3: begin
			ctrl_word = 12'b0;
			ctrl_word[SIG_MEM_EN] = 1;
			ctrl_word[SIG_IR_LOAD] = 1;
		end
		4: begin
			case (instr)
				OP_LDA: begin
					ctrl_word = 12'b0;
					ctrl_word[SIG_IR_EN] = 1;
					ctrl_word[SIG_MEM_LOAD] = 1;
				end
				OP_ADD: begin
					ctrl_word = 12'b0;
					ctrl_word[SIG_IR_EN] = 1;
					ctrl_word[SIG_MEM_LOAD] = 1;
				end
				OP_SUB: begin
					ctrl_word = 12'b0;
					ctrl_word[SIG_IR_EN] = 1;
					ctrl_word[SIG_MEM_LOAD] = 1;
				end
				OP_HLT: begin
					ctrl_word = 12'b0;
					ctrl_word[SIG_HLT] = 1;
				end
				default: begin
					ctrl_word = 12'b0;
				end
			endcase
		end
		5: begin
			case (instr)
				OP_LDA: begin
					ctrl_word = 12'b0;
					ctrl_word[SIG_MEM_EN] = 1;
					ctrl_word[SIG_A_LOAD] = 1;
				end
				OP_ADD: begin
					ctrl_word = 12'b0;
					ctrl_word[SIG_MEM_EN] = 1;
					ctrl_word[SIG_B_LOAD] = 1;
				end
				OP_SUB: begin
					ctrl_word = 12'b0;
					ctrl_word[SIG_MEM_EN] = 1;
					ctrl_word[SIG_B_LOAD] = 1;
				end
				default: begin
					ctrl_word = 12'b0;
				end
			endcase
		end
		6: begin
			case (instr)
				OP_ADD: begin
					ctrl_word = 12'b0;
					ctrl_word[SIG_ADDER_EN] = 1;
					ctrl_word[SIG_A_LOAD] = 1;
				end
				OP_SUB: begin
					ctrl_word = 12'b0;
					ctrl_word[SIG_ADDER_SUB] = 1;
					ctrl_word[SIG_ADDER_EN] = 1;
					ctrl_word[SIG_A_LOAD] = 1;
				end
				default: begin
					ctrl_word = 12'b0;
				end
			endcase
		end
		default: begin
			ctrl_word = 12'b0;
		end
	endcase
end

endmodule
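
To make the packing concrete, this sketch builds the Stage 1 control word the same way the controller does and then unpacks it into individual wires with a concatenation. The bit positions are copied from the controller; the module name ctrl_word_tb is hypothetical.

```verilog
module ctrl_word_tb();

// Bit positions copied from the controller above.
localparam SIG_PC_EN    = 9;
localparam SIG_MEM_LOAD = 8;

reg[11:0] ctrl_word;

// Unpack in the same order as the bit positions: bit 11 first, bit 0 last.
wire hlt, pc_inc, pc_en, mar_load, mem_en, ir_load,
     ir_en, a_load, a_en, b_load, adder_sub, adder_en;
assign {hlt, pc_inc, pc_en, mar_load, mem_en, ir_load,
        ir_en, a_load, a_en, b_load, adder_sub, adder_en} = ctrl_word;

initial begin
	// Stage 1 of every instruction: PC onto the bus, MAR loads from the bus.
	ctrl_word = 12'b0;
	ctrl_word[SIG_PC_EN] = 1;
	ctrl_word[SIG_MEM_LOAD] = 1;
	#1 $display("ctrl_word=%b pc_en=%b mar_load=%b pc_inc=%b",
		ctrl_word, pc_en, mar_load, pc_inc);
	// prints: ctrl_word=001100000000 pc_en=1 mar_load=1 pc_inc=0
end

endmodule
```

The order of names in the concatenation has to match the localparam numbering exactly, since nothing in the language ties the two together.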

Simulation


It’s always good to test a design in a testbench, which allows for verifying correctness before uploading it to the FPGA. It’s much easier to catch and debug issues in simulation than it is once it’s on the actual FPGA.

module top_tb();

initial begin
	$dumpfile("top_tb.vcd");
	$dumpvars(0, top_tb);
end

reg clk_in = 0;
integer i;
initial begin
	for (i = 0; i < 128; i = i + 1) begin  // i++ is SystemVerilog; this form also works in plain Verilog
		#1 clk_in = ~clk_in;
	end
end

wire clk;
wire hlt;
reg clr = 0;  // never asserted in this test; held low so it isn't left floating
wire[7:0] bus;

clock clock(
	.hlt(hlt),
	.clk_in(clk_in),
	.clk_out(clk)
);

wire pc_en;
wire pc_inc;
pc pc(
	.clk(clk),
	.clr(clr),
	.inc(pc_inc),
	.en(pc_en),
	.bus(bus)
);

wire a_load;
wire a_en;
wire[7:0] a_val;
reg_a reg_a(
	.clk(clk),
	.load(a_load),
	.en(a_en),
	.bus(bus),
	.val(a_val)
);

wire b_load;
wire[7:0] b_val;
reg_b reg_b(
	.clk(clk),
	.load(b_load),
	.bus(bus),
	.val(b_val)
);

wire adder_sub;
wire adder_en;
adder adder(
	.a(a_val),
	.b(b_val),
	.sub(adder_sub),
	.en(adder_en),
	.bus(bus)
);

wire mar_load;
wire mem_en;
memory mem(
	.clk(clk),
	.load(mar_load),
	.en(mem_en),
	.bus(bus)
);

wire ir_load;
wire ir_en;
wire[3:0] ir_instr;
ir ir(
	.clk(clk),
	.clr(clr),
	.load(ir_load),
	.en(ir_en),
	.bus(bus),
	.instr(ir_instr)
);

controller controller(
	.clk(clk),
	.instr(ir_instr),
	.ctrl_word(
	{
		hlt,
		pc_inc,
		pc_en,
		mar_load,
		mem_en,
		ir_load,
		ir_en,
		a_load,
		a_en,
		b_load,
		adder_sub,
		adder_en
	})
);

endmodule

It instantiates all of the modules in the computer and connects them to each other. The initial block at the beginning runs once at the start of simulation to create a file called top_tb.vcd which contains all of the simulation data.

Initial blocks like this one aren’t synthesizable; they’re used purely for testing. (FPGA tools do generally accept simple initial blocks that only set starting values, which is what the memory module relies on.) In this case, one is used to simulate a clock by toggling clk_in 128 times. With each iteration of the loop, clk_in is set to its inverse, ~clk_in, so its state goes 0 -> 1 -> 0 -> 1 -> 0 -> 1, etc.

The #1 is a time delay, which again is non-synthesizable and used only for testing. The test waits one time unit, toggles the clock, loops, waits one time unit, toggles the clock, loops, 128 times.
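
For comparison, a common alternative to the counted loop is a free-running always block plus a separate initial block that ends the simulation. This is not what the testbench above does, just a sketch of the other idiom, with a hypothetical module name.

```verilog
module clk_gen_tb();

reg clk_in = 0;

// Toggle forever; the initial block below decides when simulation stops.
always #1 clk_in = ~clk_in;

initial begin
	#128 $finish;  // same duration as 128 one-unit toggles
end

endmodule
```

The loop version stops on its own once the count runs out, while this version needs the explicit $finish or it would run forever.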

So what’s the point? You can generate a simulation file and load it into a tool called gtkwave which will let you view the signals at different times.

After configuring it a bit to nicely show the signals that are important, it displays this:

Here is the (annotated) test program:

$0 |   0D  // LDA [$D]   Load A with the value at address $D
$1 |   1E  // ADD [$E]   Add the value at address $E to A
$2 |   2F  // SUB [$F]   Subtract the value at address $F from A
$3 |   F0  // HLT        Stop execution
$4 |   00
$5 |   00
$6 |   00
$7 |   00
$8 |   00 
$9 |   00 
$A |   00 
$B |   00
$C |   00
$D |   03  // Data: 3
$E |   04  // Data: 4
$F |   02  // Data: 2

Load the value at $D (3) into A, add to it the value in $E (4), subtract from it the value in $F (2), and then halt. At the end of execution the value in Register A should be 3+4-2=5. If you look at the location of the red marker, the clock is no longer ticking and the value in A is indeed 5.
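
The expected result can also be cross-checked without any of the hardware modules. The sketch below is a purely behavioral model I wrote for illustration: it ignores stages, the bus, and control signals, and simply interprets the same sixteen bytes one instruction at a time.

```verilog
module sap1_model_tb();

reg[7:0] ram[0:15];
reg[7:0] a = 0;
reg[3:0] pc = 0;
reg[7:0] instr;
reg halted = 0;
integer i;

initial begin
	// Same contents as the test program above.
	for (i = 0; i < 16; i = i + 1) ram[i] = 8'h00;
	ram[0]  = 8'h0D;  // LDA [$D]
	ram[1]  = 8'h1E;  // ADD [$E]
	ram[2]  = 8'h2F;  // SUB [$F]
	ram[3]  = 8'hF0;  // HLT
	ram[13] = 8'h03;
	ram[14] = 8'h04;
	ram[15] = 8'h02;

	// Fetch, advance the PC, then act on the opcode.
	while (!halted) begin
		instr = ram[pc];
		pc = pc + 1;
		case (instr[7:4])
			4'b0000: a = ram[instr[3:0]];
			4'b0001: a = a + ram[instr[3:0]];
			4'b0010: a = a - ram[instr[3:0]];
			4'b1111: halted = 1;
			default: ;  // unknown opcodes do nothing, as in the controller
		endcase
	end
	$display("A=%0d", a);  // prints: A=5
end

endmodule
```

A model like this is handy as a reference: if the waveform and the model disagree about A, one of them has a bug.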

It’s also useful (and neat) to trace the signals throughout time and see as each individual instruction executes: clock toggling, values going on and off the bus, register values changing, stages incrementing, control signals being asserted, etc.

Source Code


You can find all of the code here.



Last Edited: Jan 15, 2023