v8cpu
v8cpu is a simple multi-cycle von Neumann architecture 8-bit CPU written in ~440 lines of Verilog. I developed it as a learning project using Icarus Verilog and Xilinx ISE WebPACK.
GitHub: https://github.com/vsergeev/v8cpu
v8cpu characteristics:
- 16 8-bit registers (R0-R15)
- 8-bit flags register (currently just two bits used: bit 0 – equals, bit 1 – greater)
- 16-bit instructions
- Arithmetic, between any two registers: add, subtract, and, or, xor, not, compare
- Branching: relative jump, jump on equal/not-equal, jump on greater/not-greater (the jl mnemonic only tests that the greater flag is clear, so it is also taken on equal)
- Move: register->register move, immediate->register move
- Indirect Move: memory->register and register->memory indirect moves with 16-bit address stored in R14:R15
- Indirect Jump: jump to the 16-bit instruction address stored in R14:R15 (the value is shifted left by one to form the byte address)
- 6 CPI for all instructions except for memory read/write instructions, which require 7 CPI
- Memory-mapped peripherals
- Two memory-mapped 8-bit output ports (0x800, 0x801)
- Two memory-mapped 8-bit input ports (0x802, 0x803)
- Easy to add other peripherals on 16-bit address / 8-bit data memory bus
It should be possible to squeeze the CPI down to 5/6 by restructuring part of the control unit state machine. In addition, a 16-bit data bus instead of the 8-bit one could cut the CPI by 2 clocks, since each 16-bit instruction could then be fetched in one memory access rather than two. The synthesized logic targeting a Xilinx Spartan 3E XC3S500E FPGA comfortably meets timing constraints with an 80 MHz clock (roughly 11.4 MIPS at the worst-case 7 CPI). The fully synthesized, placed, and routed logic uses 215 flip-flops and 470 4-input LUTs, occupying a total of 326 slices (7% of the XC3S500E) and 1 RAMB16 (Embedded Block RAM for program/memory storage).
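The throughput figure is just the clock frequency divided by the CPI. As a quick worked check in Python (the helper name `mips` is made up for this illustration), here is the number for the current 6 and 7 CPI and for a hypothetical 5 CPI design:

```python
def mips(clock_hz, cpi):
    """Millions of instructions per second for a given clock and CPI."""
    return clock_hz / cpi / 1e6

CLOCK_HZ = 80e6  # the 80 MHz clock quoted in the text

for cpi in (5, 6, 7):
    print("CPI %d -> %.1f MIPS" % (cpi, mips(CLOCK_HZ, cpi)))
```

The 7 CPI case is the conservative bound, since only the memory read/write moves need the extra cycle.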
The Verilog can target either simulation or an FPGA. For simulation, the memory is a register array that is initialized with the program via $readmemh(). For an FPGA target, the memory is a Xilinx Embedded Block RAM instantiation (but can be any similar clocked memory) whose initial values are set to the program with a .coe file. I’ve been using an XC3S500E FPGA on a Digilent Nexys2. For other targets (e.g. ASIC), some additional logic may be required to load the program into memory.
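Both memory images hold the same bytes, so one can be derived from the other. The sketch below is a hypothetical helper (not part of the v8cpu code base) that turns the whitespace-separated hex bytes of a $readmemh() file into the text of a .coe file, assuming the usual `memory_initialization_radix` / `memory_initialization_vector` layout and zero-filling the unused part of the memory:

```python
def dat_to_coe(dat_text, depth=1024):
    """Convert $readmemh-style hex bytes into .coe file text (assumed layout)."""
    data = [int(token, 16) for token in dat_text.split()]
    if len(data) > depth:
        raise ValueError("program does not fit in %d bytes" % depth)
    # Pad the rest of the memory with zeros (an assumption of this sketch)
    data += [0] * (depth - len(data))
    header = "memory_initialization_radix=16;\nmemory_initialization_vector=\n"
    body = ",\n".join("%02X" % byte for byte in data)
    return header + body + ";\n"

# Tiny example with a 6-byte memory so the output stays readable
print(dat_to_coe("01 24\n00 40\n", depth=6))
```

The sketch does not handle the comments or @address markers that $readmemh() also accepts; it only covers the plain byte lists the assembler below emits.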
The v8cpu and assembler code base can be found in the GitHub repository linked above, and is reproduced below for online viewing. The code base includes a description of the instruction set, the v8cpu logic in Verilog, a test bench for simulation, a two-pass assembler written in Python, and a sample test program that makes use of arithmetic/input/output to compute Fibonacci numbers and display them.
Video of Fibonacci Number Test Program
Video of the button-triggered Fibonacci number test program in action:
Instruction Set
/* v8cpu by Vanya A. Sergeev - vsergeev@gmail.com
* Simple multi-cycle von Neumann architecture 8-bit CPU */
v8cpu Instruction Set
Category
    Instruction Encoding    Mnemonic      | Operation
MOVE
    0001 0000 aaaa bbbb     mov Ra, Rb    | Ra <= Rb
    0001 0001 aaaa xxxx     mov Ra, MEM   | Ra <= Memory[R14:R15]
    0001 0010 aaaa xxxx     mov MEM, Ra   | Memory[R14:R15] <= Ra
MOVE IMMEDIATE
    0010 aaaa dddd dddd     mov Ra, d     | Ra <= d
BRANCH (k is two's complement encoded and counted in 16-bit instructions)
    0011 0000 kkkk kkkk     jmp k         | IP <= IP + (k << 1)
    0011 0001 kkkk kkkk     je k          | if (eq) IP <= IP + (k << 1)
    0011 0010 kkkk kkkk     jne k         | if (!eq) IP <= IP + (k << 1)
    0011 0011 kkkk kkkk     jg k          | if (greater) IP <= IP + (k << 1)
    0011 0100 kkkk kkkk     jl k          | if (!greater) IP <= IP + (k << 1)
JUMP
    0100 xxxx xxxx xxxx     ljmp          | IP <= R14:R15 << 1
MATH
    0101 0000 aaaa bbbb     add Ra, Rb    | Ra <= Ra + Rb
    0101 0001 aaaa bbbb     sub Ra, Rb    | Ra <= Ra - Rb
    0101 0010 aaaa bbbb     and Ra, Rb    | Ra <= Ra & Rb
    0101 0011 aaaa bbbb     or Ra, Rb     | Ra <= Ra | Rb
    0101 0100 aaaa bbbb     xor Ra, Rb    | Ra <= Ra ^ Rb
    0101 0101 aaaa xxxx     not Ra        | Ra <= ~Ra
    0101 0110 aaaa bbbb     cmp Ra, Rb    | eq flag <= (Ra == Rb)
                                          | greater flag <= (Ra > Rb)
    All other opcodes       nop           | Do nothing
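To make the bit fields in the table above concrete, here is a minimal decoding sketch in Python. It is a hypothetical helper, not part of the v8cpu code base; the names `decode`, `BRANCHES`, and `MATH` are made up for the illustration, and branch offsets are printed as signed instruction counts.

```python
BRANCHES = {0x0: "jmp", 0x1: "je", 0x2: "jne", 0x3: "jg", 0x4: "jl"}
MATH = {0x0: "add", 0x1: "sub", 0x2: "and", 0x3: "or", 0x4: "xor", 0x6: "cmp"}

def decode(word):
    """Decode one 16-bit v8cpu instruction word into assembly text."""
    cls = (word >> 12) & 0xF   # instruction class, bits 15..12
    sub = (word >> 8) & 0xF    # sub-opcode (or Ra for move immediate), bits 11..8
    a = (word >> 4) & 0xF      # Ra, bits 7..4
    b = word & 0xF             # Rb, bits 3..0
    low = word & 0xFF          # immediate d or branch offset k

    if cls == 0x1:
        if sub == 0x0:
            return "mov R%d, R%d" % (a, b)
        if sub == 0x1:
            return "mov R%d, MEM" % a
        if sub == 0x2:
            return "mov MEM, R%d" % a
    elif cls == 0x2:
        return "mov R%d, 0x%02X" % (sub, low)
    elif cls == 0x3 and sub in BRANCHES:
        k = low - 256 if low & 0x80 else low   # two's complement offset
        return "%s %+d" % (BRANCHES[sub], k)
    elif cls == 0x4:
        return "ljmp"
    elif cls == 0x5:
        if sub in MATH:
            return "%s R%d, R%d" % (MATH[sub], a, b)
        if sub == 0x5:
            return "not R%d" % a
    # Every other encoding is treated as a nop
    return "nop"

for word in (0x2401, 0x5051, 0x11C0, 0x30FD):
    print("%04X  %s" % (word, decode(word)))
```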
Logic
v8cpu is contained in a single file to emphasize the small size of the CPU. The SIMULATION define at the top can be commented out or left in to switch between the FPGA and simulation targets.
/* v8cpu by Vanya A. Sergeev - vsergeev@gmail.com
* Simple multi-cycle von Neumann architecture 8-bit CPU
*
* 6-7 CPI, 80MHz Maximum Clock --> ~11.4 MIPS */
`define SIMULATION
/* v8cpu ALU for Add, Subtract, AND, OR, XOR, NOT, and Compare. */
module v8cpu_alu (
input [3:0] op,
input [7:0] a,
input [7:0] b,
output reg [7:0] c,
input [7:0] flags,
output reg [7:0] newFlags);
parameter ALU_OP_ADD = 4'b0000,
ALU_OP_SUB = 4'b0001,
ALU_OP_AND = 4'b0010,
ALU_OP_OR = 4'b0011,
ALU_OP_XOR = 4'b0100,
ALU_OP_NOT = 4'b0101,
ALU_OP_CMP = 4'b0110;
parameter FLAG_INDEX_EQ = 'd0,
FLAG_INDEX_GREATER = 'd1;
always @(*) begin
c = a;
newFlags = flags;
case (op)
ALU_OP_ADD: c = a + b;
ALU_OP_SUB: c = a - b;
ALU_OP_AND: c = a & b;
ALU_OP_OR: c = a | b;
ALU_OP_XOR: c = a ^ b;
ALU_OP_NOT: c = ~a;
ALU_OP_CMP: begin
newFlags[FLAG_INDEX_EQ] = (a == b);
newFlags[FLAG_INDEX_GREATER] = (a > b);
end
endcase
end
endmodule
`ifdef SIMULATION
`else
/* v8cpu Memory: 0x000-0x3FF = 1024 bytes; 8-bit data */
module v8cpu_mem (
input clk,
input we,
input [15:0] address,
input [7:0] data,
output reg [7:0] q);
wire [7:0] q_memory;
reg we_validated;
blk_mem_gen memory(.clka(clk), .wea(we_validated), .addra(address[9:0]), .dina(data), .douta(q_memory));
always @(*) begin
if (|address[15:10] == 'd0) begin
q = q_memory;
we_validated = we;
end
else begin
q = 8'bZZZZZZZZ;
we_validated = 0;
end
end
endmodule
`endif
/* v8cpu Memory-Mapped I/O: 0x800 = Port A, 0x801 = Port B, 0x802 = Pin C, 0x803 = Pin D; 8-bit data */
module v8cpu_io (
input clk,
input reset,
input we,
input [15:0] address,
input [7:0] data,
output reg [7:0] q,
output reg [7:0] portA,
output reg [7:0] portB,
input [7:0] pinC,
input [7:0] pinD);
reg [7:0] q_reg;
always @(posedge clk or negedge reset) begin
if (!reset) begin
portA <= 8'd0;
portB <= 8'd0;
end
else if (we) begin
if (address == 'h800) portA <= data;
else if (address == 'h801) portB <= data;
/* Print the current values of PortA:PortB for simulation purposes as PortA is being overwritten */
if (address == 'h800) $display("PortA:PortB = %01d", {portA, portB});
end
else begin
if (address == 'h802) q_reg <= pinC;
else if (address == 'h803) q_reg <= pinD;
end
end
always @(*) begin
if (address == 'h802) q = q_reg;
else if (address == 'h803) q = q_reg;
else q = 8'bZZZZZZZZ;
end
endmodule
/* v8cpu Control Unit: IP, 16 8-bit Registers, 8-bit Flags Register, Fetch/Decode/Execute State Machine */
module v8cpu_cu (
input clk,
input reset,
output reg [3:0] alu_op,
output reg [7:0] alu_a,
output reg [7:0] alu_b,
input [7:0] alu_c,
output [7:0] alu_flags,
input [7:0] alu_newFlags,
output reg memClk,
output reg memWE,
output reg [15:0] memAddress,
output reg [7:0] memData,
input [7:0] memQ);
/* Instruction pointer */
reg [15:0] v8CPU_IP;
/* Register file */
reg [7:0] v8CPU_RegisterFile[0:15];
/* Flags: EQ flag in bit 0, GREATER flag in bit 1 */
reg [7:0] v8CPU_Flags;
/* Indexing into v8CPU_Flags for various flags modified by the compare instruction */
parameter FLAG_INDEX_EQ = 'd0,
FLAG_INDEX_GREATER = 'd1;
/* 16-bit instruction register for decoding/execution */
reg [15:0] Instruction;
/* Major classes of instructions, see v8cpu ISA */
parameter INSTR_CLASS_MOVE = 4'b0001,
INSTR_CLASS_MOVE_IMM = 4'b0010,
INSTR_CLASS_BRANCH = 4'b0011,
INSTR_CLASS_JUMP = 4'b0100,
INSTR_CLASS_MATH = 4'b0101;
/* State machine states */
reg [3:0] state;
reg [3:0] nextState;
parameter STATE_FETCH_INSTR_LO = 'b0000,
STATE_FETCH_INSTR_LO_READ = 'b0001,
STATE_FETCH_INSTR_HI = 'b0010,
STATE_FETCH_INSTR_HI_READ = 'b0011,
STATE_DECODE = 'b0100,
STATE_CLASS_MOVE = 'b0101,
STATE_CLASS_MOVE_IMM = 'b0110,
STATE_CLASS_BRANCH = 'b0111,
STATE_CLASS_JUMP = 'b1000,
STATE_CLASS_MATH = 'b1001,
STATE_CLASS_MOVE_READ_MEM_CLK = 'b1010,
STATE_CLASS_MOVE_READ_MEM = 'b1011,
STATE_CLASS_MOVE_WRITE_MEM_CLK = 'b1100,
STATE_CLASS_MOVE_WRITE_MEM = 'b1101,
STATE_CLASS_NOP = 'b1110;
/* Combinational next values for memory output regs */
reg [15:0] n_memAddress;
reg [7:0] n_memData;
reg n_memClk;
reg n_memWE;
/* Combinational next values for CPU state and instruction decoding/execution */
reg [15:0] n_v8CPU_IP;
reg [15:0] calc_n_v8CPU_IP;
reg [7:0] n_v8CPU_Flags;
reg [7:0] n_Instruction_Lo;
reg [7:0] n_Instruction_Hi;
reg [3:0] n_Register_Index;
reg [7:0] n_Register_Data;
/* Assign the flags input of the ALU directly to the v8CPU_Flags register */
assign alu_flags = v8CPU_Flags;
/* Combinational block for state machine (spelled out due to Xilinx tools bug with arrays in sensitivity list) */
always @(state or Instruction or v8CPU_IP or v8CPU_Flags or memQ or calc_n_v8CPU_IP or alu_c or alu_newFlags or v8CPU_RegisterFile[0] or v8CPU_RegisterFile[1] or v8CPU_RegisterFile[2] or v8CPU_RegisterFile[3] or v8CPU_RegisterFile[4] or v8CPU_RegisterFile[5] or v8CPU_RegisterFile[6] or v8CPU_RegisterFile[7] or v8CPU_RegisterFile[8] or v8CPU_RegisterFile[9] or v8CPU_RegisterFile[10] or v8CPU_RegisterFile[11] or v8CPU_RegisterFile[12] or v8CPU_RegisterFile[13] or v8CPU_RegisterFile[14] or v8CPU_RegisterFile[15]) begin
nextState = STATE_FETCH_INSTR_LO;
/* Default assignments */
n_memAddress = 'd0;
n_memData = 'd0;
n_memClk = 0;
n_memWE = 0;
n_Instruction_Lo = Instruction[7:0];
n_Instruction_Hi = Instruction[15:8];
n_v8CPU_IP = v8CPU_IP;
n_v8CPU_Flags = v8CPU_Flags;
n_Register_Index = 0;
n_Register_Data = v8CPU_RegisterFile[0];
alu_op = Instruction[11:8];
alu_a = v8CPU_RegisterFile[Instruction[7:4]];
alu_b = v8CPU_RegisterFile[Instruction[3:0]];
case (state)
STATE_FETCH_INSTR_LO: begin
n_memAddress = v8CPU_IP;
n_memClk = 1;
nextState = STATE_FETCH_INSTR_LO_READ;
end
STATE_FETCH_INSTR_LO_READ: begin
/* For some reason Icarus *does not* re-evaluate the
* always block sensitivity list when memQ updates.
* The #1 delay is a work-around to read in the correct value
* of memQ. */
#1 n_Instruction_Lo = memQ;
n_memAddress = v8CPU_IP+1;
nextState = STATE_FETCH_INSTR_HI;
end
STATE_FETCH_INSTR_HI: begin
n_memAddress = v8CPU_IP+1;
n_memClk = 1;
nextState = STATE_FETCH_INSTR_HI_READ;
end
STATE_FETCH_INSTR_HI_READ: begin
/* For some reason Icarus *does not* re-evaluate the
* always block sensitivity list when memQ updates.
* The #1 delay is a work-around to read in the correct value
* of memQ. */
#1 n_Instruction_Hi = memQ;
nextState = STATE_DECODE;
end
STATE_DECODE: begin
case (Instruction[15:12])
INSTR_CLASS_MOVE_IMM: nextState = STATE_CLASS_MOVE_IMM;
INSTR_CLASS_BRANCH: nextState = STATE_CLASS_BRANCH;
INSTR_CLASS_JUMP: nextState = STATE_CLASS_JUMP;
INSTR_CLASS_MATH: nextState = STATE_CLASS_MATH;
INSTR_CLASS_MOVE: begin
/* Do some additional decoding in case we need to setup the memory addresses
* for the read MEM / write MEM instructions, to keep the CPI down for memory
* access instructions. */
case (Instruction[11:8])
/* mov Ra, Rb */
'b0000: nextState = STATE_CLASS_MOVE;
/* mov Ra, MEM */
'b0001: begin
n_memAddress = {v8CPU_RegisterFile[14], v8CPU_RegisterFile[15]};
nextState = STATE_CLASS_MOVE_READ_MEM_CLK;
end
/* mov MEM, Ra */
'b0010: begin
n_memAddress = {v8CPU_RegisterFile[14], v8CPU_RegisterFile[15]};
n_memData = v8CPU_RegisterFile[Instruction[7:4]];
n_memWE = 1;
nextState = STATE_CLASS_MOVE_WRITE_MEM_CLK;
end
default: nextState = STATE_CLASS_NOP;
endcase
end
default: nextState = STATE_CLASS_NOP;
endcase
end
STATE_CLASS_BRANCH: begin
/* Sign-extend k to 16 bits and multiply by two (instructions are two bytes wide).
 * The offset is built as a single 16-bit concatenation so that no bits of the
 * shifted value are truncated. */
calc_n_v8CPU_IP = v8CPU_IP + {{7{Instruction[7]}}, Instruction[7:0], 1'b0};
n_v8CPU_IP = v8CPU_IP+2;
case (Instruction[11:8])
/* jmp */
'b0000: n_v8CPU_IP = calc_n_v8CPU_IP;
/* je */
'b0001: if (v8CPU_Flags[FLAG_INDEX_EQ]) n_v8CPU_IP = calc_n_v8CPU_IP;
/* jne */
'b0010: if (!v8CPU_Flags[FLAG_INDEX_EQ]) n_v8CPU_IP = calc_n_v8CPU_IP;
/* jg */
'b0011: if (v8CPU_Flags[FLAG_INDEX_GREATER]) n_v8CPU_IP = calc_n_v8CPU_IP;
/* jl */
'b0100: if (!v8CPU_Flags[FLAG_INDEX_GREATER]) n_v8CPU_IP = calc_n_v8CPU_IP;
endcase
n_memAddress = n_v8CPU_IP;
nextState = STATE_FETCH_INSTR_LO;
end
STATE_CLASS_JUMP: begin
n_v8CPU_IP = ({v8CPU_RegisterFile[14], v8CPU_RegisterFile[15]} << 1);
n_memAddress = n_v8CPU_IP;
nextState = STATE_FETCH_INSTR_LO;
end
STATE_CLASS_MOVE_IMM: begin
n_Register_Index = Instruction[11:8];
n_Register_Data = Instruction[7:0];
setupFetch;
end
STATE_CLASS_MATH: begin
alu_op = Instruction[11:8];
alu_a = v8CPU_RegisterFile[Instruction[7:4]];
alu_b = v8CPU_RegisterFile[Instruction[3:0]];
n_Register_Index = Instruction[7:4];
n_Register_Data = alu_c;
n_v8CPU_Flags = alu_newFlags;
setupFetch;
end
STATE_CLASS_MOVE: begin
n_Register_Index = Instruction[7:4];
n_Register_Data = v8CPU_RegisterFile[Instruction[3:0]];
setupFetch;
end
STATE_CLASS_MOVE_READ_MEM_CLK: begin
n_memAddress = {v8CPU_RegisterFile[14], v8CPU_RegisterFile[15]};
n_memClk = 1;
nextState = STATE_CLASS_MOVE_READ_MEM;
end
STATE_CLASS_MOVE_READ_MEM: begin
n_Register_Index = Instruction[7:4];
n_Register_Data = memQ;
setupFetch;
end
STATE_CLASS_MOVE_WRITE_MEM_CLK: begin
n_memAddress = {v8CPU_RegisterFile[14], v8CPU_RegisterFile[15]};
n_memData = v8CPU_RegisterFile[Instruction[7:4]];
n_memWE = 1;
n_memClk = 1;
nextState = STATE_CLASS_MOVE_WRITE_MEM;
end
STATE_CLASS_MOVE_WRITE_MEM: begin
setupFetch;
end
STATE_CLASS_NOP: begin
setupFetch;
end
endcase
end
/* A task to increment the IP and setup the memory address to fetch the next instruction */
task setupFetch;
begin
n_v8CPU_IP = v8CPU_IP+2;
n_memAddress = v8CPU_IP+2;
nextState = STATE_FETCH_INSTR_LO;
end
endtask
integer i;
/* Sequential block for state machine */
always @(posedge clk or negedge reset) begin
if (!reset) begin
v8CPU_RegisterFile[0] <= 'd0; v8CPU_RegisterFile[1] <= 'd0;
v8CPU_RegisterFile[2] <= 'd0; v8CPU_RegisterFile[3] <= 'd0;
v8CPU_RegisterFile[4] <= 'd0; v8CPU_RegisterFile[5] <= 'd0;
v8CPU_RegisterFile[6] <= 'd0; v8CPU_RegisterFile[7] <= 'd0;
v8CPU_RegisterFile[8] <= 'd0; v8CPU_RegisterFile[9] <= 'd0;
v8CPU_RegisterFile[10] <= 'd0; v8CPU_RegisterFile[11] <= 'd0;
v8CPU_RegisterFile[12] <= 'd0; v8CPU_RegisterFile[13] <= 'd0;
v8CPU_RegisterFile[14] <= 'd0; v8CPU_RegisterFile[15] <= 'd0;
v8CPU_IP <= 16'h0000;
v8CPU_Flags <= 'd0;
state <= 'd0;
memAddress <= 'd0;
memData <= 'd0;
memClk <= 0;
memWE <= 0;
Instruction <= 'd0;
end
else begin
state <= nextState;
memAddress <= n_memAddress;
memData <= n_memData;
memClk <= n_memClk;
memWE <= n_memWE;
Instruction[15:8] <= n_Instruction_Hi;
Instruction[7:0] <= n_Instruction_Lo;
v8CPU_IP <= n_v8CPU_IP;
v8CPU_Flags <= n_v8CPU_Flags;
v8CPU_RegisterFile[n_Register_Index] <= n_Register_Data;
/* Print the CPU state for simulation purposes */
if (state == STATE_DECODE) begin
$display("IP: %08X", v8CPU_IP);
$display("Flags: %02X", v8CPU_Flags);
$display("Current Instruction: %04X", Instruction);
for (i = 0; i < 16; i = i + 1) $display("R%02d: %02X", i, v8CPU_RegisterFile[i]);
$display("-----------------------\n");
end
end
end
endmodule
/* v8cpu Top-Level Module: Clock input, Reset input, 8-bit Port A output, 8-bit Port B output, 8-bit Pin C input, 8-bit Pin D input */
module v8cpu (
input clk,
input reset,
output [7:0] portA,
output [7:0] portB,
input [7:0] pinC,
input [7:0] pinD);
wire [3:0] alu_op;
wire [7:0] alu_a;
wire [7:0] alu_b;
wire [7:0] alu_c;
wire [7:0] alu_flags;
wire [7:0] alu_newFlags;
wire memClk, memWE;
wire [15:0] memAddress;
wire [7:0] memData;
wire [7:0] memQ;
v8cpu_cu cu(.clk(clk), .reset(reset), .alu_op(alu_op), .alu_a(alu_a), .alu_b(alu_b), .alu_c(alu_c), .alu_flags(alu_flags), .alu_newFlags(alu_newFlags), .memClk(memClk), .memWE(memWE), .memAddress(memAddress), .memData(memData), .memQ(memQ));
v8cpu_alu alu(.op(alu_op), .a(alu_a), .b(alu_b), .c(alu_c), .flags(alu_flags), .newFlags(alu_newFlags));
`ifdef SIMULATION
v8cpu_mem_sim mem(.clk(memClk), .we(memWE), .address(memAddress), .data(memData), .q(memQ));
`else
v8cpu_mem mem(.clk(memClk), .we(memWE), .address(memAddress), .data(memData), .q(memQ));
`endif
v8cpu_io io(.clk(memClk), .reset(reset), .we(memWE), .address(memAddress), .data(memData), .q(memQ), .portA(portA), .portB(portB), .pinC(pinC), .pinD(pinD));
endmodule
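The 6 and 7 CPI figures can be read off the control unit's state machine. As a rough sketch in Python (a hypothetical model, not part of the v8cpu code base, with the STATE_ prefix dropped from the Verilog parameter names), the function below lists the states a single instruction passes through, one clock per state:

```python
FETCH_AND_DECODE = [
    "FETCH_INSTR_LO", "FETCH_INSTR_LO_READ",
    "FETCH_INSTR_HI", "FETCH_INSTR_HI_READ",
    "DECODE",
]

def state_sequence(word):
    """Return the list of control unit states visited for one instruction word."""
    cls = (word >> 12) & 0xF   # instruction class
    sub = (word >> 8) & 0xF    # sub-opcode within the class
    if cls == 0x1 and sub == 0x0:
        tail = ["CLASS_MOVE"]
    elif cls == 0x1 and sub == 0x1:
        tail = ["CLASS_MOVE_READ_MEM_CLK", "CLASS_MOVE_READ_MEM"]
    elif cls == 0x1 and sub == 0x2:
        tail = ["CLASS_MOVE_WRITE_MEM_CLK", "CLASS_MOVE_WRITE_MEM"]
    elif cls == 0x2:
        tail = ["CLASS_MOVE_IMM"]
    elif cls == 0x3:
        tail = ["CLASS_BRANCH"]
    elif cls == 0x4:
        tail = ["CLASS_JUMP"]
    elif cls == 0x5:
        tail = ["CLASS_MATH"]
    else:
        tail = ["CLASS_NOP"]
    return FETCH_AND_DECODE + tail

def cpi(word):
    """Clocks per instruction: one clock per state."""
    return len(state_sequence(word))

print(cpi(0x5051))  # add R5, R1
print(cpi(0x11C0))  # mov R12, MEM
```

Four of the five shared states are spent fetching the two instruction bytes over the 8-bit bus, which is where a wider data bus would save clocks.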
The simulation memory module (simply an array of 8-bit registers) can be found below:
/* v8CPU Memory: 0x000-0x3FF = 1024 bytes; 8-bit data */
module v8cpu_mem_sim (
input clk,
input we,
input [15:0] address,
input [7:0] data,
output reg [7:0] q);
reg [7:0] memory[0:1023];
/* Use Verilog's $readmemh() to initialize the memory with a program for simulation purposes */
integer i;
initial begin
$readmemh("fib.dat", memory);
for (i = 0; i < 50; i = i + 1) $display("mem[%02d]: %02X", i, memory[i]);
end
always @(posedge clk) begin
if (|address[15:10] == 'd0) begin
q <= memory[address];
if (we) memory[address] <= data;
end
else q <= 8'bZZZZZZZZ;
end
endmodule
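For readers more at home in software, the following Python sketch mimics what the simulation memory and the fetch states do together: load whitespace-separated hex bytes (the form the assembler emits) into a 1024-byte array, then read an instruction as the low byte at IP and the high byte at IP+1. The names `load_dat` and `fetch` are hypothetical, and unlike a Verilog simulation the sketch zero-fills uninitialized memory.

```python
MEM_SIZE = 1024  # 0x000-0x3FF, as in the memory modules

def load_dat(dat_text, size=MEM_SIZE):
    """Load whitespace-separated hex bytes into a byte array, starting at address 0."""
    memory = [0] * size   # zero fill is an assumption of this sketch
    for address, token in enumerate(dat_text.split()):
        memory[address] = int(token, 16)
    return memory

def fetch(memory, ip):
    """Assemble a 16-bit instruction from the low byte at ip and the high byte at ip+1."""
    lo = memory[ip]
    hi = memory[ip + 1]
    return (hi << 8) | lo

# Hypothetical two-instruction image: mov R4, 0x01 then mov R5, 0x01
mem = load_dat("01 24\n01 25\n")
print("%04X %04X" % (fetch(mem, 0), fetch(mem, 2)))
```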
Test Bench
//`timescale 1ns/1ps
module v8cpu_tb(
output [7:0] portA,
output [7:0] portB,
input [7:0] pinC,
input [7:0] pinD);
reg clk;
reg rst;
initial begin
$dumpvars;
clk = 0;
rst = 0;
#100 rst = 1;
#100000 $finish;
end
always #20 clk = !clk;
v8cpu cpu(.clk(clk), .reset(rst), .portA(portA), .portB(portB), .pinC(pinC), .pinD(pinD));
endmodule
Assembler
The v8cpu assembler is a simple two-pass assembler written in Python.
# v8cpuasm - Two-pass assembler for v8cpu
# Vanya A. Sergeev - vsergeev@gmail.com
# Generates a memory file that can be loaded by Verilog simulator's $readmemh()
import sys
#####################################################################
# Valid operand checkers
def isOperandRegister(operand):
    # Must be at least "r" and maximum 3 digits
    if (len(operand) < 2 or len(operand) > 3):
        return False
    if (operand[0] != 'r' and operand[0] != 'R'):
        return False
    # Attempt to convert it
    try:
        value = int(operand[1:], 10)
    except ValueError:
        return False
    # Check that it's in range
    if (value < 0 or value > 15):
        return False
    return True

def isOperandData(operand):
    # Must be at least "0x" and must be 8-bits max
    if (len(operand) < 3 or len(operand) > 4):
        return False
    if (operand[0:2] != "0x"):
        return False
    # Attempt to convert it
    try:
        value = int(operand[2:], 16)
    except ValueError:
        return False
    return True

def isOperandLabel(operand):
    if (operand in addressLabelDict):
        return True
    return False

def isOperandMEM(operand):
    if (operand == "MEM"):
        return True
    return False

#####################################################################
# Operand data extractors
def operandRegister(operand):
    return int(operand[1:], 10)

def operandData(operand):
    return int(operand[2:], 16)

def operandLabel(operand):
    return addressLabelDict[operand]

#####################################################################
# Quick clean-up exit
def exit(retVal):
    fileASM.close()
    fileOut.close()
    sys.exit(retVal)
#####################################################################
if (len(sys.argv) < 3):
    print("Usage: %s <input assembly> <output memory dat>" % sys.argv[0])
    sys.exit(0)
fileASM = open(sys.argv[1], 'r')
fileOut = open(sys.argv[2], 'w')
# Instruction and minimum number of operands ("not" takes a single register operand)
validInstructions = {"mov":2, "jmp":1, "je":1, "jne":1, "jg":1, "jl":1, "ljmp":0, "add":2, "sub":2, "and":2, "or":2, "xor":2, "not":1, "cmp":2, "nop":0}
IP = 0
addressLabelDict = {}
# First pass finds all of the address labels and validates the instruction mnemonics
for line in fileASM:
    line = line.rstrip("\r\n")
    lineClean = line.replace(',', ' ')
    lineTokens = lineClean.split()
    if (len(lineTokens) == 0):
        continue
    # Skip if this line is a comment
    if (lineTokens[0][0] == ';'):
        continue
    # If this is an address label
    if (lineTokens[0][-1] == ':'):
        addressLabelDict[lineTokens[0][:-1]] = IP
        # If this line only contains an address label
        if (len(lineTokens) == 1):
            # Don't increment the IP until we've actually seen an instruction
            continue
        # Make sure that if the next token is not a comment, it is a valid instruction mnemonic
        if (lineTokens[1][0] != ';' and (not lineTokens[1] in validInstructions)):
            print("Error: Unknown instruction!")
            print("Line: %s" % line)
            exit(-1)
    # Check if this is a valid instruction
    elif (not lineTokens[0] in validInstructions):
        print("Error: Unknown instruction!")
        print("Line: %s" % line)
        exit(-1)
    IP += 2
# Reset our IP
IP = 0
# Rewind the file
fileASM.seek(0)
# Second pass assembles the instructions
for line in fileASM:
    line = line.rstrip("\r\n")
    lineClean = line.replace(',', ' ')
    lineTokens = lineClean.split()
    if (len(lineTokens) == 0):
        continue
    # Skip if this line is a comment
    if (lineTokens[0][0] == ';'):
        continue
    # Strip out the address label from the tokens and isolate the mnemonic
    if (lineTokens[0][-1] == ':'):
        # If this line only contains an address label
        if (len(lineTokens) == 1):
            continue
        lineTokens.pop(0)
        asmMnemonic = lineTokens.pop(0)
    else:
        asmMnemonic = lineTokens.pop(0)
    # Operands are the rest of the tokens
    asmOperands = lineTokens
    # Strip out any comment at the end of the tokens
    for i in range(len(asmOperands)):
        if (asmOperands[i][0] == ';'):
            asmOperands = asmOperands[:i]
            break
    # Check number of operands
    if (len(asmOperands) < validInstructions[asmMnemonic]):
        print("Error: Invalid number of operands!")
        print("Line: %s" % line)
        exit(-1)
    if (asmMnemonic == "mov"):
        # mov Ra, Rb
        if (isOperandRegister(asmOperands[0]) and isOperandRegister(asmOperands[1])):
            fileOut.write("%01X%01X " % (operandRegister(asmOperands[0]), operandRegister(asmOperands[1])))
            fileOut.write("10")
        # mov Ra, MEM
        elif (isOperandRegister(asmOperands[0]) and isOperandMEM(asmOperands[1])):
            fileOut.write("%01X0 " % operandRegister(asmOperands[0]))
            fileOut.write("11")
        # mov MEM, Ra
        elif (isOperandMEM(asmOperands[0]) and isOperandRegister(asmOperands[1])):
            fileOut.write("%01X0 " % operandRegister(asmOperands[1]))
            fileOut.write("12")
        # mov Ra, d
        elif (isOperandRegister(asmOperands[0]) and isOperandData(asmOperands[1])):
            fileOut.write("%02X " % operandData(asmOperands[1]))
            fileOut.write("2%01X" % operandRegister(asmOperands[0]))
        # Unknown operands
        else:
            print("Error: Invalid operands!")
            print("Line: %s" % line)
            exit(-1)
    elif (asmMnemonic == "jmp" or asmMnemonic == "je" or asmMnemonic == "jne" or asmMnemonic == "jg" or asmMnemonic == "jl"):
        # jmp k
        if (isOperandLabel(asmOperands[0])):
            targetIP = operandLabel(asmOperands[0])
            # If the target is behind this instruction (negative relative distance)
            if (targetIP < IP):
                relativeDistance = (IP - targetIP) >> 1
                if (relativeDistance > 127):
                    print("Error: Relative branch too far!")
                    print("Line: %s" % line)
                    exit(-1)
                # Encode the distance with two's complement
                relativeDistance = ~relativeDistance + 1
                relativeDistance = relativeDistance & 0xFF
                fileOut.write("%02X " % relativeDistance)
            # If the target is ahead of this instruction (positive relative distance)
            else:
                relativeDistance = (targetIP - IP) >> 1
                if (relativeDistance > 127):
                    print("Error: Relative branch too far!")
                    print("Line: %s" % line)
                    exit(-1)
                relativeDistance = relativeDistance & 0xFF
                fileOut.write("%02X " % relativeDistance)
            # Unknown operands
            # Encode the appropriate branch mnemonic
            if (asmMnemonic == "jmp"):
                fileOut.write("30")
            elif (asmMnemonic == "je"):
                fileOut.write("31")
            elif (asmMnemonic == "jne"):
                fileOut.write("32")
            elif (asmMnemonic == "jg"):
                fileOut.write("33")
            elif (asmMnemonic == "jl"):
                fileOut.write("34")
        else:
            print("Error: Invalid label!")
            print("Line: %s" % line)
            exit(-1)
    elif (asmMnemonic == "ljmp"):
        fileOut.write("00 40")
    elif (asmMnemonic == "add" or asmMnemonic == "sub" or asmMnemonic == "and" or asmMnemonic == "or" or asmMnemonic == "xor" or asmMnemonic == "cmp"):
        # <math instruction> Ra, Rb
        if (isOperandRegister(asmOperands[0]) and isOperandRegister(asmOperands[1])):
            fileOut.write("%01X%01X " % (operandRegister(asmOperands[0]), operandRegister(asmOperands[1])))
            # Encode the appropriate math mnemonic
            if (asmMnemonic == "add"):
                fileOut.write("50")
            elif (asmMnemonic == "sub"):
                fileOut.write("51")
            elif (asmMnemonic == "and"):
                fileOut.write("52")
            elif (asmMnemonic == "or"):
                fileOut.write("53")
            elif (asmMnemonic == "xor"):
                fileOut.write("54")
            elif (asmMnemonic == "cmp"):
                fileOut.write("56")
        # Unknown operands
        else:
            print("Error: Invalid operands!")
            print("Line: %s" % line)
            exit(-1)
    elif (asmMnemonic == "not"):
        # not Ra
        if (isOperandRegister(asmOperands[0])):
            fileOut.write("%01X0 " % operandRegister(asmOperands[0]))
            fileOut.write("55")
        # Unknown operands
        else:
            print("Error: Invalid operands!")
            print("Line: %s" % line)
            exit(-1)
    elif (asmMnemonic == "nop"):
        fileOut.write("00 00")
    else:
        print("Error: Unknown instruction!")
        print("Line: %s" % line)
        exit(-1)
    fileOut.write("\n")
    IP += 2
fileASM.close()
fileOut.close()
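The relative branch encoding is the subtlest part of the assembler, so here it is in isolation. This is a condensed sketch with hypothetical function names: `encode_branch_offset` follows the assembler's arithmetic (distance in instructions, two's complement, limited to 127 either way), and `branch_target` models the intended CPU side, where the signed offset is doubled and added to the address of the branch instruction itself.

```python
def encode_branch_offset(ip, target_ip):
    """Return the 8-bit two's complement offset k for a branch at ip to target_ip."""
    distance = (target_ip - ip) // 2   # signed distance in 16-bit instructions
    if distance < -127 or distance > 127:
        raise ValueError("relative branch too far")
    return distance & 0xFF

def branch_target(ip, k_byte):
    """Return the byte address a taken branch at ip jumps to, given its offset byte."""
    k = k_byte - 256 if k_byte & 0x80 else k_byte   # undo two's complement
    return (ip + 2 * k) & 0xFFFF

k = encode_branch_offset(10, 4)      # branch at address 10 back to address 4
print("%02X -> %d" % (k, branch_target(10, k)))
```

Because both sides measure from the branch instruction's own address, an offset of zero branches to itself and an untaken branch simply moves on to IP + 2.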
Fibonacci Number Test Program
This is a simple test program written in v8cpu assembly to compute 16-bit Fibonacci numbers and write them across the combined 8-bit output ports PortA:PortB (which can be connected to LEDs) each time a button pulls bit 0 of input port C high. The program code implements a ~10ms delay for debouncing between the button reads. It demonstrates v8cpu’s arithmetic, branching, and input/output capabilities.
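The ~10ms figure comes from counting clocks in the two nested delay loops of the program below. A small worked check in Python, using the loop counts and the 50 MHz clock from the program's comments (the function names are made up for this illustration, and the handful of outer-loop instructions is ignored):

```python
CPI = 6                   # clocks per non-memory instruction
INNER_INSTRUCTIONS = 3    # sub, cmp, jne in the inner loop
INNER_ITERATIONS = 255    # R13 counts down from 0xFF
OUTER_ITERATIONS = 100    # R12 counts down from 0x64
CLOCK_HZ = 50e6           # clock assumed by the program's comments

def inner_loop_cycles():
    """Clock cycles for one full run of the inner loop."""
    return CPI * INNER_INSTRUCTIONS * INNER_ITERATIONS

def debounce_delay_seconds():
    """Approximate total delay of the nested loops."""
    return inner_loop_cycles() * OUTER_ITERATIONS / CLOCK_HZ

print(inner_loop_cycles())
print("%.2f ms" % (debounce_delay_seconds() * 1e3))
```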
; 16-bit Fibonacci Number Generator for v8cpu
; Vanya A. Sergeev - vsergeev@gmail.com
; Next 16-bit Fibonacci Number is computed and written across port A (high byte) and port B (low byte) each time
; a button pulls pin C.0 high.
;
; R0:R1 = Fn-2
; R2:R3 = Fn-1
; R4:R5 = Fn
; R0:R1, R2:R3 initialize to 0
; Initialize R4:R5 to 0x0001
mov R4, 0x00
mov R5, 0x01
; Save a constant 0 and constant 1
mov R10, 0x00
mov R11, 0x01
; Save constant for 46368 (biggest 16-bit fibonacci number)
mov R8, 0xB5
mov R9, 0x20
; Address for portA (0x800) in R14:R15
mov R14, 0x08
mov R15, 0x00
; Wait for the button to be released
buttonClrWait: mov R15, 0x02 ; R14:R15 = 0x0802 = pin C address
mov R12, MEM
and R12, R11 ; R12 & 0x1, to keep just bit 0, the button
cmp R12, R11
je buttonClrWait ; If button == 1, loop buttonClrWait
buttonDbWait: mov R15, 0x02 ; R14:R15 = 0x0802 = pin C address
mov R12, MEM
and R12, R11 ; R12 & 0x1, to keep just bit 0, the button
cmp R12, R10
je buttonDbWait ; If button == 0, loop buttonDbWait
; Button has been pressed, delay and check again for debouncing
; Outer loop is 100 loops, inner loop should take 6 CPI * 3 * 255 = 4590 clock cycles
; With a 50MHz clock this yields roughly 10ms delay
mov R12, 0x64
outerDelayLoop: mov R13, 0xFF
innerDelayLoop: sub R13, R11 ; R13 = R13 - 1
cmp R13, R10
jne innerDelayLoop ; If R13 != 0, loop innerDelayLoop
sub R12, R11 ; R12 = R12 - 1
cmp R12, R10
jne outerDelayLoop ; If R12 != 0, loop outerDelayLoop
; Check that the button is still pressed
mov R12, MEM
and R12, R11 ; R12 & 0x1, to keep just bit 0, the button
cmp R12, R10
je buttonDbWait ; If button == 0, loop buttonDbWait
; Otherwise continue to computing the next fibonacci number
mov R0, R2 ; Fn-2 (hi byte) <= Fn-1 (hi byte)
mov R1, R3 ; Fn-2 (lo byte) <= Fn-1 (lo byte)
mov R2, R4 ; Fn-1 (hi byte) <= Fn (hi byte)
mov R3, R5 ; Fn-1 (lo byte) <= Fn (lo byte)
; R4:R5 contains Fn-1, R0:R1 contains Fn-2
add R5, R1 ; Fn-1 + Fn-2 (lo bytes)
cmp R5, R3 ; Compare the new R5 to the old R5 (R3)
je fibContinue ; No carry if they're equal
jl addCarryBit ; If the new R5 is less than the old R5, we had an overflow
; and we need to add the carry bit
fibContinue: add R4, R0 ; Fn-1 + Fn-2 (hi bytes)
; Write R4 to portA, R5 to portB
mov R15, 0x00 ; R14:R15 = 0x0800 = port A address
mov MEM, R4
mov R15, 0x01 ; R14:R15 = 0x0801 = port B address
mov MEM, R5
; Check if we've reached 46368 (biggest 16-bit fibonacci number)
cmp R4, R8
jne buttonClrWait
cmp R5, R9
jne buttonClrWait
; Reset to initial conditions
mov R0, 0x00
mov R1, 0x00
mov R2, 0x00
mov R3, 0x00
mov R4, 0x00
mov R5, 0x01
jmp buttonClrWait
addCarryBit: add R4, R11 ; Hi byte += 1
jmp fibContinue
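Since v8cpu has no carry flag, the program above detects the carry out of the low byte by comparing the sum against one of its operands. The Python sketch below is a hypothetical register-level model of that arithmetic (button handling and I/O left out), using the same register roles as the assembly:

```python
def fib_outputs(count):
    """Return the first `count` values the program would write to PortA:PortB."""
    r0 = r1 = r2 = r3 = 0          # Fn-2 and Fn-1 start at 0
    r4, r5 = 0x00, 0x01            # Fn starts at 0x0001
    outputs = []
    for _ in range(count):
        r0, r1 = r2, r3            # Fn-2 <= Fn-1
        r2, r3 = r4, r5            # Fn-1 <= Fn
        r5 = (r5 + r1) & 0xFF      # add the low bytes, 8-bit wraparound
        # The sum wrapped around iff it is now smaller than the old low byte
        carry = 1 if r5 < r3 else 0
        r4 = (r4 + carry + r0) & 0xFF
        outputs.append((r4 << 8) | r5)
        # Reset after 46368 (0xB520), the biggest 16-bit Fibonacci number
        if r4 == 0xB5 and r5 == 0x20:
            r0 = r1 = r2 = r3 = 0
            r4, r5 = 0x00, 0x01
    return outputs

print(fib_outputs(8))
```

The equal case needs no carry because the sum can only equal the old low byte when the other operand's low byte is zero.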
Fibonacci Number Test Program Output
This is the simulation output of the plain Fibonacci number program that sequentially prints the numbers across PortA:PortB (this is a button-less simplified version of the program above, available here: http://github.com/vsergeev/v8cpu/blob/master/v8cpuasm/programs/fib.asm).
~/projects-verilog/vcpu$ make compile simulate | grep "PortA"
PortA:PortB = 0
PortA:PortB = 1
PortA:PortB = 2
PortA:PortB = 3
PortA:PortB = 5
PortA:PortB = 8
PortA:PortB = 13
PortA:PortB = 21
PortA:PortB = 34
PortA:PortB = 55
PortA:PortB = 89
PortA:PortB = 144
PortA:PortB = 233
PortA:PortB = 377
PortA:PortB = 610
PortA:PortB = 987
PortA:PortB = 1597
PortA:PortB = 2584
PortA:PortB = 4181
PortA:PortB = 6765
PortA:PortB = 10946
PortA:PortB = 17711
PortA:PortB = 28657
PortA:PortB = 46368
PortA:PortB = 1
PortA:PortB = 2
PortA:PortB = 3
PortA:PortB = 5
PortA:PortB = 8
PortA:PortB = 13
PortA:PortB = 21
PortA:PortB = 34
PortA:PortB = 55
PortA:PortB = 89
PortA:PortB = 144
PortA:PortB = 233
PortA:PortB = 377
PortA:PortB = 610
PortA:PortB = 987
PortA:PortB = 1597
PortA:PortB = 2584
PortA:PortB = 4181
PortA:PortB = 6765
PortA:PortB = 10946
PortA:PortB = 17711
PortA:PortB = 28657
PortA:PortB = 46368
PortA:PortB = 1
PortA:PortB = 2
PortA:PortB = 3
PortA:PortB = 5
PortA:PortB = 8
PortA:PortB = 13
PortA:PortB = 21
PortA:PortB = 34
PortA:PortB = 55
PortA:PortB = 89
PortA:PortB = 144
PortA:PortB = 233
PortA:PortB = 377
PortA:PortB = 610
PortA:PortB = 987
PortA:PortB = 1597