Last modified 2017-10-18 22:23:32 CDT

v8cpu

v8cpu is a simple multi-cycle von Neumann architecture 8-bit CPU written in ~440 lines of Verilog. I developed it as a learning project using Icarus Verilog and Xilinx ISE WebPACK.

GitHub: https://github.com/vsergeev/v8cpu

v8cpu characteristics:

16 8-bit registers (R0-R15)
8-bit flags register (currently just two bits used: bit 0 – equals, bit 1 – greater)
16-bit instructions
- Arithmetic, between any two registers: add, subtract, and, or, xor, not, compare
- Branching: relative jump, jump on equal/not-equal, jump on greater/less than
- Move: register->register move, immediate->register move
- Indirect Move: memory->register and register->memory indirect moves with 16-bit address stored in R14:R15
- Indirect Jump: jump to 16-bit address stored in R14:R15
6 CPI for all instructions except for memory read/write instructions, which require 7 CPI
Memory-mapped peripherals
- Two memory-mapped 8-bit output ports (0x800, 0x801)
- Two memory-mapped 8-bit input ports (0x802, 0x803)
- Easy to add other peripherals on 16-bit address / 8-bit data memory bus

It should be possible to squeeze the CPI down to 5/6 by restructuring some of the control unit state machine. In addition, a 16-bit instead of 8-bit data bus could cut the CPI by 2 clocks, since each 16-bit instruction could then be fetched in one memory read instead of two. The synthesized logic targeting a Xilinx Spartan 3E XC3S500E FPGA comfortably meets timing constraints with an 80 MHz clock (yielding roughly 11.4 MIPS at the worst-case 7 CPI). The fully synthesized, placed, and routed logic uses 215 flip-flops and 470 4-input LUTs, occupying a total of 326 slices (7% of the XC3S500E) and 1 RAMB16 (embedded block RAM for program/memory storage).
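
The throughput figure follows from dividing the clock rate by the cycles per instruction. Here is a minimal Python sketch of that arithmetic, using the 80 MHz clock and the 6/7 CPI figures quoted above (the function name mips is only illustrative):

```python
def mips(clock_hz, cpi):
    """Millions of instructions per second for a clock rate and cycles per instruction."""
    return clock_hz / cpi / 1e6

CLOCK_HZ = 80e6  # maximum clock reported above for the XC3S500E build

print("7 CPI (memory read/write): %.1f MIPS" % mips(CLOCK_HZ, 7))
print("6 CPI (all other instructions): %.1f MIPS" % mips(CLOCK_HZ, 6))
```

The 7 CPI case gives about 11.4 MIPS and the 6 CPI case about 13.3 MIPS, so a real program's throughput would fall between the two depending on its mix of memory accesses.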

The Verilog can target either simulation or an FPGA. For simulation, the memory is a register array that is initialized with the program via $readmemh(). For an FPGA target, the memory is a Xilinx Embedded Block RAM instantiation (but can be any similar clocked memory) whose initial values specify the program through a .coe file. I’ve been using an XC3S500E FPGA on a Digilent Nexys2. For other targets (e.g. ASIC), some additional logic may be required to load the program into memory.
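
To make the simulation path concrete, here is a rough Python sketch of what loading a memory file of whitespace-separated hex bytes amounts to. It is a simplification, not a model of $readmemh() itself: the real system task also accepts comments and @address markers, and a simulator leaves unloaded locations undefined rather than zero. The function name load_hex_image and the two-instruction sample image are assumptions made for illustration.

```python
def load_hex_image(text, size=1024):
    """Load whitespace-separated hex bytes into a zero-filled memory image.

    Simplified: no comments or @address markers, and unused bytes are
    zero-filled here (an assumption; a simulator would leave them undefined).
    """
    memory = [0] * size
    for address, token in enumerate(text.split()):
        if address >= size:
            raise ValueError("image larger than %d bytes" % size)
        memory[address] = int(token, 16)
    return memory

# Hypothetical image for: mov R4, 0x00 ; mov R5, 0x01
image = load_hex_image("00 24\n01 25\n")
print(["%02X" % b for b in image[:4]])
```

Each instruction occupies two consecutive bytes with the low byte at the lower address, which is the order in which the control unit fetches them.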

The v8cpu and assembler code base is available in the GitHub repository linked above, and is reproduced below for online viewing. The code base includes a description of the instruction set, the v8cpu logic in Verilog, a test bench for simulation, a two-pass assembler written in Python, and a sample test program that uses arithmetic, input, and output to compute Fibonacci numbers and display them.

Instruction Set

/* v8cpu by Vanya A. Sergeev - vsergeev@gmail.com
 * Simple multi-cycle von Neumann architecture 8-bit CPU */

v8cpu Instruction Set

Category
Instruction Encoding    Mnemonic    | Operation

MOVE
0001 0000 aaaa bbbb mov Ra, Rb      | Ra <= Rb
0001 0001 aaaa xxxx mov Ra, MEM     | Ra <= Memory[R14:R15]
0001 0010 aaaa xxxx mov MEM, Ra     | Memory[R14:R15] <= Ra

MOVE IMMEDIATE
0010 aaaa dddd dddd mov Ra, d       | Ra <= d

BRANCH (k is two's complement encoded)
0011 0000 kkkk kkkk jmp k           | IP <= IP + k
0011 0001 kkkk kkkk je k            | if (eq) IP <= IP + k
0011 0010 kkkk kkkk jne k           | if (!eq) IP <= IP + k
0011 0011 kkkk kkkk jg k            | if (greater) IP <= IP + k
0011 0100 kkkk kkkk jl k            | if (!greater) IP <= IP + k

JUMP
0100 xxxx xxxx xxxx ljmp            | IP <= R14:R15 << 1

MATH
0101 0000 aaaa bbbb add Ra, Rb      | Ra <= Ra + Rb
0101 0001 aaaa bbbb sub Ra, Rb      | Ra <= Ra - Rb
0101 0010 aaaa bbbb and Ra, Rb      | Ra <= Ra & Rb
0101 0011 aaaa bbbb or Ra, Rb       | Ra <= Ra | Rb
0101 0100 aaaa bbbb xor Ra, Rb      | Ra <= Ra ^ Rb
0101 0101 aaaa xxxx not Ra          | Ra <= ~Ra
0101 0110 aaaa bbbb cmp Ra, Rb      | eq flag <= (Ra == Rb)
                                      greater flag <= (Ra > Rb)

All other opcodes   nop             | Do nothing
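
The encoding table above can be read as a small decoder. The following Python sketch is an illustration only (the names decode, MOVE_OPS, BRANCH_OPS, and MATH_OPS are not part of the code base); it maps a 16-bit instruction word back to its mnemonic and treats every unlisted opcode as nop, as the table specifies. The branch offset k is shown as a signed number.

```python
MOVE_OPS = {0x0: "mov R%(a)d, R%(b)d", 0x1: "mov R%(a)d, MEM", 0x2: "mov MEM, R%(a)d"}
BRANCH_OPS = {0x0: "jmp", 0x1: "je", 0x2: "jne", 0x3: "jg", 0x4: "jl"}
MATH_OPS = {0x0: "add", 0x1: "sub", 0x2: "and", 0x3: "or", 0x4: "xor", 0x6: "cmp"}

def decode(word):
    """Return the mnemonic for one 16-bit v8cpu instruction word."""
    cls = (word >> 12) & 0xF   # bits 15..12: instruction class
    sub = (word >> 8) & 0xF    # bits 11..8: sub-opcode (or Ra for move immediate)
    a = (word >> 4) & 0xF      # bits 7..4: Ra
    b = word & 0xF             # bits 3..0: Rb
    low = word & 0xFF          # bits 7..0: immediate d or branch offset k

    if cls == 0x1 and sub in MOVE_OPS:
        return MOVE_OPS[sub] % {"a": a, "b": b}
    if cls == 0x2:
        return "mov R%d, 0x%02X" % (sub, low)
    if cls == 0x3 and sub in BRANCH_OPS:
        k = low - 0x100 if low & 0x80 else low   # sign-extend two's complement k
        return "%s %+d" % (BRANCH_OPS[sub], k)
    if cls == 0x4:
        return "ljmp"
    if cls == 0x5:
        if sub == 0x5:
            return "not R%d" % a
        if sub in MATH_OPS:
            return "%s R%d, R%d" % (MATH_OPS[sub], a, b)
    return "nop"

for word in (0x2401, 0x5051, 0x31FB, 0x11C0, 0x0000):
    print("%04X  %s" % (word, decode(word)))
```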

Logic

v8cpu is contained in a single file to emphasize the small size of the CPU. The SIMULATION define at the top can be commented out or left in to switch between the FPGA and simulation targets.

/* v8cpu by Vanya A. Sergeev - vsergeev@gmail.com
 * Simple multi-cycle von Neumann architecture 8-bit CPU
 *
 * 6-7 CPI, 80MHz Maximum Clock --> ~11.4 MIPS */

`define SIMULATION

/* v8cpu ALU for Add, Subtract, AND, OR, XOR, NOT, and Compare. */
module v8cpu_alu (
    input [3:0] op,
    input [7:0] a,
    input [7:0] b,
    output reg [7:0] c,
    input [7:0] flags,
    output reg [7:0] newFlags);

    parameter   ALU_OP_ADD  = 4'b0000,
                ALU_OP_SUB  = 4'b0001,
                ALU_OP_AND  = 4'b0010,
                ALU_OP_OR   = 4'b0011,
                ALU_OP_XOR  = 4'b0100,
                ALU_OP_NOT  = 4'b0101,
                ALU_OP_CMP  = 4'b0110;
    
    parameter   FLAG_INDEX_EQ       = 'd0,
                FLAG_INDEX_GREATER  = 'd1;

    always @(*) begin
        c = a;
        newFlags = flags;
        case (op)
            ALU_OP_ADD: c = a + b;
            ALU_OP_SUB: c = a - b;
            ALU_OP_AND: c = a & b;
            ALU_OP_OR: c = a | b;
            ALU_OP_XOR: c = a ^ b;
            ALU_OP_NOT: c = ~a;
            ALU_OP_CMP: begin
                newFlags[FLAG_INDEX_EQ] = (a == b);
                newFlags[FLAG_INDEX_GREATER] = (a > b);
            end
        endcase
    end
endmodule

`ifdef SIMULATION
`else
/* v8cpu Memory: 0x000-0x3FF = 1024 bytes; 8-bit data */
module v8cpu_mem (
    input clk,
    input we,
    input [15:0] address,
    input [7:0] data,
    output reg [7:0] q);
    
    wire [7:0] q_memory;
    reg we_validated;
    
    blk_mem_gen memory(.clka(clk), .wea(we_validated), .addra(address[9:0]), .dina(data), .douta(q_memory));

    always @(*) begin
        if (|address[15:10] == 'd0) begin
            q = q_memory;
            we_validated = we;
        end
        else begin
            q = 8'bZZZZZZZZ;
            we_validated = 0;
        end
    end
endmodule
`endif

/* v8cpu Memory-Mapped I/O: 0x800 = Port A, 0x801 = Port B, 0x802 = Pin C, 0x803 = Pin D; 8-bit data */
module v8cpu_io (
    input clk,
    input reset,
    input we,
    input [15:0] address,
    input [7:0] data,
    output reg [7:0] q,

    output reg [7:0] portA,
    output reg [7:0] portB,
    input [7:0] pinC,
    input [7:0] pinD);

    reg [7:0] q_reg;

    always @(posedge clk or negedge reset) begin
        if (!reset) begin
            portA <= 8'd0;
            portB <= 8'd0;
        end
        else if (we) begin
            if (address == 'h800) portA <= data;
            else if (address == 'h801) portB <= data;
            /* Print the current values of PortA:PortB for simulation purposes as PortA is being overwritten */
            if (address == 'h800) $display("PortA:PortB = %01d", {portA, portB});
        end
        else begin
            if (address == 'h802) q_reg <= pinC;
            else if (address == 'h803) q_reg <= pinD;
        end
    end

    always @(*) begin
        if (address == 'h802) q = q_reg;
        else if (address == 'h803) q = q_reg;
        else q = 8'bZZZZZZZZ;
    end
endmodule

/* v8cpu Control Unit: IP, 16 8-bit Registers, 8-bit Flags Register, Fetch/Decode/Execute State Machine */
module v8cpu_cu (
    input clk,
    input reset,

    output reg [3:0] alu_op,    
    output reg [7:0] alu_a,
    output reg [7:0] alu_b,
    input [7:0] alu_c,
    output [7:0] alu_flags,
    input [7:0] alu_newFlags,

    output reg memClk,
    output reg memWE,
    output reg [15:0] memAddress,
    output reg [7:0] memData,
    input [7:0] memQ);

    /* Instruction pointer */
    reg [15:0] v8CPU_IP;
    /* Register file */
    reg [7:0] v8CPU_RegisterFile[0:15];
    /* Flags, currently just the EQ flag in bit 0 and the GREATER flag in bit 1 */
    reg [7:0] v8CPU_Flags;

    /* Indexing into v8CPU_Flags for various flags modified by the compare instruction */
    parameter   FLAG_INDEX_EQ       = 'd0,
                FLAG_INDEX_GREATER  = 'd1;

    /* 16-bit instruction register for decoding/execution */
    reg [15:0] Instruction;

    /* Major classes of instructions, see v8cpu ISA */
    parameter   INSTR_CLASS_MOVE        = 4'b0001,
                INSTR_CLASS_MOVE_IMM    = 4'b0010,
                INSTR_CLASS_BRANCH      = 4'b0011,
                INSTR_CLASS_JUMP        = 4'b0100,
                INSTR_CLASS_MATH        = 4'b0101;

    /* State machine states */
    reg [3:0] state;
    reg [3:0] nextState;

    parameter   STATE_FETCH_INSTR_LO            = 'b0000,
                STATE_FETCH_INSTR_LO_READ       = 'b0001,
                STATE_FETCH_INSTR_HI            = 'b0010,
                STATE_FETCH_INSTR_HI_READ       = 'b0011,
                STATE_DECODE                    = 'b0100,
                STATE_CLASS_MOVE                = 'b0101,
                STATE_CLASS_MOVE_IMM            = 'b0110,
                STATE_CLASS_BRANCH              = 'b0111,
                STATE_CLASS_JUMP                = 'b1000,
                STATE_CLASS_MATH                = 'b1001,
                STATE_CLASS_MOVE_READ_MEM_CLK   = 'b1010,
                STATE_CLASS_MOVE_READ_MEM       = 'b1011,
                STATE_CLASS_MOVE_WRITE_MEM_CLK  = 'b1100,
                STATE_CLASS_MOVE_WRITE_MEM      = 'b1101,
                STATE_CLASS_NOP                 = 'b1110;

    /* Combinational next values for memory output regs */
    reg [15:0] n_memAddress;
    reg [7:0] n_memData;
    reg n_memClk;
    reg n_memWE;

    /* Combinational next values for CPU state and instruction decoding/execution */
    reg [15:0] n_v8CPU_IP;
    reg [15:0] calc_n_v8CPU_IP;
    reg [7:0] n_v8CPU_Flags;
    reg [7:0] n_Instruction_Lo;
    reg [7:0] n_Instruction_Hi;
    reg [3:0] n_Register_Index;
    reg [7:0] n_Register_Data;

    /* Assign the flags input of the ALU directly to the v8CPU_Flags register */
    assign alu_flags = v8CPU_Flags;
    
    /* Combinational block for state machine (spelled out due to Xilinx tools bug with arrays in sensitivity list) */ 
    always @(state or Instruction or v8CPU_IP or v8CPU_Flags or memQ or calc_n_v8CPU_IP or alu_c or alu_newFlags or v8CPU_RegisterFile[0] or v8CPU_RegisterFile[1] or v8CPU_RegisterFile[2] or v8CPU_RegisterFile[3] or v8CPU_RegisterFile[4] or v8CPU_RegisterFile[5] or v8CPU_RegisterFile[6] or v8CPU_RegisterFile[7] or v8CPU_RegisterFile[8] or v8CPU_RegisterFile[9] or v8CPU_RegisterFile[10] or v8CPU_RegisterFile[11] or v8CPU_RegisterFile[12] or v8CPU_RegisterFile[13] or v8CPU_RegisterFile[14] or v8CPU_RegisterFile[15]) begin
        nextState = STATE_FETCH_INSTR_LO;

        /* Default assignments */
        n_memAddress = 'd0;
        n_memData = 'd0;
        n_memClk = 0;
        n_memWE = 0;

        n_Instruction_Lo = Instruction[7:0];
        n_Instruction_Hi = Instruction[15:8];
        n_v8CPU_IP = v8CPU_IP;
        n_v8CPU_Flags = v8CPU_Flags;

        n_Register_Index = 0;
        n_Register_Data = v8CPU_RegisterFile[0];
        
        alu_op = Instruction[11:8];
        alu_a = v8CPU_RegisterFile[Instruction[7:4]];
        alu_b = v8CPU_RegisterFile[Instruction[3:0]];

        case (state)
            STATE_FETCH_INSTR_LO: begin
                n_memAddress = v8CPU_IP;
                n_memClk = 1;
                nextState = STATE_FETCH_INSTR_LO_READ;
            end
            STATE_FETCH_INSTR_LO_READ: begin
                /* For some reason Icarus *does not* re-evaluate the
                 * always block sensitivity list when memQ updates.
                 * The #1 delay is a work-around to read in the correct value
                 * of memQ. */
                #1 n_Instruction_Lo = memQ;
                n_memAddress = v8CPU_IP+1;
                nextState = STATE_FETCH_INSTR_HI;
            end
            STATE_FETCH_INSTR_HI: begin
                n_memAddress = v8CPU_IP+1;
                n_memClk = 1;
                nextState = STATE_FETCH_INSTR_HI_READ;
            end
            STATE_FETCH_INSTR_HI_READ: begin
                /* For some reason Icarus *does not* re-evaluate the
                 * always block sensitivity list when memQ updates.
                 * The #1 delay is a work-around to read in the correct value
                 * of memQ. */
                #1 n_Instruction_Hi = memQ;
                nextState = STATE_DECODE;
            end
            STATE_DECODE: begin
                case (Instruction[15:12])
                    INSTR_CLASS_MOVE_IMM: nextState = STATE_CLASS_MOVE_IMM;
                    INSTR_CLASS_BRANCH: nextState = STATE_CLASS_BRANCH;
                    INSTR_CLASS_JUMP: nextState = STATE_CLASS_JUMP;
                    INSTR_CLASS_MATH: nextState = STATE_CLASS_MATH;
                    INSTR_CLASS_MOVE: begin
                        /* Do some additional decoding in case we need to setup the memory addresses
                         * for the read MEM / write MEM instructions, to keep the CPI down for memory
                         * access instructions. */
                        case (Instruction[11:8])
                            /* mov Ra, Rb */
                            'b0000: nextState = STATE_CLASS_MOVE;
                            /* mov Ra, MEM */
                            'b0001: begin
                                n_memAddress = {v8CPU_RegisterFile[14], v8CPU_RegisterFile[15]};
                                nextState = STATE_CLASS_MOVE_READ_MEM_CLK;
                            end
                            /* mov MEM, Ra */
                            'b0010: begin
                                n_memAddress = {v8CPU_RegisterFile[14], v8CPU_RegisterFile[15]};
                                n_memData = v8CPU_RegisterFile[Instruction[7:4]];
                                n_memWE = 1;
                                nextState = STATE_CLASS_MOVE_WRITE_MEM_CLK;
                            end
                            default: nextState = STATE_CLASS_NOP;
                        endcase
                    end
                    default: nextState = STATE_CLASS_NOP;
                endcase
            end

            STATE_CLASS_BRANCH: begin
                /* If the number is negative, then undo two's complement and subtract from IP.
                 * The offset is doubled by appending a zero bit: a shift inside a concatenation
                 * is self-determined and would drop the top bit of the offset. */
                if (Instruction[7]) calc_n_v8CPU_IP = v8CPU_IP - {7'b000_0000, ((~Instruction[7:0])+1'b1), 1'b0};
                /* Otherwise, if the relative jump is positive, just add to IP */
                else calc_n_v8CPU_IP = v8CPU_IP + {8'b0000_0000, Instruction[6:0], 1'b0};

                n_v8CPU_IP = v8CPU_IP+2;
                case (Instruction[11:8])
                    /* jmp */
                    'b0000: n_v8CPU_IP = calc_n_v8CPU_IP;
                    /* je */
                    'b0001: if (v8CPU_Flags[FLAG_INDEX_EQ]) n_v8CPU_IP = calc_n_v8CPU_IP;
                    /* jne */
                    'b0010: if (!v8CPU_Flags[FLAG_INDEX_EQ]) n_v8CPU_IP = calc_n_v8CPU_IP;
                    /* jg */
                    'b0011: if (v8CPU_Flags[FLAG_INDEX_GREATER]) n_v8CPU_IP = calc_n_v8CPU_IP;
                    /* jl */
                    'b0100: if (!v8CPU_Flags[FLAG_INDEX_GREATER]) n_v8CPU_IP = calc_n_v8CPU_IP;
                endcase
                n_memAddress = n_v8CPU_IP;
                nextState = STATE_FETCH_INSTR_LO;
            end

            STATE_CLASS_JUMP: begin
                n_v8CPU_IP = ({v8CPU_RegisterFile[14], v8CPU_RegisterFile[15]} << 1);
                n_memAddress = n_v8CPU_IP;
                nextState = STATE_FETCH_INSTR_LO;
            end
            
            STATE_CLASS_MOVE_IMM: begin
                n_Register_Index = Instruction[11:8];
                n_Register_Data = Instruction[7:0];                
                setupFetch;
            end
            
            STATE_CLASS_MATH: begin
                alu_op = Instruction[11:8];
                alu_a = v8CPU_RegisterFile[Instruction[7:4]];
                alu_b = v8CPU_RegisterFile[Instruction[3:0]];
                n_Register_Index = Instruction[7:4];
                n_Register_Data = alu_c;
                n_v8CPU_Flags = alu_newFlags;
                setupFetch;
            end

            STATE_CLASS_MOVE: begin
                n_Register_Index = Instruction[7:4];
                n_Register_Data = v8CPU_RegisterFile[Instruction[3:0]];
                setupFetch;
            end

            STATE_CLASS_MOVE_READ_MEM_CLK: begin
                n_memAddress = {v8CPU_RegisterFile[14], v8CPU_RegisterFile[15]};
                n_memClk = 1;
                nextState = STATE_CLASS_MOVE_READ_MEM;
            end

            STATE_CLASS_MOVE_READ_MEM: begin
                n_Register_Index = Instruction[7:4];
                n_Register_Data = memQ;
                setupFetch;
            end

            STATE_CLASS_MOVE_WRITE_MEM_CLK: begin
                n_memAddress = {v8CPU_RegisterFile[14], v8CPU_RegisterFile[15]};
                n_memData = v8CPU_RegisterFile[Instruction[7:4]];
                n_memWE = 1;
                n_memClk = 1;
                nextState = STATE_CLASS_MOVE_WRITE_MEM;
            end

            STATE_CLASS_MOVE_WRITE_MEM: begin
                setupFetch;
            end

            STATE_CLASS_NOP: begin
                setupFetch;
            end
        endcase
    end

    /* A task to increment the IP and setup the memory address to fetch the next instruction */    
    task setupFetch;
    begin
        n_v8CPU_IP = v8CPU_IP+2;
        n_memAddress = v8CPU_IP+2;
        nextState = STATE_FETCH_INSTR_LO;
    end
    endtask

    integer i;
    /* Sequential block for state machine */
    always @(posedge clk or negedge reset) begin
        if (!reset) begin
            v8CPU_RegisterFile[0] <= 'd0; v8CPU_RegisterFile[1] <= 'd0;
            v8CPU_RegisterFile[2] <= 'd0; v8CPU_RegisterFile[3] <= 'd0;
            v8CPU_RegisterFile[4] <= 'd0; v8CPU_RegisterFile[5] <= 'd0;
            v8CPU_RegisterFile[6] <= 'd0; v8CPU_RegisterFile[7] <= 'd0;
            v8CPU_RegisterFile[8] <= 'd0; v8CPU_RegisterFile[9] <= 'd0;
            v8CPU_RegisterFile[10] <= 'd0; v8CPU_RegisterFile[11] <= 'd0;
            v8CPU_RegisterFile[12] <= 'd0; v8CPU_RegisterFile[13] <= 'd0;
            v8CPU_RegisterFile[14] <= 'd0; v8CPU_RegisterFile[15] <= 'd0;
            v8CPU_IP <= 16'h0000;
            v8CPU_Flags <= 'd0;
            state <= 'd0;
            memAddress <= 'd0;
            memData <= 'd0;
            memClk <= 0;
            memWE <= 0;
            Instruction <= 'd0;
        end
        else begin
            state <= nextState;
            memAddress <= n_memAddress;
            memData <= n_memData;
            memClk <= n_memClk;
            memWE <= n_memWE;

            Instruction[15:8] <= n_Instruction_Hi;
            Instruction[7:0] <= n_Instruction_Lo;
            v8CPU_IP <= n_v8CPU_IP;
            v8CPU_Flags <= n_v8CPU_Flags;
            v8CPU_RegisterFile[n_Register_Index] <= n_Register_Data;

            /* Print the CPU state for simulation purposes */
            if (state == STATE_DECODE) begin
                $display("IP: %08X", v8CPU_IP);
                $display("Flags: %02X", v8CPU_Flags);
                $display("Current Instruction: %04X", Instruction);
                for (i = 0; i < 16; i = i + 1) $display("R%02d: %02X", i, v8CPU_RegisterFile[i]);
                $display("-----------------------\n");
            end
        end
    end
endmodule

/* v8cpu Top-Level Module: Clock input, Reset input, 8-bit Port A output, 8-bit Port B output, 8-bit Pin C input, 8-bit Pin D input */
module v8cpu (
    input clk,
    input reset,
    output [7:0] portA,
    output [7:0] portB,
    input [7:0] pinC,
    input [7:0] pinD);
    
    wire [3:0] alu_op;
    wire [7:0] alu_a;
    wire [7:0] alu_b;
    wire [7:0] alu_c;
    wire [7:0] alu_flags;
    wire [7:0] alu_newFlags;
    wire memClk, memWE;
    wire [15:0] memAddress;
    wire [7:0] memData;
    wire [7:0] memQ;

    v8cpu_cu cu(.clk(clk), .reset(reset), .alu_op(alu_op), .alu_a(alu_a), .alu_b(alu_b), .alu_c(alu_c), .alu_flags(alu_flags), .alu_newFlags(alu_newFlags), .memClk(memClk), .memWE(memWE), .memAddress(memAddress), .memData(memData), .memQ(memQ));

    v8cpu_alu alu(.op(alu_op), .a(alu_a), .b(alu_b), .c(alu_c), .flags(alu_flags), .newFlags(alu_newFlags));

    `ifdef SIMULATION
    v8cpu_mem_sim mem(.clk(memClk), .we(memWE), .address(memAddress), .data(memData), .q(memQ));
    `else
    v8cpu_mem mem(.clk(memClk), .we(memWE), .address(memAddress), .data(memData), .q(memQ));
    `endif

    v8cpu_io io(.clk(memClk), .reset(reset), .we(memWE), .address(memAddress), .data(memData), .q(memQ), .portA(portA), .portB(portB), .pinC(pinC), .pinD(pinD));
endmodule

The simulation memory module (simply an array of 8-bit registers) can be found below:

/* v8CPU Memory: 0x000-0x3FF = 1024 bytes; 8-bit data */
module v8cpu_mem_sim (
    input clk,
    input we,
    input [15:0] address,
    input [7:0] data,
    output reg [7:0] q);
    
    reg [7:0] memory[0:1023];

    /* Use Verilog's $readmemh() to initialize the memory with a program for simulation purposes */
    integer i;
    initial begin
        $readmemh("fib.dat", memory);
        for (i = 0; i < 50; i = i + 1) $display("mem[%02d]: %02X", i, memory[i]);
    end

    always @(posedge clk) begin
        if (|address[15:10] == 'd0) begin
            q <= memory[address];
            if (we) memory[address] <= data;
        end
        else q <= 8'bZZZZZZZZ;
    end
endmodule
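
The memory and I/O modules share one address and data bus, and each drives its output only for its own address range. Here is a small Python sketch of that address map as described in the module comments and in the characteristics list. The function name decode_address is illustrative, and the sketch ignores the read/write direction of each port.

```python
IO_MAP = {0x800: "portA", 0x801: "portB", 0x802: "pinC", 0x803: "pinD"}

def decode_address(address):
    """Name the device that responds to a 16-bit bus address, or None if nothing does."""
    if (address >> 10) == 0:       # address[15:10] all zero: 0x000-0x3FF
        return "memory"
    return IO_MAP.get(address)     # memory-mapped ports, otherwise unmapped

for address in (0x000, 0x3FF, 0x400, 0x800, 0x803, 0x804):
    print("0x%03X -> %s" % (address, decode_address(address)))
```

Adding a peripheral amounts to claiming another address that neither the memory nor the existing I/O module answers to.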

Test Bench

//`timescale 1ns/1ps

module v8cpu_tb(
    output [7:0] portA,
    output [7:0] portB,
    input [7:0] pinC,
    input [7:0] pinD);

    reg clk;
    reg rst;
    
    initial begin
        $dumpvars;

        clk = 0;
        rst = 0;

        #100 rst = 1;
        #100000 $finish;
    end

    always #20 clk = !clk;

    v8cpu cpu(.clk(clk), .reset(rst), .portA(portA), .portB(portB), .pinC(pinC), .pinD(pinD));
endmodule

Assembler

The v8cpu assembler is a simple two-pass assembler written in Python.

# v8cpuasm - Two-pass assembler for v8cpu
# Vanya A. Sergeev - vsergeev@gmail.com
# Generates a memory file that can be loaded by Verilog simulator's $readmemh()

import sys

#####################################################################

# Valid operand checkers

def isOperandRegister(operand):
    # Must be "r" or "R" followed by one or two decimal digits
    if (len(operand) < 2 or len(operand) > 3):
        return False
    if (operand[0] != 'r' and operand[0] != 'R'):
        return False
    # Attempt to convert it
    try:
        value = int(operand[1:], 10)
    except ValueError:
        return False
    # Check that it's in range
    if (value < 0 or value > 15):
        return False

    return True

def isOperandData(operand):
    # Must be "0x" followed by one or two hex digits (8 bits max)
    if (len(operand) < 3 or len(operand) > 4):
        return False
    if (operand[0:2] != "0x"):
        return False
    # Attempt to convert it
    try:
        value = int(operand[2:], 16)
    except ValueError:
        return False

    return True

def isOperandLabel(operand):
    if (operand in addressLabelDict):
        return True
    return False

def isOperandMEM(operand):
    if (operand == "MEM"):
        return True
    return False

#####################################################################

# Operand data extractors

def operandRegister(operand):
    return int(operand[1:], 10)

def operandData(operand):
    return int(operand[2:], 16)

def operandLabel(operand):
    return addressLabelDict[operand]

#####################################################################

# Quick clean-up exit
def exit(retVal):
    fileASM.close()
    fileOut.close()
    sys.exit(retVal)

#####################################################################

if (len(sys.argv) < 3):
    print("Usage: %s <input assembly> <output memory dat>" % sys.argv[0])
    sys.exit(0)

fileASM = open(sys.argv[1], 'r')
fileOut = open(sys.argv[2], 'w')

# Instruction and required number of operands
validInstructions = {"mov":2, "jmp":1, "je":1, "jne":1, "jg":1, "jl":1, "ljmp":0, "add":2, "sub":2, "and":2, "or":2, "xor":2, "not":1, "cmp":2, "nop":0}
IP = 0
addressLabelDict = {}

# First pass finds all of the address labels and validates the instruction mnemonics
for line in fileASM:
    line = line.rstrip("\r\n")
    lineClean = line.replace(',', ' ')
    lineTokens = lineClean.split()
    
    if (len(lineTokens) == 0):
        continue

    # Skip if this line is a comment
    if (lineTokens[0][0] == ';'):
        continue

    # If this is an address label
    if (lineTokens[0][-1] == ':'):
        addressLabelDict[lineTokens[0][:-1]] = IP
        # If this line only contains an address label
        if (len(lineTokens) == 1):
            # Don't increment the IP until we've actually seen an instruction
            continue
        # Make sure that if the next token is not a comment, it is a valid instruction mnemonic
        if (lineTokens[1][0] != ';' and (not lineTokens[1] in validInstructions)):
            print("Error: Unknown instruction!")
            print("Line: %s" % line)
            exit(-1)
    # Check if this is a valid instruction
    elif (not lineTokens[0] in validInstructions):
        print("Error: Unknown instruction!")
        print("Line: %s" % line)
        exit(-1)

    IP += 2


# Reset our IP
IP = 0

# Rewind the file
fileASM.seek(0)

# Second pass assembles the instructions
for line in fileASM:
    line = line.rstrip("\r\n")
    lineClean = line.replace(',', ' ')
    lineTokens = lineClean.split()
    
    if (len(lineTokens) == 0):
        continue

    # Skip if this line is a comment
    if (lineTokens[0][0] == ';'):
        continue
    
    # Strip out the address label from the tokens and isolate the mnemonic
    if (lineTokens[0][-1] == ':'):
        # If this line only contains an address label
        if (len(lineTokens) == 1):
            continue
        lineTokens.pop(0)
        asmMnemonic = lineTokens.pop(0)
    else:
        asmMnemonic = lineTokens.pop(0)
    
    # Operands are the rest of the tokens
    asmOperands = lineTokens

    # Strip out any comment at the end of the tokens
    for i in range(len(asmOperands)):
        if (asmOperands[i][0] == ';'):
            asmOperands = asmOperands[:i]
            break

    # Check number of operands
    if (len(asmOperands) < validInstructions[asmMnemonic]):
        print("Error: Invalid number of operands!")
        print("Line: %s" % line)
        exit(-1)
    
    if (asmMnemonic == "mov"):
        # mov Ra, Rb
        if (isOperandRegister(asmOperands[0]) and isOperandRegister(asmOperands[1])):
            fileOut.write("%01X%01X " % (operandRegister(asmOperands[0]), operandRegister(asmOperands[1])))
            fileOut.write("10")
        # mov Ra, MEM
        elif (isOperandRegister(asmOperands[0]) and isOperandMEM(asmOperands[1])):
            fileOut.write("%01X0 " % operandRegister(asmOperands[0]))
            fileOut.write("11")
        # mov MEM, Ra
        elif (isOperandMEM(asmOperands[0]) and isOperandRegister(asmOperands[1])):
            fileOut.write("%01X0 " % operandRegister(asmOperands[1]))
            fileOut.write("12")
        # mov Ra, d
        elif (isOperandRegister(asmOperands[0]) and isOperandData(asmOperands[1])):
            fileOut.write("%02X " % operandData(asmOperands[1]))
            fileOut.write("2%01X" % operandRegister(asmOperands[0]))
        # Unknown operands
        else:
            print("Error: Invalid operands!")
            print("Line: %s" % line)
            exit(-1)

    elif (asmMnemonic == "jmp" or asmMnemonic == "je" or asmMnemonic == "jne" or asmMnemonic == "jg" or asmMnemonic == "jl"):
        # jmp k
        if (isOperandLabel(asmOperands[0])):
            targetIP = operandLabel(asmOperands[0])
            # If the target is behind this instruction (negative relative distance)
            if (targetIP < IP):
                relativeDistance = (IP - targetIP) >> 1
                if (relativeDistance > 127):
                    print("Error: Relative branch too far!")
                    print("Line: %s" % line)
                    exit(-1)
                # Encode the distance with two's complement
                relativeDistance = ~relativeDistance + 1
                relativeDistance = relativeDistance & 0xFF
                fileOut.write("%02X " % relativeDistance)

            # If the target is ahead of this instruction (positive relative distance)
            else:
                relativeDistance = (targetIP - IP) >> 1
                if (relativeDistance > 127):
                    print("Error: Relative branch too far!")
                    print("Line: %s" % line)
                    exit(-1)
                relativeDistance = relativeDistance & 0xFF
                fileOut.write("%02X " % relativeDistance)

            # Encode the appropriate branch mnemonic
            if (asmMnemonic == "jmp"):
                fileOut.write("30")
            elif (asmMnemonic == "je"):
                fileOut.write("31")
            elif (asmMnemonic == "jne"):
                fileOut.write("32")
            elif (asmMnemonic == "jg"):
                fileOut.write("33")
            elif (asmMnemonic == "jl"):
                fileOut.write("34")
        else:
            print("Error: Invalid label!")
            print("Line: %s" % line)
            exit(-1)

    elif (asmMnemonic == "ljmp"):
        fileOut.write("00 40")    

    elif (asmMnemonic == "add" or asmMnemonic == "sub" or asmMnemonic == "and" or asmMnemonic == "or" or asmMnemonic == "xor" or asmMnemonic == "cmp"):
        # <math instruction> Ra, Rb
        if (isOperandRegister(asmOperands[0]) and isOperandRegister(asmOperands[1])):
            fileOut.write("%01X%01X " % (operandRegister(asmOperands[0]), operandRegister(asmOperands[1])))
            # Encode the appropriate math mnemonic
            if (asmMnemonic == "add"):
                fileOut.write("50")
            elif (asmMnemonic == "sub"):
                fileOut.write("51")
            elif (asmMnemonic == "and"):
                fileOut.write("52")
            elif (asmMnemonic == "or"):
                fileOut.write("53")
            elif (asmMnemonic == "xor"):
                fileOut.write("54")
            elif (asmMnemonic == "cmp"):
                fileOut.write("56")
        # Unknown operands
        else:
            print("Error: Invalid operands!")
            print("Line: %s" % line)
            exit(-1)

    elif (asmMnemonic == "not"):
        # not Ra
        if (isOperandRegister(asmOperands[0])):
            fileOut.write("%01X0 " % operandRegister(asmOperands[0]))
            fileOut.write("55")
        # Unknown operands
        else:
            print("Error: Invalid operands!")
            print("Line: %s" % line)
            exit(-1)

    elif (asmMnemonic == "nop"):
        fileOut.write("00 00")

    else:
        print("Error: Unknown instruction!")
        print("Line: %s" % line)
        exit(-1)
    
    fileOut.write("\n")    
    IP += 2

fileASM.close()
fileOut.close()
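
The branch handling is the least obvious part of the second pass, so here is a condensed Python sketch of it in isolation. It assumes what the assembler above does: the offset is counted in 2-byte instructions relative to the branch's own address, limited to 127 instructions in either direction, stored as a two's complement byte, and written low byte first. The function name encode_branch and the sample addresses are illustrative.

```python
BRANCH_OPCODES = {"jmp": 0x30, "je": 0x31, "jne": 0x32, "jg": 0x33, "jl": 0x34}

def encode_branch(mnemonic, ip, target_ip):
    """Return the output line for a relative branch located at ip and aimed at target_ip."""
    distance = (target_ip - ip) // 2        # offset counted in 2-byte instructions
    if distance > 127 or distance < -127:   # same limits as the assembler above
        raise ValueError("relative branch too far")
    # Masking with 0xFF yields the two's complement byte for negative offsets
    return "%02X %02X" % (distance & 0xFF, BRANCH_OPCODES[mnemonic])

print(encode_branch("je", 0x14, 0x0C))    # backward branch by 4 instructions
print(encode_branch("jmp", 0x10, 0x18))   # forward branch by 4 instructions
```

The first call prints FC 31 and the second prints 04 30.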

Fibonacci Number Test Program

This is a simple test program written in v8cpu assembly to compute 16-bit Fibonacci numbers and write them across the combined 8-bit output ports PortA:PortB (which can be connected to LEDs) each time a button pulls bit 0 of input port C high. The program code implements a ~10ms delay for debouncing between the button reads. It demonstrates v8cpu’s arithmetic, branching, and input/output capabilities.
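
Because v8cpu has no carry flag, the program has to detect overflow of the low-byte addition with a compare: an 8-bit sum that wrapped around is smaller than the value it started from. The Python sketch below models that idea for the 16-bit addition. It is not a translation of the listing: the name add16 is illustrative, and how the carry is applied to the high byte is an assumption here.

```python
def add16(hi_a, lo_a, hi_b, lo_b):
    """Add two 16-bit values held as (hi, lo) byte pairs using only 8-bit operations."""
    lo = (lo_a + lo_b) & 0xFF          # 8-bit add: the carry out is lost
    carry = 1 if lo < lo_a else 0      # a wrapped sum is smaller than the old low byte
    hi = (hi_a + hi_b + carry) & 0xFF  # assumed: carry is added into the high byte
    return hi, lo

# Walk the Fibonacci sequence up to 46368 = 0xB520, the largest 16-bit value
prev, cur = (0x00, 0x00), (0x00, 0x01)
while cur != (0xB5, 0x20):
    prev, cur = cur, add16(cur[0], cur[1], prev[0], prev[1])
print("0x%02X%02X" % cur)
```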

; 16-bit Fibonacci Number Generator for v8cpu
; Vanya A. Sergeev - vsergeev@gmail.com
; Next 16-bit Fibonacci Number is computed and written across port A (high byte) and port B (low byte) each time
; a button pulls pin C.0 high.
;
; R0:R1 = Fn-2
; R2:R3 = Fn-1
; R4:R5 = Fn
; R0:R1, R2:R3 initialize to 0

; Initialize R4:R5 to 0x0001
mov R4, 0x00
mov R5, 0x01

; Save a constant 0 and constant 1
mov R10, 0x00
mov R11, 0x01

; Save constant for 46368 (biggest 16-bit fibonacci number)
mov R8, 0xB5
mov R9, 0x20

; Address for portA (0x800) in R14:R15
mov R14, 0x08
mov R15, 0x00

; Wait for the button to be released
buttonClrWait:	mov R15, 0x02	; R14:R15 = 0x0802 = pin C address
		mov R12, MEM
		and R12, R11	; R12 & 0x1, to keep just bit 0, the button
		cmp R12, R11
		je buttonClrWait	; If button == 1, loop buttonClrWait

buttonDbWait:	mov R15, 0x02	; R14:R15 = 0x0802 = pin C address
		mov R12, MEM
		and R12, R11	; R12 & 0x1, to keep just bit 0, the button
		cmp R12, R10
		je buttonDbWait	; If button == 0, loop buttonDbWait

		; Button has been pressed, delay and check again for debouncing
		; Outer loop is 100 loops, inner loop should take 6 CPI * 3 * 255 = 4590 clock cycles
		; With a 50MHz clock this yields roughly 10ms delay

		mov R12, 0x64

outerDelayLoop:	mov R13, 0xFF
innerDelayLoop:	sub R13, R11	; R13 = R13 - 1
		cmp R13, R10
		jne innerDelayLoop	; If R13 != 0, loop innerDelayLoop
		sub R12, R11	; R12 = R12 - 1
		cmp R12, R10
		jne outerDelayLoop	; If R12 != 0, loop outerDelayLoop


		; Check that the button is still pressed

		mov R12, MEM
		and R12, R11	; R12 & 0x1, to keep just bit 0, the button
		cmp R12, R10
		je buttonDbWait	; If button == 0, loop buttonDbWait

		; Otherwise continue to computing the next fibonacci number

		mov R0, R2	; Fn-2 (hi byte) <= Fn-1 (hi byte)
		mov R1, R3	; Fn-2 (lo byte) <= Fn-1 (lo byte)

		mov R2, R4	; Fn-1 (hi byte) <= Fn (hi byte)
		mov R3, R5	; Fn-1 (lo byte) <= Fn (lo byte)

		; R4:R5 contains Fn-1, R0:R1 contains Fn-2

		add R5, R1	; Fn-1 + Fn-2 (lo bytes)
		cmp R5, R3	; Compare the new R5 to the old R5 (R3)
		je fibContinue	; No carry if they're equal
		jl addCarryBit	; If the new R5 is less than the old R5, we had an overflow
				; and we need to add the carry bit
fibContinue:	add R4, R0	; Fn-1 + Fn-2 (hi bytes)

		; Write R4 to portA, R5 to portB
		mov R15, 0x00	; R14:R15 = 0x0800 = port A address
		mov MEM, R4
		mov R15, 0x01	; R14:R15 = 0x0801 = port B address
		mov MEM, R5

		; Check if we've reached 46368 (biggest 16-bit fibonacci number)
		cmp R4, R8
		jne buttonClrWait
		cmp R5, R9
		jne buttonClrWait

		; Reset to initial conditions
		mov R0, 0x00
		mov R1, 0x00
		mov R2, 0x00
		mov R3, 0x00
		mov R4, 0x00
		mov R5, 0x01
		jmp buttonClrWait


addCarryBit:	add R4, R11	; Hi byte += 1
		jmp fibContinue
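
Since v8cpu has no carry flag, the program above detects a carry out of the low byte by comparing the new low byte against the old one. The following is a rough Python model of that trick and of the register shuffling around it. It is a sketch for checking the logic, not a simulation of the CPU: the names `add16` and `fib_outputs` are hypothetical, and the button handling and port writes are left out.

```python
def add16(hi_a, lo_a, hi_b, lo_b):
    """Add two 16-bit values held as 8-bit halves, without a carry flag."""
    lo = (lo_a + lo_b) & 0xFF
    # The low byte wrapped around exactly when the sum is less than the
    # original low byte (this is the "jl addCarryBit" case in the listing).
    carry = 1 if lo < lo_a else 0
    hi = (hi_a + carry + hi_b) & 0xFF
    return hi, lo

def fib_outputs(count):
    """Return the first `count` values the program would write to PortA:PortB."""
    r0 = r1 = r2 = r3 = 0   # Fn-2 and Fn-1 start at 0
    r4, r5 = 0x00, 0x01     # Fn starts at 0x0001
    outputs = []
    for _ in range(count):
        r0, r1 = r2, r3                   # Fn-2 <= Fn-1
        r2, r3 = r4, r5                   # Fn-1 <= Fn
        r4, r5 = add16(r4, r5, r0, r1)    # Fn = Fn-1 + Fn-2
        outputs.append((r4 << 8) | r5)
        if (r4, r5) == (0xB5, 0x20):      # 46368 reached, reset
            r0 = r1 = r2 = r3 = 0
            r4, r5 = 0x00, 0x01
    return outputs

if __name__ == "__main__":
    print(fib_outputs(5))   # [1, 2, 3, 5, 8]
```

In this model the sequence runs from 1 up to 46368 in 23 steps and then restarts at 1.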

Fibonacci Number Test Program Output

This is the simulation output of the plain Fibonacci number program, which writes the numbers sequentially across PortA:PortB. It is a simplified, button-less version of the program above, available here: http://github.com/vsergeev/v8cpu/blob/master/v8cpuasm/programs/fib.asm

~/projects-verilog/vcpu$ make compile simulate | grep "PortA"
PortA:PortB = 0
PortA:PortB = 1
PortA:PortB = 2
PortA:PortB = 3
PortA:PortB = 5
PortA:PortB = 8
PortA:PortB = 13
PortA:PortB = 21
PortA:PortB = 34
PortA:PortB = 55
PortA:PortB = 89
PortA:PortB = 144
PortA:PortB = 233
PortA:PortB = 377
PortA:PortB = 610
PortA:PortB = 987
PortA:PortB = 1597
PortA:PortB = 2584
PortA:PortB = 4181
PortA:PortB = 6765
PortA:PortB = 10946
PortA:PortB = 17711
PortA:PortB = 28657
PortA:PortB = 46368
PortA:PortB = 1
PortA:PortB = 2
PortA:PortB = 3
PortA:PortB = 5
PortA:PortB = 8
PortA:PortB = 13
PortA:PortB = 21
PortA:PortB = 34
PortA:PortB = 55
PortA:PortB = 89
PortA:PortB = 144
PortA:PortB = 233
PortA:PortB = 377
PortA:PortB = 610
PortA:PortB = 987
PortA:PortB = 1597
PortA:PortB = 2584
PortA:PortB = 4181
PortA:PortB = 6765
PortA:PortB = 10946
PortA:PortB = 17711
PortA:PortB = 28657
PortA:PortB = 46368
PortA:PortB = 1
PortA:PortB = 2
PortA:PortB = 3
PortA:PortB = 5
PortA:PortB = 8
PortA:PortB = 13
PortA:PortB = 21
PortA:PortB = 34
PortA:PortB = 55
PortA:PortB = 89
PortA:PortB = 144
PortA:PortB = 233
PortA:PortB = 377
PortA:PortB = 610
PortA:PortB = 987
PortA:PortB = 1597