One of the more tedious steps of building the computer is the actual assembly, going from simple circuits to more complex ones. We have decided to actually leapfrog over this step and go straight to architectural design for a few reasons:
- Circuit simulator is still a work in progress, plagued by disease and other unfortunate circumstances.
- By deciding what our architecture is going to be, we will know exactly which components we need to build and how we need to connect them.
- Work is parallelized: the architecture can be designed while the circuit simulator is still being built.
We have welcomed a couple of additional people onto this project. User 7H3_H4CK3R has participated in most of the architectural design decisions featured on this page, and user Mego has taken over work on the circuit simulator.
As mentioned previously, we will be using an asynchronous design to speed up our processor. We will accomplish this by having a clock signal accompany the data everywhere it goes. This clock signal will serve as both read and write signals for the various memory devices (ROM, RAM, synchronizers, etc.).
There was much debate over the type of instructions that our processor would use. There were several options:
- We could use a Transport Triggered Architecture (TTA), in which case each instruction would be of the form (source, destination). We would hard-wire a lot of addresses to perform specific operations upon reading/writing, similar to the Wireworld computer.
- We could create a distinction between registers and RAM, and have an accumulator. Load/store commands would transfer data between them. Instructions would be of the form (opcode, value), and could load a certain RAM address, store to a certain RAM address, or perform math, such as adding a given register to the accumulator.
- We could choose to not have registers, and just use RAM for everything. Instructions would be of the form (opcode, value, value). We could either move between RAM addresses (one of which would be mapped to the program counter), or perform math in which the result would be stored in a hard-wired default RAM address (which would act like an accumulator).
- We could avoid having a default RAM address by having instructions of the form (opcode, value, value, destination) so that a destination could be assigned to every result. This could also allow for stuff such as conditional moves.
We chose the fourth option because the more detailed each instruction is, the fewer instructions a program needs, which I believe will speed up execution. Take the example of adding two numbers from RAM and storing the result in RAM. With TTA, that would require three moves (for the two operands and the result). With the third option, it might take two (add arguments, then move result) or three instructions (load arguments, store result). With the fourth option, it takes only one.
The fact that everything will be memory-mapped to RAM (including the PC), with no special registers, makes this design relatively simple.
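To make the "one instruction" claim concrete, here is a minimal Python sketch of a four-field instruction. It is only an illustration under assumptions: the address chosen for the program counter (`PC_ADDRESS = 0`), the `execute` helper, and the RAM size are hypothetical, and the operands are assumed to have been fetched already.

```python
import operator

PC_ADDRESS = 0  # assumption: the RAM address that is mapped to the program counter
OPS = {"ADD": operator.add, "OR": operator.or_}  # two of the opcodes named in the text


def execute(ram, opcode, a, b, destination):
    """Run one (opcode, value, value, destination) instruction on resolved operands."""
    ram[destination] = OPS[opcode](a, b) & 0xFFFF  # keep results to 16 bits


ram = [0] * 64
ram[10], ram[11] = 5, 17

# Add two numbers from RAM and store the result in RAM: a single instruction.
execute(ram, "ADD", ram[10], ram[11], 12)

# Because the PC is memory-mapped, a jump is just a write whose destination is the PC.
execute(ram, "ADD", 30, 0, PC_ADDRESS)
```

Nothing special is needed for the jump here, which is the simplification that memory-mapping the PC is meant to buy us.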
It is beneficial to have several addressing modes for operands of each instruction. Our modes just involve sending the data through the RAM read cycle the desired number of times, which means they are easy to implement. These addressing modes take on slightly different meanings when used as a value (first two arguments of an instruction) or as a destination (the third argument), but the hardware is exactly the same.
- Mode 0 is any constant hardcoded into ROM. This can be either a hardcoded value, such as an instruction that always adds 5 or always writes 17, or it can be a hardcoded destination, such as an instruction that always writes to address 42.
- Mode 1 is any value that results from a single read of RAM. This would be used when we desire to add the value in RAM address 13, or if we need to read the value of address 37 to determine where our destination is.
- Mode 2 represents dereferencing. If address 7 contains a pointer to the address in which our data is stored, we would use 7 as an address to read from RAM, use that data as an address to read from RAM again, and use that data as an operand in our ALU.
- Mode 3 might not be used (or might not even exist, once we actually build it), but it would act as super-dereferencing. This might be useful for 2D arrays.
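The modes above boil down to "read RAM this many times". A rough Python sketch, where the `resolve` name and the dictionary standing in for RAM are assumptions made purely for illustration:

```python
def resolve(mode, value, ram):
    """Send `value` through the RAM read cycle `mode` times and return the result."""
    for _ in range(mode):
        value = ram[value]  # use the current value as an address
    return value


# Hypothetical RAM contents: address 7 points to 13, which points to 37.
ram = {7: 13, 13: 37, 37: 99}

constant = resolve(0, 7, ram)     # mode 0: the hardcoded value itself
direct = resolve(1, 7, ram)       # mode 1: a single read of RAM
deref = resolve(2, 7, ram)        # mode 2: dereference a pointer
super_deref = resolve(3, 7, ram)  # mode 3: one more level of indirection
```

The same function would serve for a destination: with mode 0 the value is the address to write to, and with mode 1 the address to write to is first read from RAM.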
We have decided that 16-bit words are sufficient for our data, which will allow us to store values in the range 0 to 65535 for addressing or values -32768 to 32767 for numeric data using 2’s complement. Opcodes will probably be 4 bits, allowing for 16 instructions. Addressing modes will be 2 bits. This means that each instruction will be 4+2+16+2+16+2+16 = 58 bits long.
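As a sanity check on the 58-bit figure, the following Python sketch packs and unpacks an instruction. The field order follows the sum above, but placing the opcode in the most significant bits is an assumption, as are the helper names `pack`, `unpack`, `to_signed` and `to_unsigned`.

```python
OPCODE_BITS, MODE_BITS, WORD_BITS = 4, 2, 16
# opcode, then three (mode, value) pairs: 4 + 3 * (2 + 16) = 58 bits
FIELD_WIDTHS = [OPCODE_BITS] + [MODE_BITS, WORD_BITS] * 3


def pack(fields):
    """Pack (opcode, mode, value, mode, value, mode, destination) into one integer."""
    word = 0
    for width, field in zip(FIELD_WIDTHS, fields):
        if not 0 <= field < (1 << width):
            raise ValueError(f"{field} does not fit in {width} bits")
        word = (word << width) | field
    return word


def unpack(word):
    """Split a packed instruction back into its seven fields."""
    fields = []
    for width in reversed(FIELD_WIDTHS):
        fields.append(word & ((1 << width) - 1))
        word >>= width
    return fields[::-1]


def to_signed(x):
    """Interpret a 16-bit word as a 2's complement number."""
    return x - 0x10000 if x & 0x8000 else x


def to_unsigned(x):
    """Reduce a Python integer to its 16-bit word."""
    return x & 0xFFFF


instruction = [3, 1, 10, 1, 11, 0, 12]  # hypothetical opcode number 3
packed = pack(instruction)
```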
Stages of Execution
The steps involved in the execution of a single instruction are roughly as follows:
- The address of the next instruction leaves the program counter, immediately after which the program counter is incremented. This post-increment allows for “jumps” to point directly towards the desired jump destination, rather than the instruction immediately prior to it.
- The instruction address is sent to the ROM and the instruction is retrieved.
- The instruction is split into its parts: (opcode, mode, value, mode, value, mode, destination). The opcode is stored in synchronizers until the data is ready. Each of the three arguments undergoes the following process:
  - The mode and value (or destination) are sent to a counter.
  - If the counter equals zero, the current value is final, so it is sent to the synchronizer(s) for the next steps in the process.
  - Otherwise, the value is sent to the read queue* and the counter is decremented. The read queue then reads the data from RAM using the value as the address, and sends the new data back to the counter. The counter cycles back to sub-step 2 above.
- Once the opcode and two values are available, the opcode is used in a selector for the ALU. It selects which basic operation to perform, sending the clock signal along one of many possible paths. The ALU returns a result.
- The conditional is evaluated to determine if the data should be written. For most opcodes (like ADD, OR, etc.) the conditional is automatically true. For special conditional commands, the first value is tested in one of two manners. We currently plan on having “val < 0” (sign bit) and “val != 0” tests.
- If the conditional is false, return to step 1 for the next instruction.
- Else, move the result to the write synchronizer, which also contains the desired write address from the end of step 3 above. The data is written, and we return to step 1.
*The read queue is one of the more “magical” components where I don’t really know how/if it will work. Ideally, it would do some parallelization between the three values. Worst-case scenario is that it ends up being incredibly inefficient without parallelization.
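The stages above can be sketched as a small software model. This is a sequential approximation only: it ignores the synchronizers and the read queue, and several things in it are assumptions rather than decisions we have made, namely the PC living at address 0, the conditional opcode names `MLZ` and `MNZ`, and the choice that a conditional opcode moves its second value.

```python
import operator

PC = 0         # assumption: RAM address mapped to the program counter
MASK = 0xFFFF  # 16-bit words


def always(a):
    return True


# Hypothetical opcode table: (ALU operation, test applied to the first value)
OPCODES = {
    "ADD": (operator.add, always),
    "OR": (operator.or_, always),
    "MLZ": (lambda a, b: b, lambda a: a & 0x8000 != 0),  # write b if a < 0 (sign bit)
    "MNZ": (lambda a, b: b, lambda a: a != 0),           # write b if a != 0
}


def resolve(mode, value, ram):
    """Read RAM `mode` times, as the counter and read queue would."""
    for _ in range(mode):
        value = ram[value]
    return value


def step(rom, ram):
    address = ram[PC]
    ram[PC] = (address + 1) & MASK                   # post-increment the PC
    opcode, m1, v1, m2, v2, m3, dest = rom[address]  # fetch from ROM and split
    a = resolve(m1, v1, ram)
    b = resolve(m2, v2, ram)
    target = resolve(m3, dest, ram)
    alu, condition = OPCODES[opcode]                 # opcode selects the ALU path
    result = alu(a, b) & MASK
    if condition(a):                                 # evaluate the conditional
        ram[target] = result                         # write, then back to step 1


rom = [
    ("ADD", 1, 10, 1, 11, 0, 12),  # ram[12] = ram[10] + ram[11]
    ("MNZ", 1, 12, 0, 99, 0, 13),  # ram[12] is nonzero, so write 99 to address 13
    ("MLZ", 1, 12, 0, 77, 0, 14),  # ram[12] is not negative, so nothing is written
]
ram = [0] * 32
ram[10], ram[11] = 5, 17
for _ in range(len(rom)):
    step(rom, ram)
```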
In order to speed up our processor, we will do a little bit of pipelining. For the computer to function correctly, an instruction must finish writing to RAM before the next instruction reads from RAM. It is still possible to do some overlapping, so that one instruction is being retrieved from ROM while another is being processed. This eliminates the delay that would be caused by having a program that is several thousand instructions long. On the other hand, this creates a branch delay slot, where the instruction immediately after a jump is executed regardless of the jump.
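The branch delay slot is easier to see in a toy model. In the Python sketch below, instructions are reduced to a hypothetical (name, destination, value) form that just writes a constant, and the PC is assumed to sit at address 0. The only point is the ordering: the next fetch leaves the PC before the in-flight instruction has written its result.

```python
PC = 0  # assumption: RAM address mapped to the program counter


def run(rom, ram, cycles):
    """Overlap fetch and execute by one instruction; return the names executed."""
    executed = []
    in_flight = None
    for _ in range(cycles):
        # Fetch: the address leaves the PC, which is then incremented.
        address = ram[PC]
        ram[PC] = address + 1
        fetched = rom[address] if address < len(rom) else None
        # Execute the instruction fetched during the previous cycle.
        if in_flight is not None:
            name, destination, value = in_flight
            ram[destination] = value
            executed.append(name)
        in_flight = fetched
    return executed


rom = [
    ("jump", PC, 3),        # write 3 to the PC
    ("delay slot", 20, 1),  # already fetched when the jump writes, so it still runs
    ("skipped", 21, 1),
    ("target", 22, 1),
]
ram = [0] * 32
trace = run(rom, ram, cycles=5)
```

In this model the trace contains the jump, the delay slot instruction and the target, while the instruction at address 2 never runs. A program would either put something useful in the slot or leave an instruction there that has no effect.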