Processors are as slow as their slowest component. While true of conventional CPUs, that phrase takes on a whole new meaning in cellular automata. At any given moment there are many little pieces of data flowing around our computer, and for two pieces to interact properly, they must be perfectly in sync with each other. In the previous post, I mentioned the choice between constant- and variable-timed ROM. That is actually a small part of a larger decision: synchronous or asynchronous?
- Synchronous: All of the data in the computer is kept in sync simply through wiring. All reads/writes have equal time, and all operations have equal time. Everything is as slow as the worst-case scenario (reading from furthest address, etc.). We don’t need to design any parts that can handle inputs at variable times, but we’ll need lots of delay wire to sync stuff up.
- Asynchronous: The next instruction is executed as soon as the current one is finished. ROM/RAM data is available as soon as it is read, with smaller addresses being faster. Arithmetic is performed once both operands are available. We can make circuits shorter/faster, but need components that are capable of synchronizing pieces of data with each other, such as holding the data for some arbitrary amount of time until it is ready to be used.
An asynchronous design has the potential to cut average ROM/RAM access times in half. Given that the ROM may be several thousand tiles long, this will cut several thousand generations off each instruction’s execution time. Furthermore, it will reduce the complexity of wiring. This is why we have chosen the asynchronous paradigm for our processor.
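To make the "cut in half" claim concrete, here is a back-of-the-envelope sketch (illustrative numbers and function names only, not the actual circuit): in a synchronous design every read is padded out to the worst case, while in an asynchronous design a read takes time proportional to the address's distance.

```python
# Hypothetical sketch: average read time for a linear ROM where reaching
# address i costs i tile-delays. Numbers are illustrative, not measured.

def sync_read_time(num_addresses):
    # synchronous: every access is padded to the furthest address
    return num_addresses - 1

def async_read_time(address):
    # asynchronous: access time grows with the address's distance
    return address

n = 4096
avg_async = sum(async_read_time(a) for a in range(n)) / n
print(sync_read_time(n), avg_async)  # 4095 2047.5 -- roughly half
```

The average over a uniform spread of addresses lands at about half the worst case, which is where the "several thousand generations" savings comes from.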
The current plan to implement asynchronous computing is to have a second wire running parallel to each data wire. This wire will carry a “clock signal” that indicates the presence of data. For example, this clock signal will serve as the read/write signals to memory. The clock signal will always remain in sync with the data, but the path (and duration) of travel can vary between instructions. The computer will speed up when small addresses are being accessed, and will slow down for larger addresses.
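As a toy model of the paired wires (the names and delay values here are illustrative, not taken from the actual design): the data and its clock pulse take the same variable-length path, so the receiver fires on clock arrival rather than on a fixed worst-case schedule.

```python
# Toy model of the data wire + parallel clock wire. A word and its clock
# pulse travel the same path, so they arrive together no matter how long
# the path is; the clock pulse doubles as the memory read/write strobe.

def route(word, path_delay, start=0):
    arrival = start + path_delay
    return {"data": (arrival, word), "clock": arrival}

def receive(signal):
    arrival, word = signal["data"]
    assert signal["clock"] == arrival   # clock stays in lockstep with data
    return arrival, word

fast = receive(route("near address", path_delay=22))   # ready sooner
slow = receive(route("far address", path_delay=990))   # ready later
assert fast[0] < slow[0]
```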
There are, however, many times where two pieces of data must be put in sync with each other. This can work by storing one piece in a temporary “register,” which is read from when the second piece of data arrives. In order for an asynchronous design to work, this synchronization device has to be able to handle the two pieces of data coming in at any time (in practice, any multiple of 11 with some minimum separation between writing and reading).
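The behavior of that temporary register can be sketched in a few lines of Python (a behavioral model only, with made-up names; the real device is the loop-based circuit described below):

```python
# Minimal model of the synchronization register: the first word to arrive
# is held; when its partner arrives, both are released together.

class SyncRegister:
    def __init__(self):
        self.held = None

    def arrive(self, word):
        if self.held is None:
            self.held = word          # first operand: store and wait
            return None
        pair = (self.held, word)      # second operand: release both in sync
        self.held = None
        return pair

reg = SyncRegister()
assert reg.arrive("A") is None        # A arrives first and is held
assert reg.arrive("B") == ("A", "B")  # B arrives later; both emerge together
```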
Here’s how our synchronizer works: first, the incoming signal is sent through a serial-to-parallel converter, which works similarly to one in our ROM. It’s actually a more “true” converter: the data signal is sent down a sequence of wire splitters separated by delays, and then the “clock signal” passes across each wire and selects a single bit for each of the outgoing wires.
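The timing trick behind the converter can be checked with a quick sketch (the 11-tick spacing is from the design; the arithmetic below is just an abstract model of the splitter-and-delay chain):

```python
# Sketch of the serial-to-parallel timing: bit i arrives at time i*TICK,
# and the splitter chain delays it by (n-1-i)*TICK, so every bit sits on
# its output wire at the same moment for the clock sweep to latch.

TICK = 11  # bit spacing in generations

def serial_to_parallel(bits):
    n = len(bits)
    arrival = [i * TICK + (n - 1 - i) * TICK for i in range(n)]
    assert len(set(arrival)) == 1   # all bits aligned at time (n-1)*TICK
    return list(bits)

print(serial_to_parallel([1, 0, 1, 1]))  # [1, 0, 1, 1]
```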
The core of the synchronizer is a set of length-44 loops, one for each bit of the word. These loops function similarly to the ones that might be in our RAM. As an example of using a loop to store information, below is an image of a memory loop constructed by El’endia:
There are four main components:
- An AND-NOT gate that uses the write signal to destroy the information currently in the loop (which will happen in about 15 generations in this sim).
- An OR gate which adds the new data to the loop.
- A wire splitter that repeatedly outputs the data stored in the loop.
- An AND gate that is hooked up to the read signal to allow the data to exit the device (which will also happen in this sim).
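Those four components can be summarized in a toy Python model (a behavioral abstraction, not VarLife itself — the class and method names are mine):

```python
# Toy model of the memory loop: write destroys the old contents and
# injects the new bit; the bit then circulates and can be sampled once
# per trip around the loop.

class MemoryLoop:
    def __init__(self, length=44):
        self.cells = [0] * length           # electron positions on the loop

    def step(self):
        # every electron advances one cell around the loop
        self.cells = [self.cells[-1]] + self.cells[:-1]

    def write(self, bit):
        self.cells = [0] * len(self.cells)  # AND-NOT gate destroys old data
        self.cells[0] = bit                 # OR gate adds the new bit

    def read(self):
        return self.cells[0]                # splitter tap + read-gated AND

loop = MemoryLoop()
loop.write(1)
for _ in range(44):                         # one full trip around the loop
    loop.step()
assert loop.read() == 1                     # the stored bit is still there
```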
In the synchronizer, the loop must be able to be read from at any given time (any multiple of 11). So, we must have four electrons flowing around the loop to signify an “ON” bit. This means that the input bit (and write signal) must be transformed into four bits in quick succession. My method for this is to duplicate the signal twice: first, split the signal, delay one copy by 11 ticks, and rejoin them; second, split the result, delay one copy by 22 ticks, and rejoin them.
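The arithmetic of the two doublings is easy to verify (a model of the pulse timing only; `double` is my name for the split-delay-merge operation):

```python
# Sketch of the 4x amplification: split + delay + merge doubles the pulse
# train. An 11-tick doubling followed by a 22-tick doubling turns one
# electron into four, spaced 11 ticks apart -- the loop's ON encoding.

def double(pulse_times, delay):
    return sorted(set(pulse_times) | {t + delay for t in pulse_times})

pulses = [0]                    # a single incoming electron at time 0
pulses = double(pulses, 11)     # -> pulses at 0, 11
pulses = double(pulses, 22)     # -> pulses at 0, 11, 22, 33
print(pulses)  # [0, 11, 22, 33]
```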
The more difficult operation is the delay by 11 ticks. I decided to create a special component to accomplish this operation.
The second doubling is simpler: we split the wire and merge it using our typical components. This can be done in a 2×2 tile space, and there is probably little benefit in compressing it further. Once each of the bits (and the write signal) has been amplified 4X, they are used to load the loops. The “ON” loops will emit a pulse every 11 ticks, while the “OFF” loops emit nothing.
“The read signal is very dangerous and can attack at any time, so we must deal with it.”
Reading from the device requires two things to happen. First, the read signal travels across every output, using the same AND/Crossing multi-tile to select one electron from each loop. These electrons are then merged into a serial data stream, using OR gates separated by delay wire. The output electrons are in the same order as they were input.
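The merge back into a serial stream can be sketched the same way (an abstract timing model with illustrative names; the real circuit is the OR-gate chain with delay wire):

```python
# Sketch of the parallel-to-serial merge: all loop outputs fire at once
# on the read signal, then bit i passes through i*11 ticks of delay wire
# before the OR-gate chain, recreating an 11-tick-spaced serial stream.

TICK = 11

def parallel_to_serial(bits, read_time=0):
    # (time, bit) events on the merged output wire, in input order
    return [(read_time + i * TICK, b) for i, b in enumerate(bits)]

print(parallel_to_serial([1, 0, 1]))  # [(0, 1), (11, 0), (22, 1)]
```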
A constructed synchronizer
Below is a synchronizer I constructed in VarLife. The serial-to-parallel converter is in columns 1 through 3. Column 5 has the 11-tick doubler, and columns 6 and 7 have the 22-tick doubler. Column 8 allows the write signal to reach each loop, with the loops themselves in columns 9 and 10. In columns 11 and 12, the read signal selects one bit from each loop. Finally, column 13 has the parallel-to-serial converter.
The Synchronizer in Practice
The synchronizer will most likely be used in several places in the computer:
- Storing one input for the ALU (arithmetic and logic unit) while the other is being retrieved from memory.
- Storing the RAM address so that it can be synchronized with the larger data loops (probably 32 bits with 22-tick separation) of the RAM, so that data can be read and written at the correct starting and stopping points.
- Pipelining/queuing: Putting one piece of data on hold while another one is using the device (probably ROM, maybe also RAM).
Each of these scenarios will require additional circuitry, which may include some or all of the following:
- Something to keep track of whether or not new data is present. By this, I mean whether or not data has been written since the last read.
- In the case of RAM, something that blocks the read signal if new data isn’t present and waits until the next loop cycle to try again.
- In pipelining/queuing, something that marks the device as being “in use” or not. If the device is not in use, the write signal also serves as the read signal, so that the data may pass through (almost) immediately. If the device is in use, it waits until the other data exits the device, and uses that data’s clock signal as the read signal.
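That last rule can be sketched as a small state machine (my own abstraction of the planned behavior — none of these names correspond to actual circuit parts):

```python
# Toy model of the pipelining rule: a free device lets the write signal
# double as the read signal, so data passes straight through; a busy
# device holds the newcomer until the current occupant's exit clock
# releases it.

class PipelineStage:
    def __init__(self):
        self.in_use = False
        self.waiting = None

    def write(self, word):
        if not self.in_use:
            self.in_use = True
            return word              # free: write doubles as read
        self.waiting = word          # busy: hold the newcomer
        return None

    def exit_clock(self):
        # the departing data's clock pulse acts as the read signal
        word, self.waiting = self.waiting, None
        self.in_use = word is not None
        return word

stage = PipelineStage()
assert stage.write("x") == "x"       # device free: x passes right through
assert stage.write("y") is None      # x still inside: y waits
assert stage.exit_clock() == "y"     # x's exit releases y into the device
```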