Saturday, May 28, 2011

The State Machine

So we have the instruction set, the majority of the ALU stuff. So how to make it tick?

This took me a while to understand, but now it is clear, and it is elegant!

At any given time the CPU is doing something, including nothing.

What the CPU is doing depends upon the current values of the CPU Control Signals.

The current values of the CPU Control Signals and the next State depend upon the current values of the Opcode, the Condition Codes (Flags), and the current State (a number).

The State Register contains the current state. The next state is latched into it on the rising edge of the clock cycle.

The Instruction Decoder takes the Opcode, Condition Codes, and current State values and decodes them into CPU Control Signals and the value of the next state. Think of it like the inputs are the address and the outputs are the data stored at the address. In fact that is a quite good way of thinking about it as we can implement the decoder in a ROM. The stuff stored in the ROM is called Microcode. Which makes this machine a CISC machine. You could implement the decoder in logic gates, and/or diodes, but a ROM is simpler and has the advantage of being software programmable.

For the selected instruction set we will need 10 bits for the Opcode, and for the Processor Status Word Condition Codes 4 bits, and for the State number, humm ...

In all of the homemade CPU's I've looked at on the net it seems like 16 states is enough, however I'm kinda tempted to allow for more in case I want to add multiplication and/or CORDIC in the future (I don't know how many states they will need yet). So for now I will allocate 5 bits for the state.

That gives us 10 + 4 + 5 = 19 bits. So we'll need a 512k ROM.

Each ROM will allow for 8 CPU Control Signals. If more than 8 Control Signals are required then the addresses can be paralleled with multiple ROMs.

Although out of sequence I'd just like to record that having a ROM decode was the real reason to go for the PDP-11 opcodes. You don't have to have a direct one-to-one matching between the opcodes and the control signals. Feed in anything you like, and get out what you require. It is software that does the mapping.

Also it has just popped into my head that a ROM could be used to store constants like +1, -1, AND masks, CORDIC parameters etc. Just mention that in case I forget about it.

The trouble with bytes

I've tried to convince myself that word addressing is best; 64k x 16bits. But you can't get away from the fact that we need data in byte size pieces sometimes. ASCII strings as an example. So, I'm changing my mind yet again (I think that this will be a common feature of this BLOG), the addressing of the machine will be byte aligned, however instructions must be even byte address aligned. Strangely enough, that is how the PDP-11 did it.

So when implementing memory:

1) Ignore bit 0 when addressing memory
2) Organize memory as words
3) Bit 15 indicates byte(1) or word(0) in an opcode
4) When byte only the LSB of register is used
5) Use bit 0 to decide if LSB or MSB of data bus to be used
6) Always fill MSB of register with MS-bit of LSB when loading bytes
7) When writing bytes to memory use bit 0 to determine which byte to latch

I think that should work.

Addressing Modes

In some opcodes there are source and destination fields (SS/DD). These fields are 6 bits wide. The bottom 3 bits (0-2) indicate the register, the top 3 bits (3-5) indicate the addressing mode.

The addressing modes are as follows (all lifted from the PDP-11):

0 - Register contains operand.

1 - Register contains address of operand.

2 - Register is used as a pointer to operand, then the contents of the register are incremented (by 2 for word, 1 for byte).

3 - Register is used as a pointer to the address of the operand, then the contents of the register are incremented by 2 (always).

4 - The register is decremented (by 2 for word, 1 for byte), and then the register is used as a pointer to the operand.

5 - The register is decremented by 2 (always), and then the register is used as a pointer to the address of the operand.

6 - The Index value (stored at PC+2) is added to the address in the register. The sum points to the operand.

7 - The Index value (stored at PC+2) is added to the address pointed to by the register. The sum points to the operand.

I was slightly confused by the lack of immediate loads of addresses and constants etc. However, you don't need special case opcodes as the Program Counter is a general register. After each instruction is decoded the PC is incremented. So the PC points to the memory that contains the immediate data.

So for operations using the Program Pointer (R7):

Mode 2 - Immediate - Operand N follows instruction

Mode 3 - Absolute - Address of operand follows instruction

Mode 6 - Relative - PC + 4 + X is address of operand

Mode 7 - Relative Indirect - PC + 4 + X is address of address of operand

You will note that Mode 5 is the implementation mechanism for the Stack. I know I wasn't going to have a stack but hell why not.

Reviewing the situation

I've been reading the PDP-11 manual mentioned earlier. It is so cool, I can see why the PDP-11 was in production for 20+ years.

Anyway, I'm not going to build a PDP-11, just a machine that internally is very similar.

Registers:

R0
R1
R2
R3
R4
R5
R6 (Stack Pointer)
R7 (Program Counter)

Instructions (the numeric values of the opcodes below are in octal):

00 00 00 HALT
00 00 05 RESET
00 01 DD JMP
00 02 0R RTS
00 02 40 NOP
00 03 DD SWAB
00 04 XXX BR
00 10 XXX BNE
00 14 XXX BEQ
00 20 XXX BGE
00 24 XXX BLT
00 30 XXX BGT
00 34 XXX BLE
00 4R DD JSR
00 50 DD CLR
00 51 DD COM
00 52 DD INC
00 53 DD DEC
00 54 DD NEG
00 55 DD ADC
00 56 DD SBC
00 57 DD TST
00 60 DD ROR
00 61 DD ROL
00 62 DD ASR
00 63 DD ASL
00 67 DD SXT
01 SS DD MOV
02 SS DD CMP
03 SS DD BIT
04 SS DD BIC
05 SS DD BIS
06 SS DD ADD
07 4R DD XOR
07 7R NN SOB
10 00 XXX BPL
10 04 XXX BMI
10 10 XXX BHI
10 14 XXX BLOS
10 20 XXX BVC
10 24 XXX BVS
10 30 XXX BCC, BHIS
10 34 XXX BCS, BLO
10 50 DD CLRB
10 51 DD COMB
10 52 DD INCB
10 53 DD DECB
10 54 DD NEGB
10 55 DD ADCB
10 56 DD SBCB
10 57 DD TSTB
10 60 DD RORB
10 61 DD ROLB
10 62 DD ASRB
10 63 DD ASLB
11 SS DD MOVB
12 SS DD CMPB
13 SS DD BITB
14 SS DD BICB
15 SS DD BISB
16 SS DD SUB

In the above:

DD means destination field (6 bits)
SS means source field (6 bits)
XXX means an offset (8 bits), +127, -128
N means a number (3 bits)
NN means a number (6 bits)
R means general register (3 bits), 0, 7

I've decided for the time being to not implement the PDP-11 traps, interrupts, and modes (user/kernel).

In an earlier post I had decided to have separate memory and i/o spaces sharing the same address bus. I have now changed my mind, we will have all i/o mapped into the memory space as per the PDP-11.

The External Flag (used for the 20ms IPS pseudo-interrupt) will now be memory mapped instead of being a bit in the Processor Status Word (PSW) aka Flags Register. The PSW is rather empty:

Bit 0 - C (Carry)
Bit 1 - V (Overflow)
Bit 2 - Z (Zero)
Bit 3 - N (Negative)
Bits 4-15 Reserved

PDP-11 Manual

Yesterday I found a scan of the PDP-11/40 Processor Handbook, and printed it out. It is amazing how close my ideas are compared to the 1972 real PDP-11 machine. Now I'm really tempted to try and make a PDP-11, but I have to tell myself to learn to walk first. Remember that DEC started with the PDP-1 first before they got to the PDP-11!

Two things jump out; the PDP-11 had memory mapped registers, and any memory location could be used as an accumulator (albeit indirectly). I'm going to go through the manual, and pick out the cool bits. I guess create the PPDP-1 (Paul's PDP-1).

Bit Slice

A 16-bit computer has a lot of connections in it. Making up schematics of everything at once is a challenge, as will testing the finished machine be. As this is my first attempt at building a computer I will try and make things as simple as possible. I found a couple of references to bit-slice based machines in the literature. Bit-slice machines are basically a collection of 1-bit machines glued together. So if you want a 16-bit machine you build 16 1-bit machines and connect them together. I spent a couple of lunchtimes this week looking at the idea, and have come to the conclusion that it may be the way ahead for this my first computer.

The key reason is that you reduce the complexity not the functionality, also testing a 1-bit board will be easier than a 16-bit board.

So having decided on going bit-slice, I've had a first stab at putting some schematics together.

This is the Arithmetic Logic Unit (1-bit). It should be noted that this is the generic bit 1 to 14 version. Bit 0 and bit 15 versions will be slightly different due to setup and propagation of carry bits etc.
I've added shifts into the instruction set as it was a no brainer once I drew the circuit, as it is not logic but simple wiring. I've also added SET and RESET, i.e. set to zero and one, again because it was easy. The above ALU implements AND, OR, NOT, XOR, ADD, ADC, SUB, SBC, SET, RESET, SWAB (swap high and low bytes of word), LOGICAL SHIFT LEFT, LOGICAL SHIFT RIGHT, ARITHMETIC SHIFT RIGHT, ARITHMETIC SHIFT LEFT, ROLL LEFT, ROLL RIGHT.

Below is the register implementation:

Sunday, May 15, 2011

Flag Register

Had a quick flick through the AM1601 Programming Manual, and I noticed a really cool instruction - FLAG. Basically FLAG does the normal flag stuff, but more importantly does the high level greater than, greater than or equal etc. tests as well. As we have already decided upon a 16-bit Flag Register, which is currently very empty, why not add in the AM1601 FLAG technology? Why not indeed, that is what we will do.

Opcodes

Not a lot of work done today on the project, due to Paul Ayres popping over last night and making us drink lots of alcohol. Anyway, this afternoon I did some thinking whilst sitting on the bench by the herb garden.

The decision to go for a 16-bit word size allows a lot of space when designing the opcodes, which in turn should make life less complicated when designing the decode circuits.

I have to say that I'm finding it difficult to put out of my mind the actual building of the machine, but I will keep telling myself that you can't order any parts until the design is 100% complete. I still have lots of bits left over from the "Build an Electric Guitar from scratch" project, so I'm trying to be good this time.

Anyway, we have 8-bits of MSB for the instruction root opcode, and 8-bits of LSB for the instruction variant opcode. When I say root I mean the general class of instruction, e.g. ADD, ADC and INC are the same instruction (same root) just with slight variants.

I think that I'll be totally random and assign the MSB of the opcode on the basis of the alphabetical order of the mnemonics of the root opcodes. Hell, it is nice to bolt down some part of the design. Already changed my mind, I won't do that exactly.

So let us group the roots!

ADD -->

ADC
ADD
INC

LOGICAL -->

AND
OR
XOR
CPL
NEG
XB

SUBTRACT -->

CP
DEC
SBC
SUB

DEBUG -->

HALT
NOP

MEMORY/I-O -->

IN
LD
OUT

BRANCH -->

JP CC
JP NNNN

Looking at the above groupings make me wonder why we have separate IN and OUT instructions, after all LD is used for both getting and storing items from memory. Why not just have IO as a single instruction? Like the idea and that is what we will do.

So that group becomes:

MEMORY/I-O -->

IO
LD

The "Logical" group really is a group of roots, i.e. they are not related, other than by being logical. For now will keep them together, but I don't rule out splitting them apart in the future.

Saturday, May 14, 2011

A new register required

Having decided to include Register Indirect Addressing, we need to add another register. I know this is adding complexity to the design, but without it the Address Register would be wiped with every store/load immediate instruction.

So, the Address Register will become an internal only register like the Data Register, and a new Memory Pointer Register will be added to perform the Register Indirect Addressing function.

Addressing Modes

In an earlier post I identified the basic core instruction set. What was not defined was the suite of operands for the instruction set. We'll take a stab at that now.

We need to be able to load and/or operate on a literal value; so we need to have Immediate Addressing. In this mode of addressing the word following the opcode in memory contains the actual operand.

We need the Address Register to know where to store things. It would be nice to be able to manipulate the value stored therein. So let us allow Register Addressing; allowing operations between the Accumulator, the Address Register, and memory.

While I'm at it, given that we need the Address Register, let us add Register Indirect Addressing. This type of addressing allows the Address Register to be used as a pointer to any location in memory (or i/o space). This could be a very powerful addressing mode.

So we have:

LD A, NNNN <- Load Accumulator with literal NNNN

LD A, M <- Load Accumulator with contents of Address Register

LD A, [M] <- Load Accumulator with contents of memory at address pointed to by Address Register

I think that, as is possible, that all permutations should be allowed for all instructions.

By word or byte?

The 64k address limit has been determined. However, do I implement memory as 64k x byte or 64k x word?

Having 64k x word would speed up the system significantly.

But we would need to add some instruction(s) to allow for byte extraction/insertion.

IPS uses the regular 7-bit ASCII character set (and uses bit 7 for inverse video indication), so bytes not words are needed for keyboard entry and video display (and other hardware interactions).

I remember that Lyle and I ran into the same issue on the AM1601 which was designed to use words not bytes.

Looking up in the AM1601 Programming Manual I saw that we introduced the XB instruction for this problem. XB swaps the high and low bytes. XB will be added to the opcode list.

So, with the addition of the XB opcode, we are cool to go forward with 64k x word; giving us 128k of memory, even more luxury!

The I/O space shares the same address bus as memory, but it is a separate space. I will define the I/O space as 64k x byte instead of 64k x word. Any writes to the I/O space will ignore the MSB of the applicable value.

Background Material

I found this website very useful, I hope you will too:

http://cpuville.com/

Specifications

Well now, what is needed and what is nice to have?

A fair question which has been troubling by mind these past few days.

When I was at college I was a big fan of the DEC PDP-11, mainly because it was hidden in a room that not many people seemed to know about. Consequently it ran my stuff really quickly compared to the Honeywell time sharing system that everyone else used.

Anyway, memory lane aside, the task is to design and build a 1970s mini-computer from scratch.

Looking up on the web it seems like the PDP-11 moved to using microprocessors around 1977/8, prior to that it had been using MSI chips. So I think that specification wise I'll go for an early 1970s PDP-11, but not a PDP-11. The word size will however be 16 bits.

Early PDP-11s only had about 8k of memory. That seems a little small so let us have lots of memory, ... 64k. The address bus width will therefore be 16 bits as well. This will make address calculation easier in the long run (I hope). Extra memory can be added if needed in the future by paging, or increasing the address bus width. Remember UNIX ran on the PDP-11s predecessor with far less memory, so 64k is luxury.

I/O ports will be memory mapped (i.e. use the same address bus as memory), however it will be a different space than for RAM and ROM, effectively paged.

In summary data bus width 16-bits, address bus width 16 bits; the buses are shared by the memory (RAM/ROM) and the i/o port spaces.

Registers:

Accumulator - 16 bit - YES
Program Counter - 16 bit - YES
Stack Pointer - 16 bit - NO
Address Register - 16 bit - YES (like HL register in the 8080/Z80) (maybe internal to CPU only)
Instruction Register - 16 bit (internal to CPU only)
Data Register - 16 bit (internal to CPU only)
Refresh Register - 16 bit - NO (not yet)
Flag Register - 16 bit

The reason I've decided not to have a Stack Pointer, and hence the ability to implement subroutine calls and interrupt handlers, is that the IPS operating system doesn't use them and so adding the feature just adds complexity, and that is what we do not want. IPS is a stack based language, but it doesn't need a hardware stack implementation.

The Refresh Register is not required as I've decided to use static RAM only, instead of dynamic RAM. Static RAM is so cheap these days and does not need refreshing like dynamic RAM (back in the 1970s it was very expensive and had slow access times).

The Flag Register is set at 16-bits, although far fewer bits are actually required. It is 16-bit just to keep the design flat, i.e. the same across all registers, and to leave room for extension in the future.

The Flag bits decided upon so far are:

S - Sign (MSB of result)
Z - Zero (1 if result zero, 0 otherwise)
C - Carry (1 if the operation produced a carry from the MSB of the operand or result)
E - External Flag (used for the IPS 20ms pseudo-interrupt)

Opcodes:

So what instructions shall I implement?

The key driver here is what do we need to implement IPS. If it is not necessary for IPS we won't include it (yet).

This is the first cut. What I've done is gone through my old copy of the Z80 programming Manual and picked out the "need to have" instructions. I'm sure that some will be added and/or dropped in the future, but we have to start somewhere. I should note that I haven't decided upon operands yet, only that one of them will be the Accumulator.

ADC - Add with carry
ADD - Add without carry
AND - Logical AND
CP - Compare (maybe)
CPL - Inverted 1's complement (maybe)
DEC - Decrement by 1
HALT - Stops the CPU (maybe)
IN - Read from I/O port
INC - Increment by 1
JP CC - Jump to address conditional
JP nnnn Jump to address
LD - Load (and store)
NEG - Negated 2's complement (maybe)
NOP - No operation
OR - Logical OR
OUT - Output to I/O port
SBC - Subtract with carry (borrow)
SUB - Subtract without carry (borrow)
XOR - Logical exclusive OR

There are quite a few "maybe" instructions above, time will tell how things turn out. Notable omissions are the calls/returns, and the rotate/shift instructions.

Start of the project

For many years now I've been thinking about building an old 1970s style computer, but I've never got around to doing it. Recently I came across a number of sites where others have gone a step further, by building computers not from the old microprocessors but from 74xx MSI chips.

This has got me a thinking, it is one thing to bolt a few VSLI components together according to manufacturers data sheets, and it is another to create your own computer from the raw logic gates (or transistors if you have enough space and time).

So this project, which I fear will be very long, is all about designing and maybe building a reasonable 1970s style mini computer from the raw logic gates.

The nice thing about rolling your own is that you can choose what your machine will be like. This is the tough part, as you have to decide between nice to have and need to have. For this first shot I'm going to try and limit myself to need to have.

I've already decided upon the operating system, it will be IPS. In another life I fly spacecraft (AMSAT), and IPS is the operating system we use on the Phase 3 and Phase 5 spacecraft. It has a small footprint (ideal for 1970s computers), and needs only a small amount of work to customize for different processors.

For more info on IPS look here:

http://www.amsat.org/amsat/projects/ips/index.html

Obviously as I am designing the processor from scratch and know the target operating system, the new design can be optimized from the start to match.

Back in 2002 Lyle Johnson and I designed a processor called the AM1601, which was a candidate to be the processor for the AMSAT IHU-3 (Integrated Housekeeping Unit). At that time AMSAT had been using 1802 chips from the 1970s, and radiation harden ones had become harder and harder to find. The AM1601 was a software implementation, designed to be burnt onto FPLAs. The idea being that we could keep the design fixed, and burn to the latest FPLA when required. Anyway, it didn't happen, we decided in the end to buy a huge batch of ARM chips instead, yes we were hooked on ARM before it took over the world.

We did, well Stacey Mills did, a lot of radiation testing on the ARM and it is incredibly radiation tolerant straight out of the box, which is a good thing when you're using it in orbit. Anyway that is by the side, the point is that my new processor will not be an AM1601, as reviewing the user manual we wrote in 2002 it seems like it was too complicated, and as I said above I'm only going to implement what is needed not the nice to have. That said, there are elements of the AM1601 design which will filter into the new processor, the AM1601 design process was also very educational.

Details of the abortive AM1601 project can be found here:

http://www.amsat.org/amsat/projects/ips/Am1601.html

So now I'm going to sit down and decide what I need to have in the processor. One interesting thought is that by building from raw gates you do not have to split between the processor and the rest of the system, ... they can be one and the same. I'll stop on that one 'cos it is a bit of a mind shift.