
Information Technology Journal

Year: 2007 | Volume: 6 | Issue: 4 | Page No.: 497-508
DOI: 10.3923/itj.2007.497.508
A Systematic Approach for Building Processors in a Computer Design Lab Course at Universities in Developing Countries
Raed A. Alqadi and Luai M. Malhis

Abstract: This study presents a systematic technique for building processors from off-the-shelf components. The main objective of this methodology is to introduce computer engineering, electrical engineering and computer science students in developing countries to all phases of CPU design using very primitive ICs and homemade tool kits. Using this technique, students enrolled in a processor design lab can design and build processors for a defined instruction set with readily available components at minimal cost. The proposed methodology has been implemented in the computer engineering department at An-Najah National University and has proven to be efficient in teaching computer architecture and processor design, as well as in boosting computer engineering students’ self-confidence, without requiring very advanced laboratory equipment. The purpose of this technique, and of building this microprocessor, is purely educational; it is not intended to introduce a new methodology for building commercial microprocessors. We present the methodology by giving an instruction set example and then describing the design and implementation steps followed. We also discuss the primitive components used to build the datapath and the controller, and the software tools used in both the implementation and testing phases. We show that the proposed methodology is very effective in giving students the experience of microprocessor design without the need for advanced and expensive kits and devices.


How to cite this article
Raed A. Alqadi and Luai M. Malhis, 2007. A Systematic Approach for Building Processors in a Computer Design Lab Course at Universities in Developing Countries. Information Technology Journal, 6: 497-508.

Keywords: computer engineering labs, processor design lab course, datapath, controller, computer architecture, effective education

INTRODUCTION

Most universities offering degrees in computer engineering, electrical engineering and computer science require students, on average, to complete at least two courses in computer organization and architecture. An additional course in microprocessors and/or microcontrollers is almost always required as well. Universities in American and European countries usually have design labs for building processors in order to enhance the computer architecture, organization and microprocessor courses. In such labs, students design and build complete educational CPUs. In universities of developing countries, however, the attention given to computer design labs is generally minimal, and students lack hands-on experience in designing and implementing a complete processor. The reason is the lack of the hardware and software resources needed to aid students in all phases of processor design. In this study, we overcome these limitations by developing a methodology and related software tools with which students can develop their own design tools and use off-the-shelf components to implement their processors. The methodology covers all design and implementation phases of a processor: defining an instruction set, laying out a datapath and memory interface to execute this instruction set and implementing control with appropriate timings for the control signals. We believe that developing in-house hardware and software tools is an effective method for teaching computer-hardware related courses. This philosophy has been used in many universities in developed countries, for example (Hersch, 1994), (Ozcan, 1996) and (Pastor et al., 2004). More papers discussing methods to improve the teaching of processor design courses are (Gang, 2003), (Gottlib and Carter, 2003), (Nicoud, 1991) and (Nicoud and Sommer, 1975).

The design technique section contains an overview of the design philosophy. The instruction set design section describes the instruction set used in the design example, which is based on microcontroller architectures such as the Intel and Microchip microcontrollers. The datapath section focuses on the datapath layout, with emphasis on custom ALU design and implementation; off-the-shelf EPROMs are used to implement the ALU. The controller design and implementation are discussed next, along with all control signals and their timings. An assembler for our processor is then described. The last section introduces a hardware kit that we developed to interface the processor with a Personal Computer (PC). This kit is accompanied by software that enables downloading programs to the processor, as well as tools for debugging the processor hardware. Final thoughts and concluding remarks are then presented.

DESIGN TECHNIQUE

Due to the current situation in Palestine, where An-Najah National University is located, we face many challenges in obtaining complex ICs such as FPGAs, CPLDs and GALs. We therefore have to use easy-to-get components such as common RAMs, EPROMs/EEPROMs and MSI ICs. Hence, what we present is a CPU design that mainly uses EPROMs to implement the Arithmetic and Logic Unit (ALU) and the Control Unit (CU). In-house software programs, discussed later, produce the files that are downloaded to the EPROMs to generate the required functionality. These programs were developed by the authors, as was the assembler for this processor, which is described in a later section.

The technique presented in this study has been applied to teaching a processor design lab course at An-Najah National University. This course is part of the curriculum for the B.S. degree in computer engineering. In this course, which is usually taken at the fourth-year level, students are asked to design and build a CPU with given specifications.

Figure 1 illustrates the main steps in the design procedure; some of the steps have to be done manually while others are partially or fully automated. Step 1 is mainly a manual step which involves selecting the instruction set and designing an appropriate instruction format. Step 2 is the datapath design, which involves designing the ALU, registers, internal buses and memory interface. This step has been decomposed into three sub-steps; step 2.1 is fully automated while steps 2.2 and 2.3 are partially automated. Step 3 is the controller design step; for simplicity it has also been decomposed into three sub-steps, and steps 3.1 and 3.2 may need to be iterated. The last step is testing, for which interface circuitry and a software tool have been built.

INSTRUCTION SET DESIGN

Since our objective is mainly educational, we have chosen as a design example an 8-bit processor architecture suitable for microcontrollers. Nevertheless, this methodology can be extended to the design of general purpose processors. Recent trends in computer architecture favor the reduced instruction set computer (RISC) design philosophy; therefore, we decided to use RISC as the basis of our microprocessor design. RISC design is dominant because of its simplicity and efficiency in hardware design. Although the proposed architecture is a RISC architecture, it has many features in common with accumulator-based machines. Such an architecture is very popular in microcontroller products such as Microchip (http://www.microchip.com) and Intel 80x51 (http://www.intel.com). The instruction format for the design example used throughout this study is shown in Fig. 2.

As shown in Fig. 2, a single instruction format is used; all instructions follow the same layout.

The size of instruction format is 16 bits divided into the following parts:

Opcode: Operation code, size is 5 bits; hence up to 32 instructions can be supported.

Fig. 1: Steps of processor design

Fig. 2: Instruction format

Table 1: Instruction set

D: Destination, size is 1 bit. If D = 0, the result is stored in accumulator A; if D = 1, the destination is the register specified in the register field.

Register/Literal: Size is 8 bits. The register part is interpreted as a register address, which provides 256 data RAM locations (registers); this is sufficient for most microcontroller applications. This field is also used to hold an 8-bit immediate value.

Ex: Extension, size is 2 bits. Used only in jump or call instructions, where it extends the 8-bit field to select the program counter value. This provides a 10-bit jump address within the program memory. The instructions supported by our proposed microprocessor are summarized in Table 1.
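The 16-bit format described above can be illustrated with a short Python sketch that packs and unpacks the four fields. The field widths come from the text, and the footnote of Table 4 places the literal in IR[0] to IR[7] and the jump address in IR[0] to IR[9]. The placement of D at bit 10 and of the opcode at bits 11 to 15 is an assumption made for this sketch, as are the function names `encode` and `decode`.

```python
# Field widths follow the text. Placing D at bit 10 and the opcode at
# bits 11-15 is an assumption of this sketch, not a documented layout.

def encode(opcode, d, ex, reg_or_literal):
    """Pack the four fields into one 16-bit instruction word."""
    if not 0 <= opcode < 32:
        raise ValueError("opcode must fit in 5 bits")
    if d not in (0, 1):
        raise ValueError("D must be 0 or 1")
    if not 0 <= ex < 4:
        raise ValueError("Ex must fit in 2 bits")
    if not 0 <= reg_or_literal < 256:
        raise ValueError("register/literal must fit in 8 bits")
    return (opcode << 11) | (d << 10) | (ex << 8) | reg_or_literal


def decode(word):
    """Split a 16-bit instruction word back into its fields."""
    return {
        "opcode": (word >> 11) & 0x1F,
        "d": (word >> 10) & 0x1,
        "ex": (word >> 8) & 0x3,
        "reg_or_literal": word & 0xFF,
        "jump_address": word & 0x3FF,  # Ex and the 8-bit field together
    }


word = encode(opcode=3, d=1, ex=2, reg_or_literal=0x45)
print(hex(word))                          # 0x1e45
print(hex(decode(word)["jump_address"]))  # 0x245
```

The 5 + 1 + 2 + 8 bits account for the full 16-bit word, and the 10-bit jump address matches the 1K-word program memory used later in the design.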

DATAPATH

Figure 3 shows the datapath used in our design example. We have chosen the Harvard-style architecture for the memory interface because of its simplicity and efficiency. The main components of the datapath are the ALU, the Program Counter (PC), the Instruction Register (IR), the Accumulator (A), the Memory Data Register (MDR), the data RAM (data registers) and the multiplexers. In the following subsections, we describe these components. For simplicity, Fig. 3 does not show the components required to implement the CALL instruction and the bit-manipulation instructions.

Registers, program counter and multiplexers: The A, MDR and IR registers are implemented with common 8-bit registers such as the 74LS374. Special attention has to be given to the triggering edge when the timing analysis is performed, so that data is written at the correct edge and glitches are avoided. Since IR is a 16-bit register, two 8-bit registers are required.

In our design, we have restricted the program memory to 1K words (2 KB); this restriction is acceptable in microcontroller applications. Hence, a 10-bit program counter is required. A suitable counter must support parallel load in order to implement the conditional jump instructions. A popular counter that supports this feature, in addition to a clear feature (required for reset), is the 74LS193. The counter also has a cascade feature; therefore, three ICs can be cascaded to implement the PC. The block diagram is shown in Fig. 4.
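The cascading of three 4-bit counters into a 10-bit program counter can be modeled behaviorally as follows. This is a simplified sketch, not a pin-accurate model of the 74LS193: the class names `Counter4` and `ProgramCounter` are hypothetical, and the carry between stages is represented as a return value rather than a clock pulse.

```python
class Counter4:
    """Simplified 4-bit up counter with parallel load and clear."""

    def __init__(self):
        self.value = 0

    def clear(self):
        self.value = 0

    def load(self, nibble):
        self.value = nibble & 0xF

    def count_up(self):
        """Advance by one; return True when the count wraps (carry out)."""
        self.value = (self.value + 1) & 0xF
        return self.value == 0


class ProgramCounter:
    """10-bit PC built from three cascaded 4-bit stages."""

    def __init__(self):
        self.stages = [Counter4(), Counter4(), Counter4()]  # low to high

    @property
    def value(self):
        raw = sum(s.value << (4 * i) for i, s in enumerate(self.stages))
        return raw & 0x3FF  # only 10 of the 12 bits address program memory

    def reset(self):
        for stage in self.stages:
            stage.clear()

    def load(self, address):
        """Parallel load, as used by a taken jump."""
        address &= 0x3FF
        for i, stage in enumerate(self.stages):
            stage.load(address >> (4 * i))

    def increment(self):
        """Ripple the carry from the low stage toward the high stage."""
        for stage in self.stages:
            if not stage.count_up():
                break


pc = ProgramCounter()
pc.load(0x0FF)
pc.increment()
print(hex(pc.value))  # 0x100
```

Three 4-bit stages give 12 bits, so the two highest bits are simply left unconnected to the 1K-word program memory.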

The multiplexers can be implemented using multiplexer ICs or by using tri-state buffers such as 74LS244 ICs. The data registers (256 registers) can be implemented using popular RAM ICs with 256 or more locations, such as the 62xxx or 61xxx families; for example, a 6164 RAM IC can be used where only the first 256 bytes are used. In this case the lowest 8 lines of the address bus are used and the remaining address lines are grounded. An improvement to this implementation is to use part of these registers as Special Function Registers (SFRs) to implement the input and output ports. For example, the first 128 registers can be implemented using a RAM and the upper 128 locations can be dedicated to SFRs; in that case the SFRs are implemented using individual registers or specialized ICs. For example, an ADC can be used to add an analog input port.
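The RAM/SFR split suggested above amounts to decoding the top address bit. A minimal behavioral sketch, assuming the split sits at address 0x80 and using hypothetical handler tables for the SFRs (the text does not define specific SFR addresses):

```python
class DataSpace:
    """256-location register space: RAM below 0x80, SFRs from 0x80 up."""

    SFR_BASE = 0x80  # assumed split point: the upper 128 locations

    def __init__(self):
        self.ram = bytearray(128)
        self.sfr_read = {}   # address -> callable returning a byte
        self.sfr_write = {}  # address -> callable accepting a byte

    def read(self, address):
        address &= 0xFF
        if address < self.SFR_BASE:
            return self.ram[address]
        handler = self.sfr_read.get(address)
        return handler() & 0xFF if handler else 0  # unmapped SFRs read as 0

    def write(self, address, value):
        address &= 0xFF
        value &= 0xFF
        if address < self.SFR_BASE:
            self.ram[address] = value
        elif address in self.sfr_write:
            self.sfr_write[address](value)


space = DataSpace()
port_log = []
space.sfr_write[0x80] = port_log.append  # hypothetical output port
space.sfr_read[0x81] = lambda: 0x5A      # hypothetical ADC reading
space.write(0x80, 7)
print(port_log, hex(space.read(0x81)))   # [7] 0x5a
```

In hardware, the same decision is made by using address line 7 to select between the RAM chip enable and the SFR decoding logic.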

From the datapath shown in Fig. 3, the control signals can be identified. These signals are illustrated in the figure and are divided into: ALU control, ALU output enable, ALU Mux selection, RAM address Mux selection, PC load and PC increment signals; and write signals for the registers A, MDR, IR and the flags.

Fig. 3: The datapath

Fig. 4: Program counter register

These signals will be generated by the control unit, described in the control unit section.

ALU design: It is possible to build the ALU using components such as the 74181, but we believe it is better to let the students design custom ALUs from components such as GALs, PALs or EPROMs. Here, we present our technique for building 8-bit ALUs using our software tool, which generates EPROM programming files. With this method we can readily build custom ALUs from popular EPROMs such as the 2764, 27128, 27256 and 27512. Figure 5 shows an 8-bit ALU built from two 27512 EPROMs. If only smaller EPROMs are available, the code can easily be modified to use them, but more stages will be needed. Note that the address bus is used for the inputs and the data bus for the outputs.

The ALU must perform a set of operations required for executing instructions in the instruction set. The following operations are sufficient for the instructions defined in our instruction set. Let A and B be the inputs and F be the output.

1. F = NOT A. One’s complement of A.
2. F = A AND B.
3. F = A OR B.
4. F = A XOR B.
5. F = A+B.
6. F = A-B.
7. F = A+1.
8. F = A-1.
9. F = A.
10. F = B.
11. F = A ROL 1 through carry.
12. F = A ROR 1 through carry.

The following algorithm is used to generate the program files for the two 27512 EPROMs. This algorithm is designed for a 2-stage ALU but can easily be modified for ALUs with more stages.
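To illustrate the idea of an EPROM-based ALU, the following Python sketch builds the lookup table for one 4-bit stage and chains two copies of it into an 8-bit ALU. It is not the authors' tool. The address layout (A nibble on bits 0 to 3, B nibble on bits 4 to 7, operation code on bits 8 to 11, carry-in on bit 12), the output layout (result on bits 0 to 3, carry-out on bit 4) and the operation codes are all assumptions. The sketch also assumes that the controller supplies the initial carry-in (0 for ADD and DEC, 1 for SUB and INC, the carry flag for ROL). ROR is omitted because its carry travels from the high stage to the low stage and would need extra wiring beyond this simple ripple chain.

```python
OPS = ["NOT", "AND", "OR", "XOR", "ADD", "SUB", "INC", "DEC",
       "PASS_A", "PASS_B", "ROL"]  # hypothetical operation codes 0..10


def slice_result(op, a, b, cin):
    """Return (4-bit result, carry out) for one 4-bit ALU stage."""
    name = OPS[op]
    if name == "NOT":
        return (~a) & 0xF, 0
    if name == "AND":
        return a & b, 0
    if name == "OR":
        return a | b, 0
    if name == "XOR":
        return a ^ b, 0
    if name == "PASS_A":
        return a, 0
    if name == "PASS_B":
        return b, 0
    if name == "ROL":
        return ((a << 1) | cin) & 0xF, (a >> 3) & 1
    # The arithmetic operations all reduce to a + operand + carry-in.
    operand = {"ADD": b, "SUB": (~b) & 0xF, "INC": 0x0, "DEC": 0xF}[name]
    total = a + operand + cin
    return total & 0xF, total >> 4


def build_image():
    """Build the 8192-entry table programmed into each EPROM."""
    image = bytearray(1 << 13)  # entries for unused codes 11..15 stay 0x00
    for cin in (0, 1):
        for op in range(len(OPS)):
            for b in range(16):
                for a in range(16):
                    f, cout = slice_result(op, a, b, cin)
                    address = (cin << 12) | (op << 8) | (b << 4) | a
                    image[address] = (cout << 4) | f
    return image


def alu8(image, op, a, b, cin):
    """Chain two copies of the table the way the two stages are wired."""
    low = image[(cin << 12) | (op << 8) | ((b & 0xF) << 4) | (a & 0xF)]
    high = image[((low >> 4) << 12) | (op << 8) | ((b >> 4) << 4) | (a >> 4)]
    return ((high & 0xF) << 4) | (low & 0xF), high >> 4


image = build_image()
print(alu8(image, OPS.index("ADD"), 0xF0, 0x20, 0))  # (16, 1)
print(alu8(image, OPS.index("SUB"), 0x05, 0x03, 1))  # (2, 1)
```

Because both stages use the same table, the same file can be programmed into both EPROMs; a 27512 has more address lines than the 13 used here, and the sketch assumes the unused ones are tied low.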

Fig. 5: Block diagram of the two-stage ALU


Table 2: Control signals to control the datapath

Fig. 6: Finite state machine

THE CONTROL UNIT

This part is the most difficult and time-consuming to design and implement. Traditionally, it is implemented as hardwired control or microcoded control. Because of the simplicity of the hardwired design, and for educational purposes, we chose to implement the control unit using the hardwired control methodology. This unit is abstracted as a finite state machine, shown in Fig. 6. The control unit is responsible for asserting and de-asserting all control signals necessary for the datapath to function properly. In our approach, we based the design on the most primitive (i.e., off-the-shelf) components that are available to the students at affordable prices. The control signals of our datapath components are described in more detail in Table 2.

First, we must categorize the instructions in the instruction set into groups such that all related instructions are placed in one group (Table 3). All instructions in a given group require the same set of states in the state diagram to be executed, and the same control signals are asserted and de-asserted for all instructions in the group. This makes the design much easier to manage, because control signal values are determined on a per-state basis rather than a per-instruction basis. For simplicity, the implementation of the bit operation instructions and the call and return instructions is not described in this paper; it is usually left to the students for extra credit.

A state diagram that executes the group of instructions in C3 is shown in Fig. 7. Note that it is possible to execute some instructions in fewer states and also it is possible to reduce the number of states, but the states shown simplify the design of the control unit. The operations in each state and the asserted signals are shown in Table 4.

Although there are many possible implementations, we have chosen the states in Fig. 7 to facilitate the control unit implementation. In fact, this technique will still work for different state diagrams that implement the instruction set. A simple implementation uses all input signals, including the current state, Z, C, D (destination) and the opcode, as inputs to the controller; the outputs are the control signal values. It is also possible to simplify the control unit further by breaking it into two units, as shown in Fig. 8. Unit 1 generates the control signals that are independent of the state and unit 2 generates the signals that depend on the state. Since our instruction set has been carefully chosen to simplify the design in general, the ALU_OP and ALU_MUX signals can be generated by a separate logic unit (ROM, PLA or PAL). Also, notice that the states are sequential except when exiting state 2; hence, the opcode is actually needed to determine the state following state 1. Thus, only three bits are needed as input to the second unit, because there are 7 possible branches after S1.

The ALU_OP is easily derived from the instruction; for example, ADDAR indicates that the operation is ADD. The ALU_MUX should select the literal (L) if the operand is a literal and MDR if the operand is a register (R). The next state after state 1 (the decode state) is determined from the opcode, the destination field (D), the Zero flag (Z) and the Carry flag (C).

Fig. 7: The state transition diagram

Table 3: Categorizing instructions into groups

The result of the decode stage is the next state and the selection of the second ALU operand, either Register or Literal (Table 5).

Timing consideration: Before implementing the control, the correct timing diagram for the control signals asserted for a group of instructions must be determined. This is necessary to ensure that there are no glitches and that data is written at the correct triggering edges, so that the setup and hold times of the registers and memory are met. Attention should also be given to edge-triggered versus level-triggered components; for example, the RAM which contains the register file is level triggered (active-low signal). Also, since the ALU bus and the RAM bus are shared, the ALU output and the RAM read signal cannot be enabled together. The states S8 and S10 have been added to guarantee a glitch-free design.

To guarantee these timing requirements, two approaches are possible. In the first, all register clock (write) signals are generated by the controller; this is an easy approach but it requires extra states. In the second, the state register is triggered by the negative clock edge and data is written to registers A, MDR, etc. at the positive clock edge. The register clock signals (A_WR_CLK, MDR_WR_CLK, IR_WR_CLK, FLAG_WR_CLK) are generated by ANDing the corresponding write enable signals with the clock. Note that this generates glitch-free write clocks, whereas the opposite arrangement does not. In this study, we have used the second approach for generating the clock signals for A, MDR, IR and the flags. However, we used the first approach in states 8, 9 and 10.
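The reason the second approach yields clean write clocks can be seen in a small sampled-time simulation. The sketch below ANDs a clock with a write enable that changes shortly after the edge that triggers the state register, then measures the width of every resulting pulse. The sample counts (`HALF`, `DELAY`), the function names and the enable pattern are arbitrary choices made for illustration, not values from the design.

```python
HALF = 4   # samples per half clock period (arbitrary)
DELAY = 1  # samples from the triggering edge until the enable changes


def clock(t):
    """Square wave that is high during the first half of each period."""
    return 1 if t % (2 * HALF) < HALF else 0


def enable_at(pattern, edge, t):
    """Write-enable level at sample t.

    pattern[k] is the enable value the controller outputs in cycle k;
    edge names the clock edge that triggers the state register.
    """
    # Rising edges occur at multiples of the period, falling edges half a
    # period later. The enable settles DELAY samples after its edge.
    offset = DELAY if edge == "rising" else HALF + DELAY
    cycle = (t - offset) // (2 * HALF)
    return pattern[cycle] if 0 <= cycle < len(pattern) else 0


def write_clock_pulses(pattern, edge):
    """Widths, in samples, of each pulse on (clock AND enable)."""
    widths, run = [], 0
    for t in range(2 * HALF * (len(pattern) + 1)):
        if clock(t) and enable_at(pattern, edge, t):
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths


pattern = [0, 1, 1, 0, 1, 0]
print(write_clock_pulses(pattern, "falling"))  # [4, 4, 4]
print(write_clock_pulses(pattern, "rising"))   # [3, 4, 1, 3, 1]
```

With the state register on the falling edge, the enable only changes while the clock is low, so every write pulse has the full half-period width. With the state register on the rising edge, the enable changes while the clock is high, producing shortened pulses and one-sample runt pulses, which are the glitches the text warns about.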

This timing step may result in modifying the state design step and an iterative process is necessary to refine the design. To illustrate the timing diagram, we will present the timing diagram for Group A as shown in Fig. 9. Similar timing diagrams must be determined for other groups but are not discussed here and can be deduced from Fig. 9.

Table 4: Control signals asserted in each state
L: Literal, IR[0] = L[0] = first bit in literal. IR[0-9] = address

Fig. 8: Block diagram of the control unit

Table 5: Control signals
D = A means destination is A, hence D = 0 and D = R means destination = Register (D = 1)

Generating the control hardware: Based on the above explanation, we present an algorithm for generating a controller implemented with EPROMs/EEPROMs. Although the algorithms presented are written for the state diagram presented above, they are easily modified for any state diagram.
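The general shape of such a generator can be sketched in Python as a table-driven procedure: every combination of current state and branch-select input is an EPROM address, and the stored word holds the next state together with the control signals to assert. The state diagram below is a toy example and is not the one in Fig. 7; the signal list, the bit positions, the 4-bit state field and the names `TRANSITIONS`, `control_word` and `build_controller_rom` are assumptions made for illustration. The 3-bit branch input follows the remark that three bits suffice to select the branch taken after the decode state.

```python
SIGNALS = ["IR_WR", "PC_INC", "PC_LOAD", "MDR_WR", "A_WR", "FLAG_WR",
           "ALU_OE", "RAM_RD"]  # bit positions are assumptions
STATE_BITS, BRANCH_BITS = 4, 3

# (state, branch) -> (next state, asserted signals); branch None = any.
# This toy diagram is illustrative only, not the one in Fig. 7.
TRANSITIONS = {
    (0, None): (1, ["IR_WR", "PC_INC"]),            # fetch
    (1, 0): (2, []),                                # decode: register operand
    (1, 1): (3, []),                                # decode: literal operand
    (1, 2): (4, []),                                # decode: jump
    (2, None): (3, ["RAM_RD", "MDR_WR"]),           # read register into MDR
    (3, None): (0, ["ALU_OE", "A_WR", "FLAG_WR"]),  # execute, write A
    (4, None): (0, ["PC_LOAD"]),                    # load jump target
}


def control_word(next_state, asserted):
    """Next state in the low bits, one bit per signal above it."""
    word = next_state
    for name in asserted:
        word |= 1 << (STATE_BITS + SIGNALS.index(name))
    return word


def build_controller_rom():
    """Return (low-byte image, high-byte image) for two 8-bit EPROMs."""
    size = 1 << (STATE_BITS + BRANCH_BITS)
    low, high = bytearray(size), bytearray(size)
    for branch in range(1 << BRANCH_BITS):
        for state in range(1 << STATE_BITS):
            entry = TRANSITIONS.get((state, branch),
                                    TRANSITIONS.get((state, None)))
            # Undefined combinations return to state 0, nothing asserted.
            word = control_word(*entry) if entry else 0
            address = (branch << STATE_BITS) | state
            low[address] = word & 0xFF
            high[address] = word >> 8
    return low, high


low, high = build_controller_rom()
print(hex(low[0x00]), hex(high[0x03]))  # 0x31 0x7
```

The control word here is 12 bits wide, so it is split across two byte-wide images in the same way the 16-bit program memory is split into low and high banks. Feeding the low four output bits back through a state register closes the loop of the finite state machine.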


Fig. 9: Timing diagram for group A instructions

PROGRAMMING THE PROCESSOR

Since we have chosen a Harvard-style memory architecture, as shown in Fig. 3, the program has to be stored in a program memory that can be implemented using either RAM or EPROM/EEPROM. However, in order to use RAM, interface circuitry is necessary; it will be described shortly.

If an EPROM/EEPROM is used, the students can write the program in machine language or in assembly language. Writing in machine language requires editing the EPROM hex/binary files, which is acceptable because of the simplicity of the instruction set. Writing in assembly language requires that an assembler be available. It is worth mentioning that writing the assembler can also be assigned to the students, which makes a good educational project.

In order to facilitate programming our CPU, we have developed an assembler that generates files in binary and hex formats that can be downloaded to either EPROM or RAM.

The assembler and disassembler: Our assembler uses a two-pass process. In the first pass, the assembler finds the labels, stores them in a symbol table and checks for syntax errors. In the second pass, the assembler translates the assembly code into machine code.

In order to simplify writing the assembler, the following rules have been used: (1) one instruction per line, (2) labels start with the special character $ and (3) anything that follows a semicolon is a comment.

Since the program memory is 16 bits wide, the assembler produces two binary files: one for the low byte and one for the high byte. The files can be downloaded to EPROMs using any universal programmer. However, if a RAM is used, the files are downloaded by the interface circuit. To facilitate debugging, a disassembler has also been developed. The following code shows an example of an assembly program that adds the values in registers 20 through 30 and stores the result in A.

Sample Program:

Assembler Algorithm
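As an illustration of the two-pass process and the three syntax rules described above, a minimal Python sketch follows. It is not the authors' assembler. Apart from ADDAR, which appears in the text, the mnemonics and all opcode numbers are hypothetical (the real ones are in Table 1). The sketch also assumes that a label definition is the first token on a line, that an optional second operand gives the D bit, and that the word layout places the opcode in bits 11 to 15 and D in bit 10.

```python
# Hypothetical mnemonics and opcode numbers (the real ones are in Table 1).
OPCODES = {"MOVLA": 0, "ADDAR": 1, "DECR": 2, "JNZ": 3, "JMP": 4}
JUMPS = {"JNZ", "JMP"}  # these take a 10-bit address instead of 8 bits


def tokens_of(line):
    """Strip the comment after ';' and split the rest into tokens."""
    return line.split(";", 1)[0].replace(",", " ").split()


def assemble(source):
    """Return (low-byte image, high-byte image) for the program text."""
    symbols, program = {}, []
    # Pass 1: build the symbol table and check mnemonics.
    for number, line in enumerate(source.splitlines(), start=1):
        tokens = tokens_of(line)
        if tokens and tokens[0].startswith("$"):
            label = tokens.pop(0)
            if label in symbols:
                raise SyntaxError(f"line {number}: duplicate label {label}")
            symbols[label] = len(program)  # address of the next instruction
        if not tokens:
            continue
        if tokens[0].upper() not in OPCODES:
            raise SyntaxError(f"line {number}: unknown mnemonic {tokens[0]}")
        program.append((number, tokens))
    # Pass 2: translate every instruction into a 16-bit word.
    low, high = bytearray(), bytearray()
    for number, tokens in program:
        mnemonic = tokens[0].upper()
        operand = tokens[1] if len(tokens) > 1 else "0"
        d = int(tokens[2]) if len(tokens) > 2 else 0
        if operand.startswith("$"):
            if operand not in symbols:
                raise SyntaxError(f"line {number}: undefined label {operand}")
            value = symbols[operand]
        else:
            value = int(operand, 0)
        limit = 0x3FF if mnemonic in JUMPS else 0xFF
        if d not in (0, 1) or not 0 <= value <= limit:
            raise SyntaxError(f"line {number}: operand out of range")
        word = (OPCODES[mnemonic] << 11) | (d << 10) | value
        low.append(word & 0xFF)
        high.append(word >> 8)
    return bytes(low), bytes(high)


SAMPLE = """
        MOVLA 5          ; load the literal 5 into A
$loop   DECR 0x20, 1     ; decrement register 0x20 in place
        JNZ $loop        ; repeat until the result is zero
$end    JMP $end         ; stop here
"""
low, high = assemble(SAMPLE)
print(low.hex(), high.hex())  # 05200103 00141820
```

Collecting labels in a separate first pass is what allows forward references, such as a jump to a label defined later in the file, to be resolved in the second pass. The two returned byte strings correspond to the low-byte and high-byte files mentioned above.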

INTERFACE CIRCUIT

Although building this circuit is not a necessity for building and testing the processor, it significantly simplifies programming, testing and debugging the processor. The interface circuit has two main purposes: the first is to program the CPU by downloading programs into the program memory if a RAM is used, and the second is to test and debug the whole CPU. Using a RAM for the program memory simplifies programming the CPU, because it removes the burden of erasing and re-programming EPROMs, which have to be removed from and reinstalled in the circuit each time.

Program download circuit: Our interface circuit is a microcontroller-based circuit which interfaces serially to a PC. We used a PIC16F877 (Microchip, 2005) to implement this circuit and developed a C++ based PC program as the interface program. The interface program downloads a binary or hex file to the upper or lower bank of the 16-bit program memory. Figure 10 shows the block diagram of the circuit that is used to connect the PC to the CPU program memory.

Another possibility is to first use an EPROM to test the processor; a small monitor program can then be written and stored in the EPROM. With a UART added to the processor as an SFR, this monitor program can be used to download programs to RAM. Such a technique does not require the PIC controller. However, using a PIC controller requires almost the same amount of hardware.

Debugging and testing interface circuit: This circuit is needed in order to debug, test and examine the results of programs. The interface circuit and a corresponding PC program can read the registers in the datapath as well as the register memory to test correct execution by the processor.

Fig. 10: Connecting the PC to processor memory

Fig. 11: Debugging interface circuit

A block diagram of the interface circuit is shown in Fig. 11. Note that the register memory address interface is not shown in Fig. 11; it is similar to the one shown in Fig. 10. The microcontroller circuit, along with the PC program, can examine the contents of the CPU registers when the CPU is run at a very low clock frequency or in single-step execution mode.

CONCLUSIONS

In this study, we have presented a methodology by which a processor design lab course can be conducted easily at universities with limited resources (i.e., universities in developing countries). The method is based on using easily obtainable (off-the-shelf) components as the basic building blocks and on developing programs that aid in designing with these components and in loading the necessary data into them. An assembler that translates assembly instructions into the required binary format was also presented. In addition, a hardware tool that interfaces the implemented processor with a PC was described. This tool is used for debugging and for loading application programs into the processor memory. The methodology has proven to be applicable and enhances students’ knowledge of critical issues pertaining to processor design. Moreover, students get very good hands-on experience in hardware design, especially CPU design. This method has been used for the past three years to conduct a computer design lab course in the computer engineering department at An-Najah National University in Palestine, during which students have shown great interest in the course. We encourage and recommend integrating this course into the curriculum of all computer engineering departments in developing countries and adopting this methodology to conduct it. Future work can focus on the implementation of call, return, set, clear and other jump-related instructions.

REFERENCES

  • Gang, Q., 2003. Introducing the concept of design reuse into undergraduate digital design curriculum. Proceedings of the International Conference on Microelectronics Systems Education (MSE'03), June 1-2, 2003, Anaheim, California.


  • Gottlib, D.B. and N.P. Carter, 2003. Microprocessor interfacing laboratory. Proceedings of the International Conference on Microelectronics Systems Education (MSE'03), June 1-2, 2003, Anaheim, California.


  • Hersch, R.D., 1994. Integrated Theory and Practice in Microprocessor System Design Course. Euromicro, North Holland, Amsterdam, pp: 227-232.


  • Nicoud, J.D. and R. Sommer, 1975. Modular logic elements, microprocessors and peripherals improve efficiency of teaching and development. Proceedings of the Compton, February 1, 1975, IEEE Computer Society Press, Los Alamitos, Calif, pp: 127-130.


  • Nicoud, J.D., 1991. Dedicated tools for microprocessor education. IEEE Micro, 11: 62-68.


  • Ozcan, M.B., 1996. Integration of software tools in software engineering education. Proceedings of the 9th Conference on Software Engineering Education (CSEE'96), April 21-24, 1996, Daytona, FL.


  • Pastor, J.S., I.G. Lopez, F. Gomez-Arribas and J. Martinez, 2004. A remote laboratory for debugging FPGA-based microprocessor prototypes. Proceedings of the International Conference on Advanced Learning Technologies (ICALT'02), August 30-September 1, 2004, Departamento de Indenieria Informatica, Univ. Autonoma de Madrid, Spain, pp: 86-90.
