#### Computer Architecture Experiment

#### Lab2







#### 0. Lab introduction

- Lab 1). Warm-up: run your multiple-cycle CPU on the 3E board. Try to add one new branch instruction.
- Lab 2). 5-stage pipelined CPU with 15 MIPS instructions (only required to execute in the pipeline).
- Lab 3). Implementing "stall" when hazards occur.
- Lab 4). Implementing "forwarding paths".
- Lab 5). The whole CPU with 31 instructions.



#### Outline

- Experiment Purpose
- Experiment Task
- Basic Principle
- Operating Procedures
- Precaution



#### Experiment Purpose 1

- Understand the principles of the MC (multiple-cycle) CPU controller and master the methods for designing it.
- Understand the principles of the datapath and master the methods of datapath design.
- Understand the principles of the MC CPU and master the methods of MC CPU design.
- Master the methods for verifying the CPU with a program.



#### Experiment Purpose 2

- Understand the principles of the pipelined CPU.
- Understand the basic units of the pipelined CPU.
- Understand the working flow of the 5 stages.
- Master the method of designing a simple pipelined CPU.
- Master the methods for verifying a simple pipelined CPU with a program.



#### Experiment Task 1

- Design the CPU controller and the datapath, and bring the basic units together into a multiple-cycle CPU.
- Verify the MC CPU with a program and observe the execution of the program.



#### Experiment Task 2

- Design the CPU controller and the datapath of the 5-stage pipelined CPU
  - 5 stages
  - Register file
  - Memory (instruction and data)
  - Other basic units
- Verify the pipelined CPU with a program and observe the execution of the program.



#### 15 commonly used MIPS instructions

| Bit#   | [31:26] | [25:21] | [20:16] | [15:11]   | [10:6] | [5:0]  | Operations                                                                  | Next PC     |
|--------|---------|---------|---------|-----------|--------|--------|-----------------------------------------------------------------------------|-------------|
| R-type | op      | rs      | rt      | rd        | sa     | func   |                                                                             |             |
| add    | 000000  | rs      | rt      | rd        | 00000  | 100000 | rd ← rs + rt                                                                | PC ← PC + 4 |
| sub    | 000000  | rs      | rt      | rd        | 00000  | 100010 | rd ← rs - rt                                                                | PC ← PC + 4 |
| and    | 000000  | rs      | rt      | rd        | 00000  | 100100 | rd ← rs & rt                                                                | PC ← PC + 4 |
| or     | 000000  | rs      | rt      | rd        | 00000  | 100101 | rd ← rs \| rt                                                               | PC ← PC + 4 |
| sll    | 000000  | 00000   | rt      | rd        | sa     | 000000 | rd ← rt << sa                                                               | PC ← PC + 4 |
| srl    | 000000  | 00000   | rt      | rd        | sa     | 000010 | rd ← rt >> sa (logical)                                                     | PC ← PC + 4 |
| sra    | 000000  | 00000   | rt      | rd        | sa     | 000011 | rd ← rt >> sa (arithmetic)                                                  | PC ← PC + 4 |
| I-type | op      | rs      | rt      | immediate ([15:0]) | |   |                                                                             |             |
| addi   | 001000  | rs      | rt      | immediate |        |        | rt ← rs + (sign_extend)immediate                                            | PC ← PC + 4 |
| andi   | 001100  | rs      | rt      | immediate |        |        | rt ← rs & (zero_extend)immediate                                            | PC ← PC + 4 |
| ori    | 001101  | rs      | rt      | immediate |        |        | rt ← rs \| (zero_extend)immediate                                           | PC ← PC + 4 |
| lw     | 100011  | rs      | rt      | immediate |        |        | rt ← memory[rs + (sign_extend)immediate]                                    | PC ← PC + 4 |
| sw     | 101011  | rs      | rt      | immediate |        |        | memory[rs + (sign_extend)immediate] ← rt                                    | PC ← PC + 4 |
| beq    | 000100  | rs      | rt      | immediate |        |        | if (rs == rt) PC ← PC + 4 + ((sign_extend)immediate << 2); else PC ← PC + 4 |             |
| bne    | 000101  | rs      | rt      | immediate |        |        | if (rs != rt) PC ← PC + 4 + ((sign_extend)immediate << 2); else PC ← PC + 4 |             |
| J-type | op      | address ([25:0]) | |           |        |        |                                                                             |             |
| j      | 000010  | address |         |           |        |        | PC ← {(PC+4)[31:28], address << 2}                                          |             |
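The field layout in the table can be checked against the machine codes used later in the verification programs. The following is a minimal Python sketch of that packing and unpacking; the helper names `encode_r`, `encode_i` and `decode` are hypothetical and not part of the lab code.

```python
# Field layout from the table:
# op[31:26] rs[25:21] rt[20:16] rd[15:11] sa[10:6] func[5:0]

def encode_r(rs, rt, rd, sa, func):
    """Pack an R-type instruction (op is always 000000)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | func

def encode_i(op, rs, rt, imm):
    """Pack an I-type instruction; imm is truncated to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def decode(word):
    """Split a 32-bit instruction word into its named fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "sa": (word >> 6) & 0x1F,
        "func": word & 0x3F,
        "imm": word & 0xFFFF,
        "address": word & 0x03FFFFFF,
    }

add_word = encode_r(rs=1, rt=2, rd=3, sa=0, func=0b100000)  # add r3, r1, r2
lw_word = encode_i(op=0b100011, rs=0, rt=1, imm=20)         # lw r1, 20(r0)
print(f"{add_word:08x} {lw_word:08x}")  # prints "00221820 8c010014"
```

Both words match the codes listed in the multiple-cycle verification program (`0x0022_1820` and `0x8c01_0014`), which is a quick way to sanity-check a hand-assembled instruction memory image.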



#### Step1: CPU Controller



**ZheJiang University** 

#### **Output of CPU Controller**

| No. | Output Signal | Meaning When 1                                             | Meaning When 0        |
|-----|---------------|------------------------------------------------------------|-----------------------|
| 1   | PCSrc[1:0]    | 00: PC + 4; 01: Branch Instr.; 10: Jump Instr.             |                       |
| 2   | WritePC       | Write PC                                                   | Not Write PC          |
| 3   | IorD          | Instruction Addr.                                          | Data Addr.            |
| 4   | WriteMem      | Write Mem.                                                 | Not Write Mem.        |
| 5   | WriteDR       | Write Data Reg.                                            | Not Write Data Reg.   |
| 6   | WriteIR       | Write Instr. Reg.                                          | Not Write Instr. Reg. |
| 7   | MemToReg      | From Mem. To Reg.                                          | From ALUOut To Reg.   |
| 8   | RegDest       | rd                                                         | rt                    |
| 9   | ALUC          | ALU Controller Op                                          |                       |
| 10  | ALUSrcA       | Register rs                                                | PC                    |
| 11  | ALUSrcB       | Selection: 00: Reg rt; 01: 4; 10: Imm.; 11: Branch Address |                       |
| 12  | WriteA        | Write A Reg.                                               | Not Write A Reg.      |
| 13  | WriteB        | Write B Reg.                                               | Not Write B Reg.      |
| 14  | WriteC        | Write C Reg.                                               | Not Write C Reg.      |
| 15  | WriteReg      | Write Reg.                                                 | Not Write Reg.        |
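To make the `PCSrc[1:0]` encoding concrete, here is a small Python sketch of the next-PC selection, combining the three encodings in the table with the branch and jump formulas from the instruction table. It assumes byte addresses, and the function names `sign_extend16` and `next_pc` are illustrative only; the actual `pcm` module may be organised differently.

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def next_pc(pc, pcsrc, imm=0, address=0):
    """Select the next PC the way the PCSrc[1:0] mux does (byte addresses)."""
    pc4 = (pc + 4) & 0xFFFFFFFF
    if pcsrc == 0b00:   # sequential execution
        return pc4
    if pcsrc == 0b01:   # branch: PC + 4 + (sign_extend(imm) << 2)
        return (pc4 + (sign_extend16(imm) << 2)) & 0xFFFFFFFF
    if pcsrc == 0b10:   # jump: {(PC+4)[31:28], address << 2}
        return (pc4 & 0xF0000000) | ((address & 0x03FFFFFF) << 2)
    raise ValueError("PCSrc = 11 is not defined in the table")

# beq with immediate 0xfff8 (-8) located at byte address 36 (word 9)
print(next_pc(36, 0b01, imm=0xFFF8))  # prints 8, i.e. word address 2
```

The controller is expected to drive `PCSrc = 01` only when the branch condition holds; this sketch models the mux alone, not that decision.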



CA\_2013Spring\_Lab

#### The principle of CPU Controller(1)



Stages of Multiple-Cycle Execution of Typical MIPS CPU



#### The principle of CPU Controller(2)

J Type Instr. takes 2 Stages



Multiple Cycle CPU Stages State Machine



#### The Datapath of Multiple-cycle CPU





## Basic Units of Multiple-cycle CPU

- CPU Controller
- ALU and ALU Controller
- Register file
- Mem. (Instruction and Data together).
- others: Register, sign-extend Unit, shifter, multiplexor





#### Memory

- Dual Port Block Memory
- Port A: Read Only, Width: 32, Depth: 512
- Port B: Read and Write, Read After Write
- Rising Edge Triggered



## Multiple-cycle CPU Top Module

- memory x\_memory(.addra(raddr), .addrb(waddr), .clka(clk), .clkb(clk), .dinb(b\_data), .douta(mem\_data), .web(write\_mem));
- ctrl x\_ctrl(clk, rst, ir\_data, zero, write\_pc, iord, write\_mem, write\_dr, write\_ir, memtoreg, regdst, pcsource, write\_c, alu\_ctrl, alu\_srcA, alu\_srcB, write\_a, write\_b, write\_reg, state\_out, insn\_type, insn\_code, insn\_stage);
- pcm x\_pcm(clk, rst, alu\_out, c\_data, ir\_data, pcsource, write\_pc, pc);
- alu\_wrapper x\_alu\_wrapper(a\_data, b\_data, ir\_data, pc, alu\_srcA, alu\_srcB, alu\_ctrl, zero, alu\_out);
- reg\_wrapper x\_reg\_wrapper(clk, rst, ir\_data, dr\_data, c\_data, memtoreg, regdst, write\_reg, rdata\_A, rdata\_B, r6out);



#### **Observation Info**

- Input
  - West Button: Step execute
  - South Button: Reset
  - Slide Button: Address of Register
- Output
  - Characters 0-7 of first line: Instruction Code
  - Character 8 of first line: Space
  - Characters 9-10 of first line: Read Address
  - Character 11 of first line: Space
  - Characters 12-13 of first line: Write Address
  - Characters 0/2/4/6 of second line: state/type/code/stage
  - Characters 8-9 of second line: PC
  - Characters 11-14 of second line: Selected Register Content



#### **Program for verification**

- <0> lw r1, \$20(r0); 0x8c01\_0014 State: 0,1,3,5,9 Type: 3 Code: 1 (LD)
- <1> lw r2, \$21(r0); 0x8c02\_0015 State: 0,1,3,5,9 Type: 3 Code: 1 (LD)
- <2> add r3, r1, r2; 0x0022\_1820 State: 0,1,2,8 Type: 1 Code: 3 (AD)
- <3> sub r4, r1, r2; 0x0022\_2022 State: 0,1,2,8 Type: 1 Code: 4 (SU)
- <4> and r5, r3, r4; 0x0064\_2824 State: 0,1,2,8 Type: 1 Code: 5 (AN)
- <5> nor r6, r4, r5; 0x0085\_3027 State: 0,1,2,8 Type: 1 Code: 6 (NO)
- <6> sw r6, \$22(r0); 0xac06\_0016 State: 0,1,4,7 Type: 3 Code: 2 (ST)
- <7> j 0; 0x0800\_0000 State: 0,1 Type: 2 Code: 7 (JP)
- DataMem(20) = 0xbeef\_0000;
- DataMem(21) = 0x0000\_beef;
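When stepping through the program on the board, it helps to know what each register should contain. The sketch below is a plain Python reference model of one pass through the eight instructions. It assumes that the offsets 20, 21 and 22 index data memory directly, as the `DataMem(20)` notation suggests; `run_reference` is a hypothetical helper, not part of the lab code.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits

def run_reference(data_mem):
    """Execute the test program once, up to the final jump."""
    r = [0] * 32
    mem = dict(data_mem)
    r[1] = mem[20]                  # <0> lw  r1, 20(r0)
    r[2] = mem[21]                  # <1> lw  r2, 21(r0)
    r[3] = (r[1] + r[2]) & MASK     # <2> add r3, r1, r2
    r[4] = (r[1] - r[2]) & MASK     # <3> sub r4, r1, r2
    r[5] = r[3] & r[4]              # <4> and r5, r3, r4
    r[6] = ~(r[4] | r[5]) & MASK    # <5> nor r6, r4, r5
    mem[22] = r[6]                  # <6> sw  r6, 22(r0)
    return r, mem                   # <7> j 0 restarts the program

regs, mem = run_reference({20: 0xBEEF0000, 21: 0x0000BEEF})
for i in range(1, 7):
    print(f"r{i} = 0x{regs[i]:08x}")
```

With the given data memory contents the model yields r3 = 0xbeefbeef, r4 = 0xbeee4111, r5 = 0xbeee0001 and r6 = 0x4111beee, which can be compared with the "Selected Register Content" field on the display.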



#### Precaution

- 1. Add Anti-Jitter
- 2. Finish the State Machine
- 3. Add Stage Status







#### Datapath of 5-stages Pipelined CPU





#### The principle of Multiple-cycle CPU





# Structural hazards: resource conflicts

- Structural hazards arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
  - Memory conflicts
  - Register File conflicts
  - Other units conflicts



#### How to resolve Structural hazards





#### Register File

- Positive edge: transfer data between stages
- Negative edge: write operation
- Low level: read operation
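One way to read these three timing rules: within a single clock cycle the WB stage writes at the falling edge, and the ID stage reads during the low phase that follows, so a read in the same cycle already sees the new value. The Python sketch below models only that ordering. The class name `RegisterFile`, its `cycle` method, and the hard-wired zero register are assumptions for illustration, not the lab's Verilog interface.

```python
class RegisterFile:
    """32 x 32-bit registers; register 0 is assumed to be hard-wired to zero."""

    def __init__(self):
        self.regs = [0] * 32

    def cycle(self, write_en, write_addr, write_data, read_addr_a, read_addr_b):
        """Model one clock cycle: write at the negative edge, then read."""
        # Negative edge: the WB stage commits its result first.
        if write_en and write_addr != 0:
            self.regs[write_addr] = write_data & 0xFFFFFFFF
        # Low level: the ID stage reads afterwards and sees the new value.
        return self.regs[read_addr_a], self.regs[read_addr_b]

rf = RegisterFile()
a, b = rf.cycle(True, 1, 0xBEEF0000, read_addr_a=1, read_addr_b=0)
print(hex(a), hex(b))  # prints "0xbeef0000 0x0"
```

Ordering the write before the read is what lets WB and ID share the register file in the same cycle without a structural conflict.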





#### Memory

- Instruction Memory
  - Single Port Block Memory
  - Read only, Width:32
  - Falling Edge Triggered
- Data Memory
  - Single Port Block Memory
  - Read and write, Width:32
  - Falling Edge Triggered



#### The principle of Pipelined CPU with CPU controller



Pipeline stages: IF, ID, EXE, MEM, WB



#### **Output of CPU Controller**

|   | Output Signal | Meaning When 1       | Meaning When 0            |
|---|---------------|----------------------|---------------------------|
| 1 | Cu_branch     | Branch Instr.        | Non-Branch Instr.         |
| 2 | Cu_shift      | sa                   | Register data1            |
| 3 | Cu_wmem       | Write Mem.           | Not Write Mem.            |
| 4 | Cu_Mem2Reg    | From Mem. To Reg     | From ALUOut To Reg        |
| 5 | Cu_sext       | Sign-extend the imm. | Do not sign-extend the imm. |
| 6 | Cu_aluc       | ALU Operation        |                           |
| 7 | Cu_aluimm     | Imm.                 | Register data2            |
| 8 | Cu_wreg       | Write Reg.           | Not Write Reg.            |
| 9 | Cu_regrt      | rt                   | rd                        |
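The signal meanings above, combined with the opcodes in the instruction table, suggest a decode function along the following lines. This is a hedged Python sketch rather than the lab's controller: the signal values are inferred from the two tables, `Cu_aluc` is left out because its encoding is not given here, and the name `control` is hypothetical.

```python
R_SHIFT_FUNCS = {0b000000, 0b000010, 0b000011}  # sll, srl, sra
OP_ADDI, OP_ANDI, OP_ORI = 0b001000, 0b001100, 0b001101
OP_LW, OP_SW, OP_BEQ, OP_BNE = 0b100011, 0b101011, 0b000100, 0b000101

def control(word):
    """Derive the 1-bit controller outputs for one instruction word."""
    op, func = (word >> 26) & 0x3F, word & 0x3F
    is_r = op == 0
    is_imm_alu = op in (OP_ADDI, OP_ANDI, OP_ORI)
    is_mem = op in (OP_LW, OP_SW)
    is_branch = op in (OP_BEQ, OP_BNE)
    return {
        "Cu_branch": int(is_branch),
        "Cu_shift": int(is_r and func in R_SHIFT_FUNCS),   # ALU input A = sa
        "Cu_wmem": int(op == OP_SW),
        "Cu_Mem2Reg": int(op == OP_LW),
        "Cu_sext": int(op == OP_ADDI or is_mem or is_branch),
        "Cu_aluimm": int(is_imm_alu or is_mem),             # ALU input B = imm
        "Cu_wreg": int(is_r or is_imm_alu or op == OP_LW),
        "Cu_regrt": int(is_imm_alu or op == OP_LW),          # destination = rt
    }

print(control(0x8C010014))  # lw r1, 20(r0)
```

For `andi` and `ori` the sketch keeps `Cu_sext` at 0, since the instruction table specifies zero extension for them.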



#### Units of Pipelined CPU

- IF Stage (Instr. Mem.)
- ID Stage (CPU Ctrl. and R.F.)
- EX Stage (ALU)
- MEM Stage (Data Mem.)
- WB Stage



#### pipelinedcpu.gdf (Graphic Editor schematic)








## Pipelined CPU Top Module

- module top (input wire CCLK, BTN3, BTN2, input wire [3:0] SW, output wire LED, LCDE, LCDRS, LCDRW, output wire [3:0] LCDDAT);
- assign pc[31:0] = if\_npc[31:0];
- if\_stage x\_if\_stage(BTN3, rst, pc, mem\_pc, mem\_branch, ... IF\_ins\_type, IF\_ins\_number, ID\_ins\_type, ID\_ins\_number);
- id\_stage x\_id\_stage(BTN3, rst, if\_inst, if\_pc4, wb\_destR, ... ID\_ins\_type, ID\_ins\_number, EX\_ins\_type, EX\_ins\_number..);
- ex\_stage x\_ex\_stage(BTN3, id\_imm, id\_inA, id\_inB, id\_wreg, .. EX\_ins\_type, EX\_ins\_number, MEM\_ins\_type, MEM\_ins\_number);
- mem\_stage x\_mem\_stage(BTN3, ex\_destR, ex\_inB, ex\_aluR, ... MEM\_ins\_type, MEM\_ins\_number, WB\_ins\_type, WB\_ins\_number);
- wb\_stage x\_wb\_stage(BTN3, mem\_destR, mem\_aluR, ... WB\_ins\_type, WB\_ins\_number, OUT\_ins\_type, OUT\_ins\_number);


#### **Observation Info**

- Input
  - West Button: Step execute
  - South Button: Reset
  - 4 Slide Buttons: Register Index
- Output
  - Characters 0-7 of first line: Instruction Code
  - Character 8 of first line: Space
  - Characters 9-10 of first line: Clock Count
  - Character 11 of first line: Space
  - Characters 12-15 of first line: Register Content
  - Second line: "stage name"/number/type
  - Stage name: 1-"f", 2-"d", 3-"e", 4-"m", 5-"w"



#### **Program for verification**

| No. | Instruction     | Machine Code | Address | Inst. Type |
|----|-----------------|-------------|---------|------------|
| 1  | lw r1, \$20(r0) | 0x8c01_0014 | 0       | 6          |
| 2  | lw r6, \$21(r0) | 0x8c06_0015 | 1       | 6          |
| 3  | add r3,r0,r0    | 0x0000_1820 | 2       | 1          |
| 4  | add r4,r0,r0    | 0x0000_2020 | 3       | 1          |
| 5  | add r5,r0,r0    | 0x0000_2820 | 4       | 1          |
| 6  | add r2,r2,r1    | 0x0041_1020 | 5       | 1          |
| 7  | sub r3, r3, r1  | 0x0061_1822 | 6       | 2          |
| 8  | and r4, r4, r1  | 0x0081_2024 | 7       | 3          |
| 9  | nor r5, r5, r1  | 0x00a1_2827 | 8       | 5          |
| 10 | beq r2, r1, -8  | 0x1041_fff8 | 9       | 8          |



#### Precaution

- 1. Add anti-jitter and display support for the hex digits "A-F".
- 2. Fill in the blanks.
- 3. Debug method: output whatever signal you need to the LCD display.
- 4. Understand the principle of the pipelined CPU, check the circuit logic carefully, and understand the sample code before you write code and synthesize the project, because each synthesis run takes a few minutes.



## Something Important !!!

- 1. The number and type give information about the instruction that is about to be executed in each stage.
- 2. How to verify the result? Check the result of the WB stage for R-type and LW instructions, and check the result of the EXE stage for the BEQ instruction.
- 3. Why are there some NONE instructions following BEQ, and how many? Three, because the condition of BEQ is generated in the MEM stage.
- 4. Why is the initial value of PC FFFFFFFF, not 0?
- 5. Why do we have to move the slide button after a step execution to refresh the result? Why is the instruction refresh delayed by one clock cycle? How can the display be refreshed automatically?
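The count of three NONE instructions in item 3 can be reproduced with a toy timeline. The Python sketch below steps an idealised 5-stage pipeline using word addresses; it assumes the branch is taken once and that the younger instructions behind it are replaced by NONE in the cycle after the branch resolves. The function name `simulate` and this squashing behaviour are assumptions for illustration; the lab hardware may produce the NONE slots differently.

```python
STAGES = ["IF", "ID", "EXE", "MEM", "WB"]

def simulate(branch_pc, target_pc, cycles, resolve_stage="MEM"):
    """Return one row per cycle listing the instruction address in each stage."""
    resolve = STAGES.index(resolve_stage)
    pipe = ["NONE"] * len(STAGES)
    pc, redirected, timeline = 0, False, []
    for _ in range(cycles):
        pipe = [pc] + pipe[:-1]      # fetch, and advance every stage by one
        pc += 1
        timeline.append(list(pipe))
        if not redirected and pipe[resolve] == branch_pc:
            # Outcome known: younger instructions become NONE, fetch the target
            pipe[:resolve] = ["NONE"] * resolve
            pc, redirected = target_pc, True
    return timeline

# beq sits at word address 9 and branches back to word address 2
timeline = simulate(branch_pc=9, target_pc=2, cycles=16)
for cycle, row in enumerate(timeline, start=1):
    print(cycle, dict(zip(STAGES, row)))
```

In the row after the branch reaches MEM, the ID, EXE and MEM stages all show NONE while IF already holds the branch target. Calling `simulate` with `resolve_stage="EXE"` leaves only two NONE slots, which shows why the stage that generates the condition determines the count.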





