The Classic 5-Stage Pipeline Seen in Section 4

question 19

Essay

The classic 5-stage pipeline seen in Section 4.5 is IF, ID, EX, MEM, WB. This pipeline is designed specifically to execute the MIPS instruction set. MIPS is a load-store architecture that performs at most one memory operation per instruction, so a single MEM stage in the pipeline suffices. Its most common addressing mode is register displacement addressing, and the EX stage is placed before the MEM stage so that it can be used for address calculation. In this question we will consider a variation in the MIPS instruction set and the interactions of this variation with the pipeline structure.
The particular variation we are considering involves swapping the MEM and EX stages, creating a pipeline that looks like this: IF, ID, MEM, EX, WB. This change has two effects on the instruction set. First, it prevents us from using register displacement addressing (there is no longer an EX in front of MEM to accomplish this). However, in return we can use instructions with one memory input operand, i.e., register-memory instructions. For instance: multf_m f0,f2,(r2) multiplies the contents of register f2 and the value at memory location pointed to by r2, putting the result in f0.
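To make the trade-off concrete, here is a minimal Python sketch of the two instruction styles. The `regs` and `mem` dictionaries, the helper names `displacement_load` and `multf_m`, and the sample values are illustrative assumptions and are not part of the original question.

```python
# Toy model: registers and memory are plain dictionaries (assumed for illustration).
regs = {"r2": 100, "f0": 0.0, "f2": 1.5}
mem = {100: 4.0, 108: 2.0}

def displacement_load(dest, base, disp):
    """Classic pipeline: EX adds base + displacement, then MEM reads that address."""
    address = regs[base] + disp       # EX stage: address calculation
    regs[dest] = mem[address]         # MEM stage: memory access

def multf_m(dest, src, base):
    """Swapped pipeline: MEM reads (base) with no displacement, then EX multiplies."""
    operand = mem[regs[base]]         # MEM stage: register-indirect access only
    regs[dest] = regs[src] * operand  # EX stage: arithmetic on the loaded value

multf_m("f0", "f2", "r2")             # f0 = f2 * mem[r2]
displacement_load("f4", "r2", 8)      # f4 = mem[r2 + 8]; needs EX before MEM
```

In the swapped pipeline, the second call would have to be split into an explicit `addi` that computes the address followed by a register-indirect access.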
(a) Dropping the register displacement addressing mode is potentially a big loss, since it is the mode most frequently used in MIPS. Why is it so frequent? Give two popular software constructs whose implementation uses register displacement addressing (i.e., uses displacement addressing with nonzero displacements).
(b) What is the difference between a dependence and a hazard?
(c) In this question we will work with the SAXPY loop.
do I = 0,N
Z[I] = A*X[I] + Y[I]
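For reference, the loop can be sketched in Python as follows. The function name `saxpy` and the sample data are assumptions for illustration. The bound is taken to be inclusive (I runs from 0 through N), which matches the less-than-or-equal test (`slei`) in the assembly code of this question.

```python
def saxpy(a, x, y, n):
    """Return Z with Z[i] = a*X[i] + Y[i] for i = 0..n inclusive."""
    return [a * x[i] + y[i] for i in range(n + 1)]

# Example with n = 2, i.e. three elements (indices 0, 1, 2).
z = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0], 2)
print(z)
```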
Here is the new assembly code.
0: slli r2,r1,#3 // I is in r1
1: addi r3,r2,#X
2: multf_m f2,f0,(r3) // A is in f0
3: addi r4,r2,#Y
4: addf_m f4,f2,(r4)
5: addi r4,r2,#Z
6: sf f4,(r4)
7: addi r1,r1,#1
8: slei r6,r1,r5 // N is in r5
9: bnez r6,#0
Using the instruction numbers, label the data and control dependences.
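As a starting point, the following Python sketch mechanically lists the read-after-write (true) register dependences within a single iteration. It is a partial aid under stated assumptions, not a complete answer: the operand lists are transcribed by hand from the assembly, the names `program` and `raw_dependences` are hypothetical, and the sketch ignores loop-carried dependences, anti- and output dependences (such as the reuse of r4), and control dependences on the branch.

```python
# Each entry: (instruction number, destination register or None, source registers).
program = [
    (0, "r2", ["r1"]),          # slli r2,r1,#3
    (1, "r3", ["r2"]),          # addi r3,r2,#X
    (2, "f2", ["f0", "r3"]),    # multf_m f2,f0,(r3)
    (3, "r4", ["r2"]),          # addi r4,r2,#Y
    (4, "f4", ["f2", "r4"]),    # addf_m f4,f2,(r4)
    (5, "r4", ["r2"]),          # addi r4,r2,#Z
    (6, None, ["f4", "r4"]),    # sf f4,(r4)
    (7, "r1", ["r1"]),          # addi r1,r1,#1
    (8, "r6", ["r1", "r5"]),    # slei r6,r1,r5
    (9, None, ["r6"]),          # bnez r6,#0
]

def raw_dependences(instrs):
    """Pair each register read with the most recent earlier write in the same iteration."""
    last_writer = {}
    deps = []
    for num, dest, srcs in instrs:
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], num, reg))
        if dest is not None:
            last_writer[dest] = num
    return deps

deps = raw_dependences(program)
for producer, consumer, reg in deps:
    print(f"{producer} -> {consumer} via {reg}")
```

Registers that are only read before being written in the iteration (r1 at instruction 0, f0, r5) produce no entry here; their values come from before the loop or from the previous iteration.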
(d) Fill in the pipeline diagram for the new SAXPY loop code. Label the stalls as d* for data-hazard stalls and s* for structural stalls. What is the latency of a single iteration (the number of cycles between the completion of two successive #0 instructions)? For this question, assume that FP addition takes 2 cycles, FP multiplication takes 3 cycles, and all other operations take a single cycle. The functional units are not pipelined. The FP adder, FP multiplier, and integer ALU are all separate functional units, so there are no structural hazards between them. The register file is written by the WB stage in the first half of a clock cycle and is read by the ID stage in the second half of a clock cycle. In addition, the processor has full forwarding. The processor stalls on branches until the outcome is available, which is at the end of the EX stage. The processor has no provisions for maintaining "precise state".
(e) In the pipeline for MIPS mentioned in the text, what is the reason for forcing non-memory operations to go through the MEM stage rather than proceeding directly to the WB stage?
(f) Aside from the direct loss of register displacement addressing and the subsequent instructions required to explicitly compute addresses, what are two other disadvantages of this sort of pipeline?
(g) Reduce the stalls by pipeline scheduling a single loop iteration. Show the resulting code and fill in the pipeline diagram. You do not need to show the optimal schedule for a correct response.
