Examlex

Solved

The Classic 5-Stage Pipeline Seen in Section 4

question 19

Essay

The classic 5-stage pipeline seen in Section 4.5 is IF, ID, EX, MEM, WB. This pipeline is designed specifically to execute the MIPS instruction set. MIPS is a load store architecture that performs one memory operation per instruction, hence a single MEM stage in the pipeline suffices. Also, its most common addressing mode is register displacement addressing. The EX stage is placed before the MEM stage to allow it to be used for address calculation. In this question we will consider a variation in the MIPS instruction set and the interactions of this variation with the pipeline structure.
The particular variation we are considering involves swapping the MEM and EX stages, creating a pipeline that looks like this: IF, ID, MEM, EX, WB. This change has two effects on the instruction set. First, it prevents us from using register displacement addressing (there is no longer an EX in front of MEM to accomplish this). However, in return we can use instructions with one memory input operand, i.e., register-memory instructions. For instance: multf_m f0,f2,(r2) multiplies the contents of register f2 and the value at memory location pointed to by r2, putting the result in f0.
(a) Dropping the register displacement addressing mode is potentially a big loss, since it is the mode most frequently used in MIPS. Why is it so frequent? Give two popular software constructs whose implementation uses register displacement addressing (i.e., uses displacement addressing with non- zero displacements).
(b) What is the difference between a dependence and a hazard?
(c) In this question we will work with the SAXPY loop.
do I = 0,N
Z[I] = A*X[I] + Y[I]
Here is the new assembly code.
0: slli r2,r1,#3 // I is in r1 1: addi r3,r2,#X
2: multf_m f2,f0,(r3) // A is in f0 3: addi r4,r2,#Y
4: addf_m f4,f2,(r4)
5: addi r4,r2,#Z
6: sf f4,(r4)
7: addi r1,r1,#1
8: slei r6,r1,r5 // N is in r5 9: bnez r6,#0
Using the instruction numbers, label the data and control dependences.
(d) Fill in the pipeline diagram for code for the new SAXPY loop. Label the stalls as d* for data-hazard stalls and s* for structural stalls. What is the latency of a single iteration? (The number of cycles between the completion of two successive #0 instructions). For this question, assume that FP addition takes 2 cycles, FP multiplication takes 3 cycles and that all other operations take a single cycle. The functional units are not pipelined. The FP adder, FP multiplier and integer ALU are all separate functional units, such that there are no structural hazards between them. The register file is written by the WB stage in the first half of a clock cycle and is read by the ID stage in the second half of a clock cycle. In addition, the processor has full forwarding. The processor stalls on branches until the outcome is available which is at the end of the EX stage. The processor has no provisions for maintaining "precise state". The classic 5-stage pipeline seen in Section 4.5 is IF, ID, EX, MEM, WB. This pipeline is designed specifically to execute the MIPS instruction set. MIPS is a load store architecture that performs one memory operation per instruction, hence a single MEM stage in the pipeline suffices. Also, its most common addressing mode is register displacement addressing. The EX stage is placed before the MEM stage to allow it to be used for address calculation. In this question we will consider a variation in the MIPS instruction set and the interactions of this variation with the pipeline structure. The particular variation we are considering involves swapping the MEM and EX stages, creating a pipeline that looks like this: IF, ID, MEM, EX, WB. This change has two effects on the instruction set. First, it prevents us from using register displacement addressing (there is no longer an EX in front of MEM to accomplish this). However, in return we can use instructions with one memory input operand, i.e., register-memory instructions. For instance: multf_m f0,f2,(r2) multiplies the contents of register f2 and the value at memory location pointed to by r2, putting the result in f0. (a) Dropping the register displacement addressing mode is potentially a big loss, since it is the mode most frequently used in MIPS. Why is it so frequent? Give two popular software constructs whose implementation uses register displacement addressing (i.e., uses displacement addressing with non- zero displacements). (b) What is the difference between a dependence and a hazard? (c) In this question we will work with the SAXPY loop. do I = 0,N Z[I] = A*X[I] + Y[I] Here is the new assembly code. 0: slli r2,r1,#3 // I is in r1 1: addi r3,r2,#X 2: multf_m f2,f0,(r3) // A is in f0 3: addi r4,r2,#Y 4: addf_m f4,f2,(r4) 5: addi r4,r2,#Z 6: sf f4,(r4) 7: addi r1,r1,#1 8: slei r6,r1,r5 // N is in r5 9: bnez r6,#0 Using the instruction numbers, label the data and control dependences. (d) Fill in the pipeline diagram for code for the new SAXPY loop. Label the stalls as d* for data-hazard stalls and s* for structural stalls. What is the latency of a single iteration? (The number of cycles between the completion of two successive #0 instructions). For this question, assume that FP addition takes 2 cycles, FP multiplication takes 3 cycles and that all other operations take a single cycle. The functional units are not pipelined. The FP adder, FP multiplier and integer ALU are all separate functional units, such that there are no structural hazards between them. The register file is written by the WB stage in the first half of a clock cycle and is read by the ID stage in the second half of a clock cycle. In addition, the processor has full forwarding. The processor stalls on branches until the outcome is available which is at the end of the EX stage. The processor has no provisions for maintaining  precise state .   (e) In the pipeline for MIPS mentioned in the text, what is the reason for forcing non-memory operations to go through the MEM stage rather than proceeding directly to the WB stage? (f) Aside from the direct loss of register displacement addressing and the subsequent instructions required to explicitly compute addresses, what are two other disadvantages of this sort of pipeline? (g) Reduce the stalls by pipeline scheduling a single loop iteration. Show the resulting code and fill in the pipeline diagram. You do not need to show the optimal schedule for a correct response. (e) In the pipeline for MIPS mentioned in the text, what is the reason for forcing non-memory operations to go through the MEM stage rather than proceeding directly to the WB stage?
(f) Aside from the direct loss of register displacement addressing and the subsequent instructions required to explicitly compute addresses, what are two other disadvantages of this sort of pipeline?
(g) Reduce the stalls by pipeline scheduling a single loop iteration. Show the resulting code and fill in the pipeline diagram. You do not need to show the optimal schedule for a correct response.

Identify the appropriate research method based on the research question or scenario.
Explain the concepts of correlation and causation.
Understand how to interpret correlation coefficients and their implications for research findings.
Discriminate between different data collection techniques across research methods.

Definitions:

Forced-Choice Method

A trait approach to performance rating that requires the rater to choose from statements designed to distinguish between successful and unsuccessful performance.

Developmental Purposes

Activities or strategies designed to improve individual skills, capabilities, and knowledge, often for professional growth or to enhance performance in current or future roles.

Administrative Purposes

Uses related to the management, organization, and conduct of a business or institution's daily operations.

Peer Evaluations

A process in which individuals assess the work performance and contributions of their colleagues, often used in academic and professional settings to enhance productivity and identify areas for improvement.

Related Questions