floating-
2. Produce a vectorized version of the following program:
      DO 20 I = 1, N
         B(I,1) = 0
         DO 20 J = 1, N
            A(I) = A(I) + B(I,J) * X(I,J)
   20 CONTINUE
2.0 OBJECTIVES
At the end of this unit you should be able to:
- Understand the operation of RISC machines
- Explain the significance of the number of parameters and variables a procedure deals with, and of the depth of procedure nesting.
3.1 OPERATIONS
There is quite good agreement in the results of this mixture of languages and applications. Assignment statements predominate, suggesting that the simple movement of data is of high importance. There is also a preponderance of conditional statements (IF, LOOP). These statements are implemented in machine language with
some sort of compare and branch instruction. This suggests that the sequence control mechanism of the instruction set is important.
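To see what this means in practice, consider the short C function below. It is a hypothetical example, not taken from any of the studies cited: on most machines the IF and the loop test each compile into some form of compare followed by a conditional branch, while the assignments are simple data movement.

    /* A simple HLL conditional and loop. A compiler typically implements
     * each with a compare followed by a conditional branch; the machine
     * code suggested in the comments is only representative, not taken
     * from any particular machine or compiler. */
    int clamp_sum(const int *a, int n)
    {
        int sum = 0;                     /* assignment: simple data movement   */
        for (int i = 0; i < n; i++) {    /* LOOP: compare i with n, branch out */
            sum += a[i];
            if (sum > 1000)              /* IF: compare sum with 1000, branch  */
                sum = 1000;
        }
        return sum;
    }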
These results are instructive to the machine instruction set designer, indicating which types of statements occur most often and therefore should be supported in an "optimal"
fashion. However, these results do not reveal which statements use the most time in the execution of a typical program. That is, given a compiled machine language program, which statements in the source language cause the execution of the most machine-language instructions?
To get at this underlying phenomenon, the Patterson programs [PATT82a], described in Appendix 4A, were compiled on the VAX, PDP-11, and Motorola 68000 to determine the average number of machine instructions and memory references per statement type. The second and third columns in Table 13.2 show the relative frequency of occurrence of various HLL instructions in a variety of programs; the data were obtained by observing the occurrences in running programs rather than just the number of times that statements occur in the source code.
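The small C program below illustrates only the weighting idea; the occurrence percentages and the machine-instruction counts in it are invented for illustration and are not the measured values of Table 13.2. Multiplying the two and normalizing gives each statement type's share of all executed machine instructions.

    #include <stdio.h>

    /* Hypothetical illustration of dynamic weighting. Both the occurrence
     * percentages and the machine instructions per statement are invented
     * and are NOT the values reported in Table 13.2. */
    int main(void)
    {
        const char *stmt[]   = { "ASSIGN", "LOOP", "CALL", "IF" };
        double occurrences[] = { 45, 5, 15, 29 };  /* % of executed HLL statements (hypothetical) */
        double mach_instr[]  = { 2, 30, 12, 4 };   /* machine instructions per statement (hypothetical) */

        double total = 0.0;
        for (int i = 0; i < 4; i++)
            total += occurrences[i] * mach_instr[i];

        /* Weighted share = share of all executed machine instructions. */
        for (int i = 0; i < 4; i++)
            printf("%-6s %5.1f%%\n", stmt[i],
                   100.0 * occurrences[i] * mach_instr[i] / total);
        return 0;
    }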
3.2 OPERANDS
Much less work has been done on the occurrence of types of operands, despite the importance of this topic. The Patterson study also looked at the dynamic frequency of occurrence of classes of variables and found that the majority of references were to simple
scalar variables. Further, more than 80% of the scalars were local (to the procedure) variables. In addition, references to arrays/structures require a previous reference to their index or pointer, which again is usually a local scalar. Thus, there is a preponderance of references to scalars, and these are highly localized.
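The following hypothetical C function (not one of the Patterson benchmarks) shows why this is so: almost every operand referenced is a local scalar, and even the array access a[i] first requires the value of the local scalar i in order to form the address.

    /* Illustrative only: most operand references here are to the local
     * scalars i, n, and total; even the array reference a[i] first needs
     * the value of the local scalar i to form the address. */
    double average(const double *a, int n)
    {
        double total = 0.0;            /* local scalar                         */
        for (int i = 0; i < n; i++)    /* local scalar i referenced every trip */
            total += a[i];             /* array access via local scalar index  */
        return n > 0 ? total / n : 0.0;
    }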
The Patterson study examined the dynamic behavior of HLL programs, independent of the underlying architecture. As discussed before, it is necessary to deal with actual architectures to examine program behavior more deeply. One study, [LUND77], examined DEC-10 instructions dynamically and found that each instruction on the average references 0.5 operands in memory and 1.4 registers. Similar results are reported in [HUCK83] for C, Pascal, and FORTRAN programs on S/370, PDP-11, and VAX. Of course, these figures depend highly on both the architecture and the compiler, but they do illustrate the frequency of operand accessing.
These latter studies suggest the importance of an architecture that lends itself to fast operand accessing, because this operation is performed so frequently. The Patterson study suggests that a prime candidate for optimization is the mechanism for storing and accessing local scalar variables.
We have seen that procedure calls and returns are an important aspect of HLL programs.
The evidence (Table 13.2) suggests that these are the most time-consuming operations in compiled HLL programs. Thus, it will be profitable to consider ways of implementing these operations efficiently. Two aspects are significant: the number of parameters and variables that a procedure deals with, and the depth of nesting.
Tanenbaum's study [TANE78] found that 98% of dynamically called procedures were passed fewer than six arguments and that 92% of them used fewer than six local scalar variables. Similar results were reported by the Berkeley RISC team [KATE83], as shown in Table 13.4. These results show that the number of words required per procedure activation is not large. The studies reported earlier indicated that a high proportion of operand references is to local scalar variables. These studies show that those references are in fact confined to relatively few variables.
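A hypothetical procedure of the kind these measurements describe is sketched below: three parameters and two local scalar variables, so the entire activation needs only a few words of storage and could plausibly be held in a small set of registers.

    /* Hypothetical example of a "typical" procedure in the sense of
     * Table 13.4: three parameters and two local scalars, so the whole
     * activation needs only a few words, which could be kept in registers. */
    int count_in_range(const int *v, int n, int limit)
    {
        int count = 0;                  /* local scalar 1 */
        for (int i = 0; i < n; i++)     /* local scalar 2 */
            if (v[i] < limit)
                count++;
        return count;
    }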
3.3 IMPLICATIONS
Generalizing from the work of a number of researchers, three elements emerge that, by and large, characterize RISC architectures. First, use a large number of registers or use a compiler to optimize register usage. This is intended to optimize operand referencing.
The studies just discussed show that there are several references per HLL instruction and that there is a high proportion of move (assignment) statements. This, coupled with the locality and predominance of scalar references, suggests that performance can be improved by reducing memory references at the expense of more register references.
Because of the locality of these references, an expanded register set seems practical.
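The sketch below indicates, in its comments, how a register-rich machine or an optimizing compiler can exploit this: the local scalars sum and i can live in registers for the entire loop, so the only memory reference per iteration is the unavoidable load of a[i]. The register names mentioned in the comments are purely illustrative.

    /* Illustration of the register-allocation argument. In a naive
     * translation, sum and i would be loaded from and stored to memory on
     * every iteration; with enough registers the compiler can keep both in
     * registers for the whole loop, so the only memory traffic per
     * iteration is the load of a[i]. Register names are illustrative. */
    long sum_array(const int *a, int n)
    {
        long sum = 0;                 /* kept in, say, r1 for the whole loop */
        for (int i = 0; i < n; i++)   /* i kept in r2; no loads/stores of i  */
            sum += a[i];              /* one memory load per iteration: a[i] */
        return sum;
    }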
Second, careful attention needs to be paid to the design of instruction pipelines. Because of the high proportion of conditional branch and procedure call instructions, a straightforward instruction pipeline will be inefficient. This manifests itself as a high proportion of instructions that are prefetched but never executed.
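Even a tiny loop such as the hypothetical C function below shows the problem: each time the loop-closing branch or the data-dependent IF is taken, the instructions that a simple pipeline has already prefetched along the sequential path must be discarded.

    /* Schematic illustration of branch-induced pipeline waste. While the
     * conditional branch that closes the loop is being resolved, a simple
     * pipeline has already fetched the instructions that follow it in
     * memory; on every taken branch those prefetched instructions are
     * thrown away. */
    int find_first_negative(const int *a, int n)
    {
        for (int i = 0; i < n; i++) {   /* loop-closing compare-and-branch   */
            if (a[i] < 0)               /* data-dependent conditional branch */
                return i;               /* often prefetched but not executed */
        }
        return -1;
    }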
Finally, a simplified (reduced) instruction set is indicated. This point is not as obvious as the others, but should become clearer in the ensuing discussion.
4.0 CONCLUSION
Assignment statements predominate, suggesting that the simple movement of data should be optimized. There are also many IF and LOOP instructions, which suggest that the underlying sequence control mechanism needs to be optimized to permit efficient pipelining. Studies of operand reference patterns suggest that it should be possible to enhance performance by keeping a moderate number of operands in registers.
5.0 SUMMARY
The simple instruction set of a RISC lends itself to efficient pipelining because there are fewer and more predictable operations performed per instruction. A RISC instruction set also lends itself to the delayed branch technique, in which branch instructions are rearranged with other instructions to improve pipeline efficiency.
6.0 TUTOR MARKED ASSIGNMENT
1. What is a delayed branch?
UNIT 2: REDUCED INSTRUCTION SET ARCHITECTURE