
OVERVIEW OF RELATED WORK ON QUEUE CORE PROCESSOR

In document QUEUE COMPUTING (Page 31-38)

The present-day Queue Core Processor addresses the performance crisis in the microprocessor industry through the Queue Computing model it implements.

QC-2 has been developed extensively over the past years, including the Queue Compiler, the Queue Assembler, and efficient algorithms aimed at improving its performance.

2.1.1 INSTRUCTION SET ARCHITECTURE OF THE QUEUE CORE PROCESSOR

The QC-2 processor implements a produced order instruction set architecture. Each instruction can encode at most two operands, which specify the locations in the register file from which the operands are read. The processor determines the physical location of an operand by adding the offset reference in the instruction to the current queue head (QH) pointer.
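To make the QH-relative addressing concrete, the following Python sketch models the operand queue as a circular buffer in which an operand's physical location is QH plus the offset reference. The class name, the 16-entry size and the wrap-around behaviour are assumptions made for illustration; they are not taken from the QC-2 specification.

```python
QUEUE_SIZE = 16  # hypothetical size; the real QC-2 queue size may differ


class QueueRegisterFile:
    """Circular queue of operands addressed relative to the queue head (QH)."""

    def __init__(self, size: int = QUEUE_SIZE) -> None:
        self.entries = [0] * size
        self.qh = 0  # queue head: oldest operand still in use
        self.qt = 0  # queue tail: where the next result is written

    def physical_index(self, offset: int) -> int:
        """Physical location = QH + offset reference, wrapping around the buffer."""
        return (self.qh + offset) % len(self.entries)

    def read(self, offset: int) -> int:
        return self.entries[self.physical_index(offset)]

    def produce(self, value: int) -> None:
        """Write a result at QT and advance QT."""
        self.entries[self.qt] = value
        self.qt = (self.qt + 1) % len(self.entries)

    def consume(self, count: int) -> None:
        """Advance QH past `count` operands that are no longer needed."""
        self.qh = (self.qh + count) % len(self.entries)


qrf = QueueRegisterFile()
for value in (3, 4, 5):
    qrf.produce(value)
qrf.consume(1)                   # QH now points at the value 4
print(qrf.read(0), qrf.read(1))  # 4 5
```

Offsets are always relative to the moving QH, so the same offset value can name different physical entries as execution proceeds.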

In [16], the Queue Core instruction set architecture was presented. All QC-2 instructions are 16 bits wide, which keeps the fetch and decode stages simple and facilitates pipelining of the processor. When there are not enough bits to express large constants, memory offsets or offset references, the Queue Core Processor implements QCaEXT, which inserts a special “covop” instruction. This instruction extends the offsets of load and store instructions and, when necessary, immediate values, thereby relaxing the bit-width constraints of the Queue Core Processor.
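The covop mechanism can be sketched at the assembly level as follows. The function name and the hexadecimal mnemonic syntax are hypothetical, and the sketch assumes an unsigned 16-bit memory offset whose upper byte (addr1) travels in a covop placed directly before the load/store instruction; extension of immediate values is not modelled.

```python
from typing import List


def expand_memory_offset(mnemonic: str, offset: int, base: int) -> List[str]:
    """Return the instructions needed to reach `offset` from base register `base`.

    Assumption: the load/store itself holds only the low byte (addr0); a
    preceding covop supplies the high byte (addr1) when it is non-zero.
    """
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in 16 bits")
    addr1, addr0 = offset >> 8, offset & 0xFF
    instructions = []
    if addr1:  # the upper byte does not fit in the 16-bit load/store word
        instructions.append(f"covop {addr1:#04x}")
    instructions.append(f"{mnemonic} {addr0:#04x}({base})")
    return instructions


print(expand_memory_offset("ldw", 0x2A, 1))    # ['ldw 0x2a(1)']
print(expand_memory_offset("ldw", 0x1234, 1))  # ['covop 0x12', 'ldw 0x34(1)']
```

Emitting the covop only when the high byte is non-zero keeps the common short-offset case at a single 16-bit instruction.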

The QC-2 instruction set architecture uses mnemonics to denote operations. Each operation has a unique operation code (opcode) that specifies the exact operation the instruction performs; for example, add and sub denote addition and subtraction, respectively. The first six (6) or eight (8) bits of every QC-2 instruction specify the opcode, while the remaining bits encode the instruction's operand fields, such as the base register and address bits.

Below are the instruction formats for the COVOP instruction and the LOAD/STORE instructions.

COVOP INSTRUCTION

Bit layout (16 bits):
bits 15-8: opcode 00000100 (8 bits)
bits 7-0: addr1 (8 bits)

Mnemonic      Action           QH   QT   Binary
covop addr1   Convey address   0    0    00000100

This instruction conveys an 8-bit address (addr1) to the following load/store instruction, extending its addressing bits from 8 to 16.

FIG 7: COVOP INSTRUCTION FORMAT

LOAD/STORE INSTRUCTION

Bit layout (16 bits):
bits 15-10: opcode (6 bits)
bits 9-8: d (2 bits)
bits 7-0: addr0 (8 bits)

Mnemonic       Action                    QH   QT   Binary
ldk addr0(d)   QT ← m((d)+addr1.addr0)   0    1    opcode

ldk: k = byte, string or word (signed or unsigned)

The load instruction reads k from memory address ((d)+addr0), or ((d)+addr1.addr0) when a preceding covop supplies addr1, and places it in the operand queue at the location pointed to by QT.

FIG 8: LOAD/STORE INSTRUCTION FORMAT

Details of the QC-2 instruction set architecture are given in Appendix I.
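The bit layouts in Fig. 7 and Fig. 8 can be expressed as a short encoding and decoding sketch. LOAD_OPCODE is a hypothetical placeholder (the real opcode values are listed in Appendix I), the table of base-register contents indexed by d is an assumption, and addr1.addr0 is taken to be the zero-extended concatenation of the two bytes.

```python
from typing import List, Optional, Tuple

COVOP_OPCODE = 0b00000100  # Fig. 7: bits 15-8 of a covop
LOAD_OPCODE = 0b100000     # hypothetical 6-bit load opcode (real values in Appendix I)


def encode_covop(addr1: int) -> int:
    """covop: opcode in bits 15-8, addr1 in bits 7-0."""
    return (COVOP_OPCODE << 8) | (addr1 & 0xFF)


def encode_load(d: int, addr0: int, opcode: int = LOAD_OPCODE) -> int:
    """load/store: opcode in bits 15-10, d in bits 9-8, addr0 in bits 7-0."""
    return ((opcode & 0x3F) << 10) | ((d & 0x3) << 8) | (addr0 & 0xFF)


def decode_load(word: int) -> Tuple[int, int, int]:
    """Split a 16-bit load/store word into (opcode, d, addr0)."""
    return (word >> 10) & 0x3F, (word >> 8) & 0x3, word & 0xFF


def effective_address(base_regs: List[int], load_word: int,
                      covop_word: Optional[int] = None) -> int:
    """Compute (d) + addr1.addr0, taking addr1 = 0 when no covop precedes the load."""
    _, d, addr0 = decode_load(load_word)
    addr1 = (covop_word & 0xFF) if covop_word is not None else 0
    return base_regs[d] + ((addr1 << 8) | addr0)


base_regs = [0x1000, 0x2000, 0x0000, 0x0000]  # hypothetical contents selected by d
cov = encode_covop(0x12)
ld = encode_load(d=1, addr0=0x34)
print(hex(cov), hex(ld))                           # 0x412 0x8134
print(hex(effective_address(base_regs, ld, cov)))  # 0x3234
print(hex(effective_address(base_regs, ld)))       # 0x2034
```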

2.1.2 QC-2 SPECIAL PURPOSE REGISTERS

Queue Core defines a set of special purpose registers available to the programmer: the frame pointer register ($fp), the stack pointer register ($sp), and the return address register ($ra).

The frame pointer register serves as the base register for accessing local variables, incoming parameters, and saved registers. The stack pointer register is used as the base address for outgoing parameters to other functions.

2.1.3 SYNTHESIS OF THE QUEUE CORE PROCESSOR (QC-2)

To make the QC-2 design easy to debug, modify, and adapt, the design process used a high-level description, an approach also adopted by other system designers.

The QC-2 core was developed in Verilog HDL. Synthesizing the HDL code yields a processor whose actual hardware performance and functional correctness can be investigated. It also makes it possible to study the effect of coding style and instruction set architecture on various optimizations. For the QC-2 processor to serve these purposes, the following requirements were identified:

High-level description: the format of the QC-2 description should be easy to understand and modify;

Modular: to add or remove instructions, only the relevant parts should need to be modified, since a monolithic design would make experiments difficult;

Synthesizable: the processor description should be synthesizable so that actual implementations can be derived.

Because it was targeted at embedded applications, the QC-2 has been designed with a distributed controller to facilitate debugging and future adaptation to specific application requirements.

2.1.4 QUEUE CORE COMPILER DESIGN

Considerable effort went into developing an efficient compiler for the QC-2 once the implementation of the Queue Core Processor had been studied thoroughly. Several compiling techniques for the Queue machine were investigated by Abderazek, Canedo and Sowa, although the Queue Compiler itself was not developed at that time. All previous attempts to develop a Queue Compiler relied on a retargetable code generator for register machines to map and rearrange low-level intermediate register code into Queue code. This approach led to complex algorithms and generated queue code of unacceptable quality.

A GCC-based Queue Compiler for evaluating Queue programs was then developed in [4]. The Queue Compiler generates assembly code from C programs. To the best of our knowledge, it is the first automated code generation tool designed for the Queue Computation Model.

It is highly capable of exposing the natural instruction level parallelism (ILP) of input programs to the Queue Core Processor.

The C front-end of GCC 4.0.2 is used to build an Abstract Syntax Tree (AST) from the C program. GCC's middle-end transforms the AST into the GIMPLE representation, in which each operation has at most two operands. The GIMPLE tree is then reconstructed into QTrees. The back-end, which constitutes the developed Queue Compiler, takes the QTrees as input and produces Level Directed Acyclic Graphs (LDAGs).

The 2-offset P-code generation algorithm [9] in the code generator component of the Queue Compiler takes the LDAGs and produces a low-level linear intermediate representation called QIR.

The assembly code is then generated from the QIR instructions.
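Level-order scheduling, the idea behind the LDAG and QIR stages, can be illustrated with a small Python sketch that levels an expression DAG and emits its nodes from the deepest level up to the root. The Node class, the mnemonic names and the choice of level as the longest distance from the root are assumptions for this illustration; it is not the Queue Compiler's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # eq=False keeps identity hashing, so shared nodes are single dict keys
class Node:
    op: str  # operation mnemonic ("add", "sub", "mul") or a variable name for leaves
    children: List["Node"] = field(default_factory=list)


def assign_levels(root: Node) -> Dict[Node, int]:
    """Level = longest distance from the root, so every operand lies deeper than its consumers."""
    level: Dict[Node, int] = {}

    def visit(node: Node, depth: int) -> None:
        if level.get(node, -1) >= depth:
            return
        level[node] = depth
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return level


def level_order_schedule(root: Node) -> List[str]:
    levels = assign_levels(root)
    # Breadth-first discovery order fixes the left-to-right order within a level.
    order, seen, frontier = [], {root}, [root]
    while frontier:
        order.extend(frontier)
        next_frontier = []
        for node in frontier:
            for child in node.children:
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    # Emit the deepest level first; sorted() is stable, so BFS order is kept within a level.
    ordered = sorted(order, key=lambda n: -levels[n])
    return [f"ld {n.op}" if not n.children else n.op for n in ordered]


a, b, c, d = (Node(x) for x in "abcd")
expr = Node("mul", [Node("add", [a, b]), Node("sub", [c, d])])
print(level_order_schedule(expr))
# ['ld a', 'ld b', 'ld c', 'ld d', 'add', 'sub', 'mul']
```

For (a+b)*(c-d), the four loads fill the queue, add and sub each take two operands from QH, and mul consumes their two results, so no explicit register names are needed.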

The Queue Core Compiler was developed for the Queue Computation Model using suitable data structures. It produces good code with no artifacts, and the infrastructure it provides is clean and easy to maintain. Its ability to compile a large number of programs is another of its notable features.

What distinguishes the developed compiler is the 1-offset code generation algorithm [10], implemented as the first and second phases of the back-end. This algorithm transforms the data flow graph to ensure that the program can be executed using a one-offset queue instruction set.

The algorithm then statically determines the offset values of all instructions by measuring, for each instruction, the distance of its operands relative to the position of QH. Each offset value is computed once and remains unchanged until the final assembly code is generated.
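The offset calculation described above can be approximated by the sketch below, which walks a level-ordered program and records, for each operand, its distance from QH at the moment the instruction executes. The simplified queue model (one result per instruction, and QH always pointing at the oldest value that still has pending consumers) and the function name are assumptions, not the exact rules of the 1-offset algorithm [10].

```python
from typing import List, Sequence, Tuple

Instr = Tuple[str, Sequence[int]]  # (mnemonic, queue slots of its operands)


def compute_offsets(program: List[Instr]) -> List[Tuple[str, List[int]]]:
    """Return each instruction with its operand offsets measured from QH.

    Simplified model (an assumption): instruction i writes its single result
    to queue slot i at QT, and QH points at the oldest value that still has
    unexecuted consumers.
    """
    remaining_uses = [0] * len(program)
    for _, operands in program:
        for src in operands:
            remaining_uses[src] += 1

    qh = 0
    result = []
    for slot, (name, operands) in enumerate(program):
        # Offset reference = distance of the operand from the current QH.
        result.append((name, [src - qh for src in operands]))
        for src in operands:
            remaining_uses[src] -= 1
        # Advance QH past values that no later instruction will read.
        while qh <= slot and remaining_uses[qh] == 0:
            qh += 1
    return result


# t = a + b; r = t * (t - c), already in level order; operands are queue slots.
program = [("ld a", []), ("ld b", []), ("add", [0, 1]),
           ("ld c", []), ("sub", [2, 3]), ("mul", [2, 4])]
print(compute_offsets(program)[-1])  # ('mul', [0, 2])
```

Because the shared subexpression t is read twice, it stays at QH after sub executes, and mul must reach two positions past QH to find the result of sub.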

Other crucial tasks performed by the Queue Compiler include constraining all instructions to have at most one offset reference, computing offset reference values, allocating queue registers, and scheduling the Queue program expressions in level order.

Figure 9 shows the structure of the Queue Compiler.

FIG 9: QUEUE COMPILER INFRASTRUCTURE

2.1.5 QUEUE CORE ASSEMBLER

The need to map the assembly program produced by the Queue Compiler to equivalent machine code that the Queue Processor can execute motivated the development of the Queue Core Assembler (Qasm) [13]. Qasm supports two computing models: the Queue Core (QC-2) model and the Dual Execution Processor (DEP) model.

With the help of a preprocessor, it accepts the output of the Queue Compiler as input.

Figure 10 shows the structure of Qasm. It comprises three stages: converting the compiler output into an intermediate file, parsing the intermediate file, and converting the intermediate file into machine code.

FIG 10: QUEUE ASSEMBLER INFRASTRUCTURE
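The three Qasm stages can be sketched as three small functions. Every mnemonic, opcode value and the input syntax here are hypothetical placeholders rather than Qasm's actual tables, and non-memory instructions are assumed, for illustration only, to encode just their opcode.

```python
import re
from typing import List, Tuple

# Hypothetical 6-bit opcodes; the real QC-2 values are listed in Appendix I.
OPCODES = {"add": 0b010000, "sub": 0b010001, "mul": 0b010010, "ldw": 0b100000}
COVOP_OPCODE = 0b00000100


def to_intermediate(compiler_output: str) -> List[str]:
    """Stage 1: keep one instruction per line, dropping comments, directives and blanks."""
    lines = []
    for raw in compiler_output.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line and not line.startswith("."):
            lines.append(line)
    return lines


def parse(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Stage 2: split each line into a mnemonic and its operand fields."""
    parsed = []
    for line in lines:
        tokens = [t for t in re.split(r"[\s,()]+", line) if t]
        parsed.append((tokens[0], tokens[1:]))
    return parsed


def encode(parsed: List[Tuple[str, List[str]]]) -> List[int]:
    """Stage 3: emit 16-bit words, inserting a covop when an offset exceeds 8 bits."""
    words = []
    for mnemonic, fields in parsed:
        opcode = OPCODES[mnemonic]
        if mnemonic == "ldw":
            offset, d = int(fields[0], 0), int(fields[1], 0)
            if offset > 0xFF:
                words.append((COVOP_OPCODE << 8) | (offset >> 8))
            words.append((opcode << 10) | ((d & 0x3) << 8) | (offset & 0xFF))
        else:
            words.append(opcode << 10)  # assumption: operand fields left at zero
    return words


source = """
.text            # directive, dropped in stage 1
ldw 0x1234(1)    # offset needs 16 bits
ldw 0x10(1)
add
"""
words = encode(parse(to_intermediate(source)))
print([f"{w:04x}" for w in words])  # ['0412', '8134', '8110', '4000']
```

Keeping the intermediate file as plain one-instruction-per-line text lets the parsing and encoding stages stay independent of the compiler's output conventions.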

2.1.6 CURRENT STATE OF THE ART OF THE QUEUE CORE PROCESSOR

The Queue Core Processor has gone through a series of major transformations from the time the idea of a Queue Processor was conceived and implemented up to the present. These transformations span improvements in its system architecture and the development of the Queue Core Compiler and the Queue Core Assembler.

The Queue Compiler has been successfully developed for the Queue Core Processor. It accepts programs written in C and generates the corresponding assembly language programs for the QC-2 Assembler.

The Queue Core Assembler was developed in order to translate the Assembly language of the Queue Processor into the equivalent machine language.

Several simulators exist for different microprocessors, built to emulate them in diverse respects. Simulators are generally architecture-specific: a simulator designed for a particular architecture may not be able to simulate a microprocessor with a different architecture.

Over the years, simulators have been built for microprocessors with both RISC and CISC architectures; a notable example for the RISC architecture is SPIM, which simulates the MIPS processor.

Unfortunately, these existing simulators cannot simulate the Queue Core Processor, primarily because its architecture and instruction set differ from theirs.

CHAPTER THREE (3)

3.0 SIMULATOR DESIGN INFRASTRUCTURE

This research lies at the crossroads of two major disciplines: system architecture and software engineering. Accordingly, the research methodology adopted draws on both domains.

The simulator design infrastructure is based primarily on the Queue Core Processor architecture. The system architecture of the Queue Core Processor was studied in detail to identify its functional units and their design infrastructure.

Other existing simulators [46,47] were also studied extensively to gain in-depth knowledge of their features and functionalities, which were then incorporated into our simulator (QSIM).

3.1. INFRASTRUCTURE OF THE INSTRUCTION PIPELINE STAGES
