Conference Paper

A 14-port 3.8 ns 116-word 64b read-renaming register file

March 1995
Digest of Technical Papers - IEEE International Solid-State Circuits Conference

DOI:10.1109/ISSCC.1995.535449

Source
IEEE Xplore

Conference: Solid-State Circuits Conference, 1995. Digest of Technical Papers. 42nd ISSCC, 1995 IEEE International

Authors:

Robert K. Montoye

and 5 others (6 authors in total)

A 116-word×64b register file with ten read ports and four write ports is part of a four-issue superscalar, register-renamed, four-window, V9 SPARC-architecture CPU operating at 154 MHz. Since the register file combines a register-rename function with the register-read operation, the CPU pipeline is one stage shorter than other register-renaming architectures.
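The idea of folding renaming into the read can be illustrated with a toy model. This is only a sketch under stated assumptions: the abstract does not describe the circuit, so the tag-match lookup, the `current` flag, and the names `ReadRenamingRegisterFile`, `write`, `read`, and `release` are all hypothetical. Only the 116-word, 64-bit dimensions come from the abstract, and port counts are not modeled.

```python
class ReadRenamingRegisterFile:
    """Toy model (assumption): each physical word stores the logical tag it holds."""

    def __init__(self, num_words=116, width_bits=64):
        self.mask = (1 << width_bits) - 1
        self.tags = [None] * num_words       # logical register held by each word
        self.current = [False] * num_words   # True if the word is the newest mapping
        self.data = [0] * num_words

    def write(self, logical, value):
        """Allocate a free word for `logical`; the older mapping stops being current."""
        free = next((i for i, t in enumerate(self.tags) if t is None), None)
        if free is None:
            raise RuntimeError("no free physical word")
        for i, tag in enumerate(self.tags):
            if tag == logical and self.current[i]:
                self.current[i] = False      # old value stays until released
        self.tags[free] = logical
        self.current[free] = True
        self.data[free] = value & self.mask
        return free

    def read(self, logical):
        """Rename and read in one step: match the logical tag, return the data."""
        for i, tag in enumerate(self.tags):
            if tag == logical and self.current[i]:
                return self.data[i]
        raise KeyError(logical)

    def release(self, index):
        """Free a word once its old value can no longer be needed."""
        self.tags[index] = None
        self.current[index] = False


rf = ReadRenamingRegisterFile()
first = rf.write(5, 10)
second = rf.write(5, 20)
print(rf.read(5), rf.data[first])
```

Because the lookup by logical register number and the data access happen in the same call, this model has no separate map-table step, which is the property the abstract credits for the shorter pipeline.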

Out-of-Order Commit Processors.

Conference Paper

Jan 2004

Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is particularly useful in numerical applications where branch speculation is normally not a problem and where the cache hierarchy is not capable of delivering the data soon enough. In order to support more in-flight instructions, several resources have to be up-sized, such as the reorder buffer (ROB), the general-purpose instruction queues, the load/store queue and the number of physical registers in the processor. However, scaling up the number of entries in these resources is impractical because of area, cycle time, and power consumption constraints. We propose to increase the capacity of future processors by augmenting the number of in-flight instructions. Instead of simply up-sizing resources, we push for novel microarchitectural structures that achieve the same performance benefits but with a much lower need for resources. Our main contribution is a new checkpointing mechanism that is capable of keeping thousands of in-flight instructions at a practically constant cost. We also propose a queuing mechanism that takes advantage of the differences in waiting time of the instructions in the flow. Using these two mechanisms, our processor has a performance degradation of only 10% for SPEC2000fp over a conventional processor requiring more than an order of magnitude additional entries in the ROB and instruction queues, and about a 200% improvement over a current processor with a similar number of entries.

Complexity-Effective Superscalar Processors

Conference Paper

Jul 1997

Not Available

Design space of register renaming techniques

Article

Oct 2000

Dezso Sima

Register renaming is a technique to remove false data dependencies, namely write after read (WAR) and write after write (WAW), that occur in straight line code between register operands of subsequent instructions. By eliminating related precedence requirements in the execution sequence of the instructions, renaming increases the average number of instructions that are available for parallel execution per cycle. This results in increased IPC (number of instructions executed per cycle). The identification and exploration of the design space of register renaming lead to a comprehensive understanding of this intricate technique. As this article shows, the design space of register renaming is spanned by four main dimensions: the scope of register renaming, the layout of the rename buffers, the method of register mapping, and the rename rate. Relevant aspects of the design space give rise to eight basic alternatives for register renaming. In addition, the kind of operand fetch policy significantly affects how the processor carries out the rename process, which duplicates the eight basic alternatives to 16 possible implementation schemes. The article indicates which basic implementation scheme is used in relevant superscalar processors. As register renaming is usually implemented in conjunction with shelving, the underlying microarchitecture is assumed to employ shelving.
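As a minimal illustration of how renaming removes WAR and WAW dependencies while keeping true (read after write) dependencies, the sketch below gives every destination a fresh physical register through a mapping table. It is not one of the article's sixteen schemes; the function name `rename`, the `(dest, src1, src2)` instruction tuples, and the `p0, p1, ...` register names are assumptions made for the example.

```python
def rename(instructions):
    """Rename (dest, src1, src2) tuples so each destination gets a fresh register.

    Sources are looked up in the current mapping first, so a true dependency
    on an earlier result is preserved. Logical registers that have not been
    written yet keep their architectural name.
    """
    mapping = {}          # logical register -> latest physical register
    renamed = []
    next_physical = 0
    for dest, src1, src2 in instructions:
        new_src1 = mapping.get(src1, src1)
        new_src2 = mapping.get(src2, src2)
        physical = f"p{next_physical}"
        next_physical += 1
        mapping[dest] = physical   # later readers of dest see this register
        renamed.append((physical, new_src1, new_src2))
    return renamed


program = [
    ("r1", "r2", "r3"),   # reads r2
    ("r2", "r1", "r4"),   # WAR on r2 with the first instruction, RAW on r1
    ("r1", "r5", "r6"),   # WAW on r1 with the first instruction
]
for instruction in rename(program):
    print(instruction)
```

After renaming, the second instruction still reads the first instruction's result (now `p0`), but no two instructions write the same register and no write targets a register that an earlier instruction reads, so only the true dependency constrains the execution order.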

A 32×32-b adiabatic register file with supply clock generator

Article

Jun 1998

A 32×32-b adiabatic register file with one read port and one write port is designed. A four-phase clock generator is also designed to provide supply clocks for adiabatic circuits. All the word line and bit line charge on the capacitive interconnections is recovered to save energy. Adiabatic circuits are based on efficient charge recovery logic (ECRL) and are integrated using 0.8 μm complementary metal-oxide-semiconductor (CMOS) technology. Measurement results show that power consumption of the core is reduced by a factor of up to 3.5 compared with a conventional circuit.

Quantifying the Complexity of Superscalar Processors

Article

Jan 2002

To characterize future performance limitations of superscalar processors, the delays of key pipeline structures in superscalar processors are studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for several feature sizes.

Implementation trade-offs in using a restricted data flow architecture in a high performance RISC microprocessor

Conference Paper

Jul 1995

The implementation of a superscalar, speculative execution SPARC-V9 microprocessor incorporating restricted data flow principles required many design trade-offs. Consideration was given to both performance and cost. Performance is largely a function of cycle time and instructions executed per cycle, while cost is primarily a function of die area. Here we describe our restricted data flow implementation and the means by which we arrived at its configuration. Future semiconductor technology advances will allow these trade-offs to be relaxed and higher performance restricted data flow machines to be built.

Microarchitecture of HaL's CPU

Conference Paper

Apr 1995

The HaL PM1 CPU is the first implementation of the 64-bit SPARC Version 9 instruction set architecture. The processor utilizes superscalar instruction issue, register renaming, and a dataflow model of execution. Instructions can complete out-of-order and are later committed in order. The PM1 CPU maintains precise state. The processor has a higher level of reliability than is currently available in desktop computers for the commercial marketplace.

SPARC64: a 64-b 64-active-instruction out-of-order-execution MCM processor

Article

Dec 1995

We report the first implementation of the new SPARC V9 64-b instruction set architecture. The HaL processor called SPARC64 is a ceramic Multi-Chip Module (MCM) that contains one CPU chip, one Memory Management Unit (MMU) chip, and four 64 KB Cache chips. Together, they implement a unique three-level address translation scheme that efficiently supports using virtual addresses spread anywhere in the full 64-b address range. The processor assigns a serial number to each issued instruction to track up to 64 in-progress instructions and can speculatively issue through up to 16 branches. It issues up to 4 instructions per cycle and utilizes superscalar instruction issue, register renaming, and dataflow (potentially out-of-order) execution to fully exploit instruction-level parallelism. The processor maintains a precise-state execution model, and commits in-order, up to 9 instructions in a cycle. In a HaL R1 system, a production SPARC64 running at 143 MHz has a performance of 230 SPECint92 and 300 SPECfp92 and dissipates 50 W from a 3.3 V supply.
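The serial-number tracking and in-order commit described above can be sketched in a few lines. This is a simplified model, not HaL's design: the class `InFlightTracker` and its methods are hypothetical, and serial numbers here grow without bound, whereas hardware would presumably reuse a small range. Only the limits of 64 in-progress instructions and 9 commits per cycle are taken from the abstract.

```python
from collections import deque

MAX_IN_FLIGHT = 64          # from the abstract: up to 64 in-progress instructions
MAX_COMMIT_PER_CYCLE = 9    # from the abstract: commits up to 9 per cycle


class InFlightTracker:
    """Tracks issued instructions by serial number and commits them in order."""

    def __init__(self):
        self.next_serial = 0
        self.window = deque()   # [serial, done] pairs in issue order

    def issue(self):
        """Return a new serial number, or None if the window is full."""
        if len(self.window) >= MAX_IN_FLIGHT:
            return None
        serial = self.next_serial
        self.next_serial += 1
        self.window.append([serial, False])
        return serial

    def complete(self, serial):
        """Mark an instruction as finished; this may happen out of order."""
        for entry in self.window:
            if entry[0] == serial:
                entry[1] = True
                return
        raise KeyError(serial)

    def commit_cycle(self):
        """Commit finished instructions from the oldest end, in order."""
        committed = []
        while (self.window and self.window[0][1]
               and len(committed) < MAX_COMMIT_PER_CYCLE):
            committed.append(self.window.popleft()[0])
        return committed


tracker = InFlightTracker()
serials = [tracker.issue() for _ in range(12)]
for s in serials[1:]:
    tracker.complete(s)          # everything but the oldest has finished
print(tracker.commit_cycle())    # nothing commits past the unfinished oldest
tracker.complete(serials[0])
print(tracker.commit_cycle())
```

The oldest unfinished instruction blocks commit of everything younger, which is what keeps the architectural state precise even though execution completes out of order.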

Complexity-Effective Superscalar Processors

Article

Sep 2000
Comput Architect News

The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for three feature sizes. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future.

A 64b 4-issue out-of-order execution RISC processor

Conference Paper

Mar 1995

This processor is the first implementation of the SPARC V9 64b instruction set architecture and has an estimated performance exceeding 256 SPECint92 and 330 SPECfp92 at 154 MHz. The R1 processor consists of one CPU chip, one memory management unit (MMU), four cache chips, and one clock chip mounted on a ceramic multi-chip module (MCM). The processor utilizes superscalar instruction issue, register renaming, and data flow execution to exploit instruction-level parallelism.

PowerPC 604 RISC Microprocessor

M. Denman

Recommended publications

Logical Effort analysis of multi-port register file architectures