Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp016t053j389
Title: FinFET-based System Modeling and Low-Power System Design
Authors: Chen, Xianmin
Advisors: Jha, Niraj K.
Contributors: Electrical Engineering Department
Keywords: 3D IC
FinFET
Low-power
NoC
PVT variation
System modeling
Subjects: Computer engineering
Electrical engineering
Issue Date: 2016
Publisher: Princeton, NJ : Princeton University
Abstract: FinFETs have begun to replace planar MOSFETs at the 22nm technology node and beyond. Compared to planar CMOS, FinFETs have higher on-current and lower leakage due to their double-gate structure. For system architects who wish to explore the use of this new technology, a FinFET-based simulation framework can be very helpful at the early design stages. However, no such simulator existed. Our work seeks to fill this gap. We present the details of one such simulation framework, called gem5-PVT, that we have developed. gem5-PVT leverages existing lower-level FinFET simulators to support timing, power, and thermal studies of FinFET-based chip multiprocessor (CMP) systems under process, voltage, and temperature (PVT) variations. It uses a bottom-up modeling approach based on logic/memory cell libraries that have been very accurately characterized using TCAD device simulation. This allows accuracy to bubble up to the system level. The second problem targeted in this work is reducing CMP power consumption. The power budget is expected to limit the portion of the chip that can be powered on at upcoming technology nodes. This problem, known as the utilization wall or dark silicon, is becoming increasingly serious. With the introduction of 3-dimensional (3D) integrated circuits (ICs), it is likely to become even more severe. Thus, how to take advantage of the extra transistors, made available by Moore's Law and the onset of 3D ICs, within the power budget poses a significant challenge to system designers. We propose several approaches and architectures to reduce CMP power, targeting three components: leakage power, the power consumed by computation, and the power consumed by communication. To reduce leakage power, we use a hybrid FinFET style to design ultra-low-leakage FinFET CPU cores. This approach exploits the ultra-low-leakage feature of asymmetric-workfunction shorted-gate (ASG) FinFETs and the high-performance feature of shorted-gate (SG) FinFETs.
We explore the impact of the hybrid style at both the module and CPU levels. Our study shows that using the hybrid FinFET style can reduce CPU leakage power to 29.6% of that of an SG baseline CPU. To reduce the power consumed by computation, we propose a 3D hybrid architecture consisting of a CPU layer with multiple cores, a field-programmable gate array (FPGA) layer, and a dynamic random-access memory (DRAM) layer. The architecture is designed for low power without sacrificing performance. The FPGA layer is capable of supporting a variety of accelerators. It is placed adjacent to the CPU layer, with a communication mechanism that allows accelerators on it to access CPU data caches directly. This enables fast switching between the two layers. Because FPGA accelerators consume much less power than out-of-order CPU cores, this architecture reduces power significantly compared with a baseline that has only a CPU layer and a DRAM layer, at the same or better performance. To reduce the power consumed by communication, we focus on the state-of-the-art Single-cycle Multi-hop Asynchronous Repeated Traversal (SMART) network-on-chip. SMART achieves ultra-low latency by enabling a flit to bypass the pipelines of intermediate routers entirely. This enables a flit to traverse multiple routers within a single clock cycle. However, there are two concerns related to SMART: i) it employs dedicated broadcast wires to transmit SMART-hop setup requests (SSRs), incurring large wire and energy overheads, and ii) it needs a complex allocator to arbitrate multiple simultaneous SSRs. In this work, we propose an SSR network to address these two concerns. This is a specialized network that replaces long and overlapping broadcast wires with shorter wires and switches. As a result, it reduces the wire overhead by up to 12.2x and dynamic energy by up to 15.7%.
It also eliminates low-priority SSRs before they reach the allocator and, therefore, leads to a simplified allocator design and lower energy consumption.
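The abstract reports its savings in three different forms: a remaining fraction (leakage power at 29.6% of the SG baseline), a reduction factor (wire overhead reduced by up to 12.2x), and a percentage saving (dynamic energy reduced by up to 15.7%). The short Python sketch below only converts these reported figures into a common form so they can be compared. The variable and function names are illustrative and do not come from the dissertation, and the two "up to" figures are best cases, not averages.

```python
# Convert the abstract's reported figures into a common form.
# All names here are illustrative; only the three numbers come from the abstract.

def factor_from_remaining(remaining: float) -> float:
    """Reduction factor implied by a remaining fraction (0.25 -> 4x)."""
    return 1.0 / remaining

def remaining_from_factor(factor: float) -> float:
    """Remaining fraction implied by a reduction factor (4x -> 0.25)."""
    return 1.0 / factor

# Figures as stated in the abstract.
leakage_remaining = 0.296   # hybrid-style CPU leakage relative to SG baseline
wire_factor = 12.2          # best-case ("up to") wire overhead reduction
energy_saving = 0.157       # best-case ("up to") dynamic energy saving

# Derived, comparable quantities.
leakage_saving = 1.0 - leakage_remaining
leakage_factor = factor_from_remaining(leakage_remaining)
wire_remaining = remaining_from_factor(wire_factor)
wire_saving = 1.0 - wire_remaining
energy_remaining = 1.0 - energy_saving
energy_factor = factor_from_remaining(energy_remaining)

rows = [
    ("CPU leakage power", leakage_remaining, leakage_saving, leakage_factor),
    ("SSR wire overhead (best case)", wire_remaining, wire_saving, wire_factor),
    ("SSR dynamic energy (best case)", energy_remaining, energy_saving, energy_factor),
]

for name, remaining, saving, factor in rows:
    print(f"{name}: {remaining:.1%} remaining, {saving:.1%} saved, {factor:.2f}x")
```

Read this way, the leakage result corresponds to a saving of about 70.4% (roughly a 3.4x reduction), and the best-case wire result leaves about 8.2% of the original wire overhead.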
URI: http://arks.princeton.edu/ark:/88435/dsp016t053j389
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu/
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Electrical Engineering

Files in This Item:
File: Chen_princeton_0181D_11675.pdf
Size: 3.9 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.