Title: Navigating Heterogeneity and Scalability in Modern Chip Design
Authors: Orenes-Vera, Marcelo
Advisors: Martonosi, Margaret; Wentzlaff, David
Contributors: Computer Science Department
Subjects: Computer engineering; Computer science
Issue Date: 2024
Publisher: Princeton, NJ : Princeton University
Abstract: Computing systems have become ubiquitous in the modern world, but their design is far from one-size-fits-all. From battery-powered devices to supercomputers, deployment requirements are a primary driver of heterogeneity in computer design. As modern systems rely on parallelism and specialization to achieve their performance and power goals, new challenges arise. The system's complexity grows with the number of distinct hardware modules, complicating the verification of correct and secure behavior. Moreover, expanding parallelization across more processing units (PUs) increases the pressure on the memory hierarchy and the inter-PU network, which results in severe bottlenecks for applications traversing graph-like data structures with indirect memory accesses (IMAs). These challenges call for rethinking software abstractions and hardware designs to achieve scalable and efficient systems, as well as introducing robust methodologies to ensure their correctness. This dissertation tackles these challenges with three main thrusts. First, to help hardware designers apply formal verification to their modules, this dissertation introduces AutoSVA, a toolflow that generates formal verification testbenches from module interface annotations. Testbenches generated with AutoSVA have uncovered bugs in open-source projects, including a widely used RISC-V CPU. Second, to alleviate IMA latency without increasing verification complexity, this dissertation introduces MAPLE, a network-connected memory-access engine that supports data pipelining and prefetching without requiring PU modifications. As such, off-the-shelf PUs can offload IMAs to MAPLE and consume the fetched data via software-managed queues. Using MAPLE effectively mitigates memory latency, providing 2x speedups over software- and hardware-only prefetching. Third, to further the scalability of graph and sparse workloads, this dissertation co-designs scale-out architectures with a data-centric execution model, Dalorex, where IMAs are split into tasks that access only a confined address range and execute at the PU with dedicated access to that memory range. Parallelizing breadth-first search on a billion-edge graph across a million PUs yields runtimes nearly an order of magnitude faster than Graph500's top entries. By introducing novel hardware designs, execution models, and verification tools, this dissertation contributes to addressing the challenges posed by the increasing demand for high-performance, energy-efficient, and cost-effective computing systems.
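
The abstract's second thrust describes off-the-shelf PUs offloading indirect memory accesses to a decoupled engine and consuming the results through software-managed queues. The C++ sketch below is a minimal, purely illustrative model of that decoupled access/execute pattern; the queue type, function names, and threading are assumptions of this sketch, not MAPLE's actual hardware or software interface.

```cpp
// Illustrative sketch only: a software model of offloading indirect memory
// accesses (values[index[i]]) to a separate "access" agent that feeds an
// "execute" agent through a software-managed queue. Names are hypothetical.
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Minimal single-producer/single-consumer ring buffer standing in for the
// software-managed queue the compute PU would read from.
struct SpscQueue {
    static constexpr size_t kCap = 1024;               // power-of-two capacity
    uint64_t buf[kCap];
    std::atomic<size_t> head{0}, tail{0};

    bool push(uint64_t v) {                             // access side
        size_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == kCap) return false;  // full
        buf[t % kCap] = v;
        tail.store(t + 1, std::memory_order_release);
        return true;
    }
    bool pop(uint64_t* v) {                             // execute side
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;         // empty
        *v = buf[h % kCap];
        head.store(h + 1, std::memory_order_release);
        return true;
    }
};

// Access agent: performs the indirect loads ahead of the consumer, emulating an
// engine that pipelines IMAs so the compute PU does not stall on them.
void access_engine(const std::vector<uint32_t>& index,
                   const std::vector<uint64_t>& values, SpscQueue& q) {
    for (uint32_t idx : index)
        while (!q.push(values[idx])) { /* queue full: back-pressure, retry */ }
}

// Execute agent: an unmodified core that only touches the queue, never the
// irregular indexing structure itself.
uint64_t compute(size_t n, SpscQueue& q) {
    uint64_t sum = 0, v;
    for (size_t i = 0; i < n; ++i) {
        while (!q.pop(&v)) { /* data not ready yet */ }
        sum += v;                                       // stand-in for real per-element work
    }
    return sum;
}

int main() {
    std::vector<uint32_t> index  = {3, 0, 2, 1, 3};
    std::vector<uint64_t> values = {10, 20, 30, 40};
    SpscQueue q;
    std::thread producer(access_engine, std::cref(index), std::cref(values), std::ref(q));
    uint64_t sum = compute(index.size(), q);
    producer.join();
    return sum == 140 ? 0 : 1;                          // 40+10+30+20+40
}
```

The point the abstract emphasizes is that only the queue protocol is visible to the PU, so the processing units themselves need no modification; in hardware, the access side would be the network-connected engine rather than a second thread.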
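For the third thrust, the abstract describes splitting IMAs into tasks that touch only a confined address range and run on the PU that owns that range. The following single-threaded C++ sketch illustrates that data-local task idea for breadth-first search; the types, the round-robin loop standing in for parallel PUs and the on-chip network, and the label-correcting depth update are assumptions of this sketch rather than the dissertation's implementation.

```cpp
// Illustrative sketch only: a data-local task model for BFS, where traversing an
// edge sends a task to the PU owning the destination vertex instead of issuing
// an indirect load to remote memory. All names here are hypothetical.
#include <cstdint>
#include <deque>
#include <vector>

struct Task { uint32_t vertex; uint32_t depth; };   // "visit this vertex at this depth"

struct Pu {
    uint32_t first_vertex;              // start of the vertex range this PU owns
    std::vector<int32_t> dist;          // distances for the owned range only
    std::deque<Task> inbox;             // tasks routed here over the on-chip network
};

// BFS over a CSR graph partitioned by vertex range across `pus`. Each PU reads
// only the rows of row_ptr/col_idx for vertices it owns, so every memory access
// stays within one PU's confined range; crossing ranges means sending a task.
void bfs_data_local(const std::vector<uint32_t>& row_ptr,
                    const std::vector<uint32_t>& col_idx,
                    std::vector<Pu>& pus, uint32_t verts_per_pu, uint32_t root) {
    auto owner = [&](uint32_t v) -> Pu& { return pus[v / verts_per_pu]; };
    owner(root).inbox.push_back({root, 0});

    bool work_left = true;
    while (work_left) {                                // round-robin stands in for parallel PUs
        work_left = false;
        for (Pu& pu : pus) {
            while (!pu.inbox.empty()) {
                work_left = true;
                Task t = pu.inbox.front(); pu.inbox.pop_front();
                int32_t& d = pu.dist[t.vertex - pu.first_vertex];   // local access only
                if (static_cast<int32_t>(t.depth) >= d) continue;   // already reached earlier
                d = static_cast<int32_t>(t.depth);
                // Each outgoing edge becomes a task for the neighbor's owning PU.
                for (uint32_t e = row_ptr[t.vertex]; e < row_ptr[t.vertex + 1]; ++e)
                    owner(col_idx[e]).inbox.push_back({col_idx[e], t.depth + 1});
            }
        }
    }
}

int main() {
    // Tiny undirected path graph 0-1-2-3 in CSR form, split across two PUs.
    std::vector<uint32_t> row_ptr = {0, 1, 3, 5, 6};
    std::vector<uint32_t> col_idx = {1, 0, 2, 1, 3, 2};
    const uint32_t verts_per_pu = 2;
    std::vector<Pu> pus(2);
    for (uint32_t p = 0; p < 2; ++p)
        pus[p] = Pu{p * verts_per_pu, std::vector<int32_t>(verts_per_pu, INT32_MAX), {}};
    bfs_data_local(row_ptr, col_idx, pus, verts_per_pu, /*root=*/0);
    return pus[1].dist[1] == 3 ? 0 : 1;                // vertex 3 is three hops from vertex 0
}
```

At the scale the abstract reports (a million PUs over a billion-edge graph), the per-PU inboxes and the round-robin loop would be replaced by a real on-chip network and concurrent PUs; the sketch only conveys why every data access stays within one PU's memory range.
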
URI: http://arks.princeton.edu/ark:/88435/dsp01m039k828f
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science
Files in This Item:
| File | Description | Size | Format | |
| --- | --- | --- | --- | --- |
| OrenesVera_princeton_0181D_14994.pdf | | 6.18 MB | Adobe PDF | View/Download |