Title: Making Neural Network Models More Efficient
Authors: Su, Yushan
Advisors: Li, Kai
Contributors: Computer Science Department
Subjects: Computer science
Issue Date: 2023
Publisher: Princeton, NJ : Princeton University
Abstract: Complex machine learning tasks typically require large neural network models. However, training and inference on neural models demand substantial compute power and large memory footprints, and incur significant costs. My thesis studies methods to make neural networks efficient at relatively low cost.

First, we explore how to use CPU servers for training and inference. CPU servers are more readily available, have larger memories, and cost much less than GPUs or hardware accelerators, but they are much less efficient for training and inference tasks. My thesis studies how to design efficient software kernels for sparse neural networks that allow unstructured pruning to accelerate training and inference. Our evaluation shows that our sparse kernels achieve 6.4x-20.0x speedups at medium sparsities over the commonly used Intel MKL sparse library for CPUs and greatly narrow the performance gap with GPU kernels.

Second, we study how to achieve high-throughput inference for large models. We propose PruMUX, a method that combines data multiplexing with model compression. We find that in most cases, PruMUX achieves higher throughput than either approach alone for a given accuracy-loss budget.

Third, we study how to find the best parameter sets for PruMUX in order to make it practical. We propose Auto-PruMUX, which uses performance models fitted to a set of data points to predict multiplexing parameters for DataMUX and sparsity parameters for a given model compression technique. Our evaluation shows that Auto-PruMUX can successfully find or predict parameters that achieve the best throughput within an accuracy-loss budget.

This dissertation also proposes several future research directions in the areas of our studies.
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File: Su_princeton_0181D_14600.pdf (1.77 MB, Adobe PDF)

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.