Title: Scaling Machine Learning in Practice
Authors: Suo, Daniel Can
Advisors: Li, Kai; Hazan, Elad
Contributors: Computer Science Department
Keywords: Machine Learning
Subjects: Computer science
Issue Date: 2023
Publisher: Princeton, NJ : Princeton University
Abstract: In recent years, machine learning has become pervasive, powering algorithmic clinicians, translators, and world-beating Go masters. As practitioners build on this success, they repeatedly observe that scale (data, model size, compute) is critical. However, scale is now a challenge in and of itself; simple tasks such as gathering data become formidable, even prohibitive. In this thesis, we discuss techniques for addressing scale in three areas:
1. Differentiable reinforcement learning for physical devices: Reinforcement learning has emerged as a potential strategy for machines to make decisions in complex, dynamic environments. However, successful demonstrations have required vast experience to learn an optimal policy, making real-world physical applications particularly challenging. We present a method that uses limited experience to learn a differentiable simulator of a physical system and then uses gradient methods on the simulator to learn a state-of-the-art policy for controlling that system.
2. Practical optimization for deep learning: Optimization is an essential aspect of deep learning. However, while a constellation of optimization algorithms dots the literature, the low burden of proof and empirical nature of deep learning have led practitioners to rely on defaults (e.g., Adagrad, Adam) rather than view optimization as a lever for progress. To rigorously test ideas in optimization, we introduce a comprehensive benchmark that currently includes 8 deep learning workloads and rules for training procedures, computational budget, and evaluation. We also use the benchmark to evaluate new optimization results and re-evaluate existing ones.
3. Scaling computer systems via thread scheduling: Large global-scale applications are expensive and complex to operate, let alone optimize. As a result, many simple parameters that govern important behaviors of these systems are simply set once and never touched again. However, we show that these parameters present low-hanging fruit for significant efficiency improvements.
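The first area can be illustrated with a toy sketch of policy learning by gradient descent through a simulator. In the thesis the simulator is itself learned from limited real experience; here it is a hand-written scalar linear system x' = a*x + b*u with a linear feedback policy u = -k*x, and all names, constants, and the finite-difference gradient are illustrative assumptions, not details taken from the dissertation.

```python
# Toy sketch: optimize a control policy by descending the cost of simulated
# rollouts. The thesis learns the (differentiable) simulator from limited
# experience; here the simulator is hand-written and the "policy" is one gain k.

def rollout_cost(k, a=1.2, b=0.5, x0=1.0, horizon=20):
    """Total quadratic cost of running the policy u = -k*x in the simulator."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += x * x + 0.1 * u * u   # penalize state deviation and control effort
        x = a * x + b * u             # one simulator step
    return cost

def optimize_gain(k=0.0, lr=0.01, steps=300, eps=1e-5, max_step=0.05):
    """Descend the rollout cost in k; gradient via central finite differences."""
    for _ in range(steps):
        grad = (rollout_cost(k + eps) - rollout_cost(k - eps)) / (2 * eps)
        # Clip the update: the open-loop system (a > 1) is unstable, so early
        # gradients are enormous.
        k -= max(-max_step, min(max_step, lr * grad))
    return k
```

Starting from k = 0 (no feedback, unstable dynamics since a > 1), the optimized gain stabilizes the closed loop (|a - b*k| < 1) and sharply reduces rollout cost; in the thesis setting, finite differences would be replaced by automatic differentiation through the learned simulator.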
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections:Computer Science

Files in This Item:
File: Suo_princeton_0181D_14522.pdf
Size: 13.13 MB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.