Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp012r36v181k
Title: Efficient, template-based, self-supervised 2D human pose estimation
Authors: Yoo, Nobline
Advisors: Russakovsky, Olga
Department: Computer Science
Certificate Program: Robotics & Intelligent Systems Program
Class Year: 2023
Abstract: Two-dimensional human pose estimation is a challenging task in which the goal is to localize key anatomical landmarks (e.g. elbows, knees, shoulders) given an image of a person in some pose. Current state-of-the-art pose estimation techniques use thousands, if not tens of thousands, of labeled figures to fine-tune transformers or train deep convolutional neural networks. These methods are manual-labor-intensive, requiring tools like Amazon Mechanical Turk to crowdsource pose labels on individual frames. Self-supervised methods, on the other hand, reframe the pose estimation task as a reconstruction problem (i.e. given one part of the input data, reconstruct another part of the input data), effectively doing away with the need for ground-truth labels. This enables them to leverage the vast amount of visual content that has yet to be labeled, though currently at the cost of lower accuracy than their supervised counterparts. In this paper, we explore how to improve self-supervised pose estimation systems. We (1) conduct a deep-dive analysis of the relationship between reconstruction loss and pose estimate accuracy, (2) propose an efficient model architecture that quickly learns how to localize joints, and (3) offer a new consistency metric that measures how consistent a model's pose estimates are with respect to body proportions. Importantly, we arrive at a model that outperforms the original model (Schmidtke et al. [1]) that inspired our work, and we find that a combination of carefully engineered reconstruction losses and inductive bias coding can help coordinate pose learning alongside reconstruction in a self-supervised paradigm.
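The body-proportion consistency idea can be illustrated with a small sketch. The snippet below is a hypothetical example, not the metric from the thesis: the limb definitions, keypoint indices, and the coefficient-of-variation formulation are all assumptions made for illustration. It scores how stable predicted limb-length ratios remain across frames of the same person, so a model whose estimates drift in body proportions receives a lower score.

    # Hypothetical illustration of a body-proportion consistency score.
    # Not the thesis's metric: limb pairs, keypoint layout, and the
    # coefficient-of-variation formulation are assumptions for this sketch.
    import numpy as np

    # Keypoint indices in an assumed skeleton layout.
    LIMBS = {
        "upper_arm": (5, 7),    # shoulder -> elbow
        "lower_arm": (7, 9),    # elbow -> wrist
        "upper_leg": (11, 13),  # hip -> knee
        "lower_leg": (13, 15),  # knee -> ankle
    }

    def limb_lengths(pose):
        """pose: (K, 2) array of predicted 2D keypoints for one frame."""
        return np.array([np.linalg.norm(pose[a] - pose[b]) for a, b in LIMBS.values()])

    def proportion_consistency(poses):
        """poses: (T, K, 2) predicted keypoints for T frames of one person.

        Returns a score in (0, 1]; higher means the per-frame limb-length
        ratios (scale-normalized) vary less across frames, i.e. the
        estimates respect consistent body proportions.
        """
        lengths = np.stack([limb_lengths(p) for p in poses])    # (T, L)
        ratios = lengths / lengths.sum(axis=1, keepdims=True)   # scale-invariant
        cv = ratios.std(axis=0) / (ratios.mean(axis=0) + 1e-8)  # per-limb variation
        return float(1.0 / (1.0 + cv.mean()))

Normalizing by each frame's total limb length keeps the score invariant to the person's distance from the camera, so only relative proportions, not absolute scale, affect it.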
URI: http://arks.princeton.edu/ark:/88435/dsp012r36v181k
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File                      Size     Format
YOO-NOBLINE-THESIS.pdf    6.49 MB  Adobe PDF    (Request a copy)


Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.