|Title:||Video Synthesis: Binary Masks to Frames via DeepInversion|
|Abstract:||Machine learning models rely on training data in order to make real-world predictions. Acquiring such data can be arduous in certain situations. For example, some models are developed and trained using privacy-protected data. As a result, such datasets are inaccessible to researchers seeking to train new models and make predictions. If we can recover the training data from a pre-trained model, this can greatly aid knowledge transfer. In this thesis, we use a recently developed technique called DeepInversion to synthesize video training data. DeepInversion is applied to invert a Mask R-CNN architecture in order to produce synthetic frames of videos in the DAVIS dataset. We perform input optimization from random noise to high-fidelity frames. Specifically, we optimize a classification loss, defined between ground-truth and predicted coarse masks, as well as auxiliary losses that minimize noise and the differences in batch normalization statistics. We run the optimization for 2k iterations using the Adam optimizer with a learning rate of 0.1. The viability of our method is tested on the first frames of many videos in the DAVIS set, with different scaling values for the auxiliary loss parameters for each frame. Finally, we synthesize many frames of the ’bear’ video and string them together to produce a synthetic video. Ideas developed in this thesis can be greatly beneficial in the domains of federated learning, privacy-protected data acquisition, and lower-latency model training.|
|Type of Material:||Princeton University Senior Theses|
|Appears in Collections:||Electrical Engineering, 1932-2020|
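The input optimization described in the abstract can be illustrated with a small, dependency-free sketch. This is not the thesis implementation: the frozen "model" here is a hypothetical one-weight per-pixel classifier standing in for Mask R-CNN, the "frame" is a 1-D signal, and the names `w`, `b`, `bn_mean`, `bn_var`, `a_tv`, and `a_bn` (and their default values) are assumptions chosen for illustration. Only the overall recipe follows the abstract: start from random noise, minimize a mask classification loss plus a noise (total variation) penalty and a batch normalization statistics penalty, and update the input with Adam at a learning rate of 0.1 for 2k iterations.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_and_grad(x, target, w, b, bn_mean, bn_var, a_tv, a_bn):
    """Return the total loss and its gradient with respect to the input x."""
    n = len(x)
    grad = [0.0] * n
    loss = 0.0
    # Classification loss: binary cross-entropy between predicted and target mask.
    for i in range(n):
        p = sigmoid(w * x[i] + b)
        p_safe = min(max(p, 1e-12), 1.0 - 1e-12)
        loss -= (target[i] * math.log(p_safe)
                 + (1.0 - target[i]) * math.log(1.0 - p_safe)) / n
        grad[i] += (p - target[i]) * w / n
    # Noise penalty: squared total variation between neighbouring samples.
    for i in range(n - 1):
        d = x[i + 1] - x[i]
        loss += a_tv * d * d
        grad[i + 1] += 2.0 * a_tv * d
        grad[i] -= 2.0 * a_tv * d
    # Batch-norm penalty: match the input mean/variance to the stored statistics.
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    dm, dv = mean - bn_mean, var - bn_var
    loss += a_bn * (dm * dm + dv * dv)
    for i in range(n):
        grad[i] += a_bn * (2.0 * dm / n + 4.0 * dv * (x[i] - mean) / n)
    return loss, grad


def invert(target, steps=2000, lr=0.1, seed=0,
           w=4.0, b=0.0, bn_mean=0.0, bn_var=1.0, a_tv=0.01, a_bn=0.1):
    """Optimize a noise input so the frozen toy model predicts `target`.

    All model parameters and loss weights are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(target)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # start from random noise
    m, v = [0.0] * n, [0.0] * n                  # Adam moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    history = []
    for t in range(1, steps + 1):
        loss, g = loss_and_grad(x, target, w, b, bn_mean, bn_var, a_tv, a_bn)
        history.append(loss)
        for i in range(n):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] * g[i]
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, history


def predicted_mask(x, w=4.0, b=0.0):
    # Threshold the toy model's per-sample probability at 0.5.
    return [1 if sigmoid(w * xi + b) > 0.5 else 0 for xi in x]


if __name__ == "__main__":
    target = [0.0] * 8 + [1.0] * 8  # hypothetical coarse binary mask
    x, history = invert(target)
    print("first loss:", history[0], "last loss:", history[-1])
    print("predicted mask:", predicted_mask(x))
```

In a real setting the gradients would come from automatic differentiation through the frozen network, and the batch normalization penalty would be summed over every normalization layer rather than applied to the input alone; the hand-written gradients here only keep the sketch self-contained.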
Files in This Item:
|SANTHANAM-HARI-THESIS.pdf||4.63 MB||Adobe PDF||Request a copy|
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.