Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01tx31qm55v
Title: Privacy Implications of Not-So-Hidden Comments in arXiv Files and Analysis of Online Privacy Policies
Authors: Li, Frank
Advisors: Narayanan, Arvind
Department: Electrical Engineering
Certificate Program: Applications of Computing Program
Class Year: 2019
Abstract: Internet users can accidentally expose their own private information in a myriad of ways. This paper describes our approach to a large-scale measurement study on one case of online privacy leakage wherein users upload files for publication and sharing, files that can contain users’ private information hidden within them. We analyze comments in TeX source files of arXiv publications using various natural language processing techniques to identify specific attributes of comments that may represent privacy violations. We also perform near-duplicate detection and clustering on a large data set of privacy policy texts to understand how online privacy policy is communicated to users. We find that arXiv publications contain many interesting comments despite the ease with which authors can strip out all comments. We find that many privacy policy texts are duplicates or near-duplicates of one another.
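The abstract names two techniques without describing how they are implemented. The following is a minimal Python sketch of both ideas, not the thesis's own code. First, it extracts `%` comments from TeX source while skipping escaped characters such as `\%`. Second, it flags near-duplicate policy texts by comparing word 3-gram shingles with Jaccard similarity and then groups them into clusters using union-find. The function names (`extract_tex_comments`, `near_duplicate_pairs`, `cluster`), the shingle size k=3, and the 0.8 default threshold are illustrative assumptions. The comment extractor also assumes plain TeX comment syntax: it does not handle `verbatim` environments, `\begin{comment}` blocks, or `\iffalse ... \fi`.

```python
from __future__ import annotations

import re


def extract_tex_comments(source: str) -> list[str]:
    """Return the non-empty text after each unescaped % in TeX source (assumed syntax only)."""
    comments = []
    for line in source.splitlines():
        i = 0
        while i < len(line):
            ch = line[i]
            if ch == "\\":
                i += 2  # skip the escaped character, e.g. \% or the second \ of \\
                continue
            if ch == "%":
                text = line[i + 1:].strip()
                if text:
                    comments.append(text)
                break  # the rest of the line is a comment
            i += 1
    return comments


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of lowercase word k-grams (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs: dict[str, str], threshold: float = 0.8, k: int = 3):
    """All document pairs whose shingle-set Jaccard similarity is >= threshold (O(n^2) comparisons)."""
    sigs = {name: shingles(text, k) for name, text in docs.items()}
    names = sorted(sigs)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = jaccard(sigs[x], sigs[y])
            if score >= threshold:
                pairs.append((x, y, score))
    return pairs


def cluster(names: list[str], pairs) -> list[list[str]]:
    """Group documents into connected components of the near-duplicate graph (union-find)."""
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for x, y, _ in pairs:
        parent[find(x)] = find(y)
    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

For example, `extract_tex_comments("We got 50\\% gains. % TODO ask reviewer 2")` returns only `["TODO ask reviewer 2"]`, because the escaped `\%` is skipped. The all-pairs comparison is fine for small inputs. For a large privacy-policy corpus, a MinHash/LSH index would usually replace it to avoid comparing every pair.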
URI: http://arks.princeton.edu/ark:/88435/dsp01tx31qm55v
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Electrical and Computer Engineering, 1932-2023

Files in This Item:
File | Size | Format
LI-FRANK-THESIS.pdf | 717.91 kB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.