Title: Consumer Protection on the Web with Longitudinal Web Crawls and Analysis
Authors: Amos, Ryan
Advisors: Felten, Edward W; Mittal, Prateek
Contributors: Computer Science Department
Keywords: consumer protection
web measurement
Subjects: Computer science
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: The World Wide Web has brought with it new consumer protection hazards, such as deceptive reviews and online tracking. While many academics have studied consumer protection on the web at specific points in time, we approach this problem from a longitudinal perspective, exploring how the web has affected consumers' rights to privacy and to be informed. Our work highlights the key role that longitudinal analysis and longitudinal data collection (data collected over repeated, time-spaced passes) play in the study of consumer protection issues. We investigate consumer protection issues on the web through longitudinal studies in two landscapes: website privacy policies and reviews on Yelp. We approach both problems with automated, repeated visits to the websites of interest, which yield large-scale datasets. In our study of privacy policies, we aggregate the Internet Archive's crawls to perform longitudinal collection, and in our online reviews study, we crawl the data ourselves. We collected 1M privacy policies spanning 22 years and 12.5M reviews over 11 months. We used our data to study the evolution of privacy policies, raising concerns about the rights to privacy and information. We find gaps in disclosure of privacy-related practices. We show that readability has declined over the long term, with policies doubling in length and becoming more complex. We show disparities between website-reported and independently observed tracking. In our study of online reviews, we raise concerns about the right to be informed. We present the first study of "reclassification," wherein a platform changes its filtering decision for a review. We find that reviews routinely move between Yelp's two main classifier classes ("Recommended" and "Not Recommended"), with up to five reclassifications of a single review. We identify demographic disparities in review prevalence and filtering decisions.
By showing phenomena that cannot be studied without longitudinal data collection and analysis, we emphasize the importance of longitudinal study for consumer protection issues online. Our software and data releases help lay the groundwork for future research on these issues.
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog:
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File: Amos_princeton_0181D_14028.pdf
Size: 843.25 kB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.