Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01fq977x51j
Title: It’s a Dupe: High-End Cosmetics for Less with Product Named Entity Recognition of Scraped Web Pages
Authors: Huang, Betty
Advisors: Fish, Robert
Department: Computer Science
Class Year: 2018
Abstract: Many high-end cosmetics have drugstore duplicates (“dupes” for short) that achieve the same effect as the original at a much lower price point. There are many blog posts, articles, and web forums that recommend dupes for various products. However, it is tedious to search through the web to find all this information and cross-reference it with product reviews to come to a purchasing decision. We present a novel dupe-calculation method by using Linear-Chain Conditional Random Fields (CRF) to perform Product Named Entity Recognition (PNER) of scraped Google search results to extract dupe product names. We build a web and mobile front-end to display the data. The results and performance proved better than existing competitors, and show this method has much potential in exploiting this niche market.
URI: http://arks.princeton.edu/ark:/88435/dsp01fq977x51j
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File: HUANG-BETTY-THESIS.pdf
Size: 1.83 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.