Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp014b29b601s
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: Li, Fei-Fei (en_US)
dc.contributor.advisor: Li, Kai (en_US)
dc.contributor.author: Deng, Jia (en_US)
dc.contributor.other: Computer Science Department (en_US)
dc.date.accessioned: 2012-08-01T19:35:31Z
dc.date.available: 2012-08-01T19:35:31Z
dc.date.issued: 2012 (en_US)
dc.identifier.uri: http://arks.princeton.edu/ark:/88435/dsp014b29b601s
dc.description.abstract (en_US):

Visual recognition remains one of the grand goals of artificial intelligence research. One major challenge is endowing machines with the human ability to recognize tens of thousands of categories. Moving beyond previous work, which has mostly focused on hundreds of categories, we make progress toward human-scale visual recognition. Specifically, our contributions are as follows.

First, we have constructed "ImageNet," a large-scale image ontology. The Fall 2011 version consists of 22 thousand categories and 14 million images; it depicts each category with an average of 650 images collected from the Internet and verified by multiple humans. To the best of our knowledge, this is currently the largest human-verified dataset in terms of both the number of categories and the number of images. Given the large amount of human effort required, the traditional approach to dataset collection, in-house annotation by a small number of human subjects, becomes infeasible. In this dissertation we describe how ImageNet has been built through quality-controlled, cost-effective, large-scale online crowdsourcing.

Next, we use ImageNet to conduct the first benchmarking study of state-of-the-art recognition algorithms at human scale. By experimenting on 10 thousand categories, we find that the previous state-of-the-art accuracy is still low (6.4%). We further observe that at large scale the confusion among categories is hierarchically structured, a key insight that leads to our subsequent contributions.

Third, we study how to efficiently classify tens of thousands of categories by exploiting the structure of visual confusion among categories. We propose a novel learning technique that scales logarithmically with the number of classes in both training and testing, improving both the accuracy and the efficiency of the previous state of the art while reducing training time by a factor of 31 on 10 thousand classes.

Fourth, we consider the problem of retrieving semantically similar images from a large database, a problem closely related to classification. We propose an indexing approach that exploits the hierarchical structure between categories. Experiments demonstrate that our approach is more efficient, scalable, and accurate than previous work. In particular, our indexing technique achieves close to 90% of the accuracy of brute-force search with a 1,000-fold speedup.

Finally, further exploiting the hierarchy, we show how to select the appropriate level of specificity to guarantee any desired classification accuracy. We propose an algorithm that is provably optimal under mild conditions and demonstrate its effectiveness on classifying 10 thousand classes. Experiments show that our algorithm guarantees 90% accuracy while giving informative answers 83% of the time. This holds promise for a practical large-scale recognition system.
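The logarithmic scaling claimed in the abstract is characteristic of tree-structured classification: a test example is routed from the root of a label tree down to a leaf, evaluating one small routing classifier per level instead of scoring all K classes. The Python sketch below illustrates only this routing idea; the LabelTreeNode structure, the routing weights, and the toy tree are hypothetical placeholders, not the dissertation's implementation, which also learns the label partitions and classifiers from the confusion structure among categories.

import numpy as np

class LabelTreeNode:
    """One node of a label tree; a leaf holds a single class label."""
    def __init__(self, classes, weights=None, children=None):
        self.classes = classes          # class labels reachable from this node
        self.weights = weights          # (n_children, n_features) routing matrix
        self.children = children or []  # child nodes, one per row of weights

    def is_leaf(self):
        return not self.children

def classify(root, x):
    """Route feature vector x from the root to a leaf.

    Each step evaluates one small routing classifier, so a balanced tree
    over K classes needs O(log K) evaluations instead of the O(K) needed
    by flat one-vs-rest scoring.
    """
    node = root
    while not node.is_leaf():
        scores = node.weights @ x                     # one score per branch
        node = node.children[int(np.argmax(scores))]  # follow best branch
    return node.classes[0]

# Toy example: 4 classes split into two branches of two.
leaves = [LabelTreeNode([c]) for c in ("cat", "dog", "car", "bus")]
left = LabelTreeNode(["cat", "dog"],
                     weights=np.array([[1.0, 0.0], [0.0, 1.0]]),
                     children=leaves[:2])
right = LabelTreeNode(["car", "bus"],
                      weights=np.array([[-1.0, 0.0], [0.0, -1.0]]),
                      children=leaves[2:])
root = LabelTreeNode(["cat", "dog", "car", "bus"],
                     weights=np.array([[1.0, 1.0], [-1.0, -1.0]]),
                     children=[left, right])
print(classify(root, np.array([2.0, -0.5])))  # routed left, then to "cat"

With branching factor b and balanced splits, classifying among K classes costs on the order of b log_b K dot products rather than K, which is the kind of saving that makes 10 thousand classes tractable in both training and testing.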
dc.language.iso: en (en_US)
dc.publisher: Princeton, NJ : Princeton University (en_US)
dc.relation.isformatof: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog at http://catalog.princeton.edu (en_US)
dc.subject: computer vision (en_US)
dc.subject: large dataset (en_US)
dc.subject: recognition (en_US)
dc.subject.classification: Computer science (en_US)
dc.title: Large Scale Visual Recognition (en_US)
dc.type: Academic dissertations (Ph.D.) (en_US)
pu.projectgrantnumber: 690-2143 (en_US)
Appears in Collections: Computer Science

Files in This Item:
File                             Size      Format
Deng_princeton_0181D_10257.pdf   20.04 MB  Adobe PDF


Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.