Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp012801pg37z
Title: Similarity Search with Multimodal Data
Authors: Wang, Zhe
Advisors: Li, Kai
Charikar, Moses
Contributors: Computer Science Department
Keywords: High Dimensional Data
Large Dataset
Multimodal System
Search System
Similarity Search
Subjects: Computer science
Computer engineering
Issue Date: 2012
Publisher: Princeton, NJ : Princeton University
Abstract: Similarity search systems are designed to help people organize non-text multimedia data and find valuable information. Multimedia data intrinsically has multiple modalities (e.g., visual and audio features from video clips), which can be exploited to construct better search systems. Traditionally, various integration techniques have been used to aggregate multiple modalities. However, such algorithms do not scale well to large datasets. As multimedia data grows, it is a challenge to build a search system that handles large-scale multimodal data efficiently and provides users with the information they need. The goal of this dissertation is to study how to effectively combine multiple modalities to implement similarity search systems for large datasets. I have carried out my study through three similarity search systems, each designed for a different application. Each system combines multiple modalities to help users find desired information quickly. With the VFerret system, I studied how to combine visual features with audio features for effective personal video search. With the Image Spam Detection System, I explored several aggregation methods to integrate multiple image spam filters to detect image spam. With my Product Navigation System, I studied how to combine text search with image similarity search to help users find desired products. This thesis also studies a rank-based model that helps system designers construct more efficient large-scale multimodal similarity search systems. Although a general solution for using multimodal data in a similarity search system is still unknown, this dissertation shows that it is possible to substantially improve search accuracy and efficiency by leveraging domain-specific knowledge of multimodal data. The VFerret system improves search accuracy from an average precision of 0.66 to 0.79 by combining visual and audio features.
The Image Spam Detection System lowers the false positive rate from a previously reported 1% to 0.001% while maintaining comparable detection rates by intelligently combining multiple image filters. My Product Navigation System reduces the number of user clicks by 60% compared to traditional systems through a new method of combining text search with image similarity search. These results support further adoption and study of multimodal data in the design of similarity search systems.
URI: http://arks.princeton.edu/ark:/88435/dsp012801pg37z
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog.
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File: Wang_princeton_0181D_10089.pdf
Size: 4.61 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.