Please use this identifier to cite or link to this item:
|Title:||Extracting Information from High-Dimensional Data: Probabilistic Modeling, Inference and Evaluation|
|Advisors:||Daubechies, Ingrid C.|
Blei, David M.
|Contributors:||Electrical Engineering Department|
|Publisher:||Princeton, NJ : Princeton University|
|Abstract:||Data science is an emerging field at the interface of computer science, statistics, mathematics and signal processing. This field is undergoing an explosive growth, mainly due to the widespread use of tools, such as the internet and mobile devices, that lead to the massive accumulation of data from different sources. The sheer size of these data sets requires large scale computational (rather than human-powered) data analysis and decision making, and advances in computing resources are a driving force in this growth. However, the scale and high dimensionality of data are such that, powerful present-day computing resources can only partially address the complexity of the problems -- they need to be paired with advanced techniques and algorithms. A typical data analysis project consists of several stages: initial exploratory analysis, model building, derivation of inference, visualization, and evaluation. In modern data science, one important problem of the model-building phase is how to incorporate data-specific properties to the models. Early machine learning techniques were designed to work on generic data sets, using parameters specified a priori. However, as the diversity and complexity of the data sets grew, more advanced approaches were needed, tailored to the particular properties of the type of application under study. Such tailoring can take many different forms. For instance, it may be necessary to learn the model parameters from the data (instead of specifying them from the start); one can incorporate prior information (such as sparsity with respect to special representations, which themselves have to be learned); it may be beneficial to make use of relational structure within the data, which can be available in many guises: segmented image patches, citation networks of documents, social networks of friends. In this thesis, we shall visit all these approaches, each time within a probabilistic model built so as to incorporate prior information. More precisely, we shall derive, in a variety of settings, and for different applications, efficient posterior inference algorithms handling large data sets, and use side information to derive superior inference techniques. We demonstrate the efficiency and accuracy of those models and algorithms in the different applications, on both real and synthetic data sets. We evaluate the quality of the results, with both quantitative and human evaluation experiments. In the first part of the thesis the general framework is that of sparsity: we assume the data have a sparse representation; the application on which we focus is image super-resolution, in which one seeks to "up-scale images", i.e. "reconstruct" finer detail in an image than given in the data. Image super-resolution has been tackled successfully via sparse coding but not, so far, by Bayesian nonparametric methods (BNM). In other contexts, BNMs were shown to be powerful because they infer parameters that otherwise have to be assigned a priori. We build here the tools enabling such a BNM for the super-resolution of images. We start with building a sparse nonparametric factor analysis model for image super-resolution, more precisely, a model with a beta-Bernoulli process to learn the number of dictionary elements from the data. We test the results on both benchmark and natural images, comparing with the models in the literature. Then, we perform large-scale human evaluation experiments to explicitly assess the visual quality of the results. In a first implementation, we use Gibbs sampling, operating on the data in batch mode, and assess its performance. However, for large-scale data, such a Gibbs sampling approach is typically not feasible. To circumvent this, we develop an online variational Bayes (VB) algorithm that can deal with larger-scale data in a fraction of the time needed by traditional inference. In the second part of the thesis we consider data sets with rich side information. We study 2 different frameworks that have such side information: relational information and group information. To handle relational information, we build a relational factor analysis (rFA) model which incorporates this into the dictionary learning. We show that the use of relational information (e.g. spatial location), helps learning higher quality dictionaries and improves the recommendation systems in a social network and the image analysis algorithms (e.g. image inpainting). To handle group information, we propose a multi-task learning framework for image super-resolution problem using a hierarchical beta-process as a prior to dictionary assignments. In this model, we study grouped data and we build a model incorporating the group information. We show that by incorporating group information in this way the algorithm avoids erroneous selection of dictionary elements. Finally, in the third part of the thesis, we study latent sequential information between observations. We use this information to build a novel dynamic programming algorithm for sequential models. Hidden Markov models (HMMs) and conditional random fields (CRFs) are two popular techniques for modeling sequential data. Inference algorithms designed over CRFs and HMMs allow estimation of the state sequence, given the observations. In several applications, the end goal is not the estimation of the state sequence, but rather the estimation of the value of some function of the state sequence. In such scenarios, estimating the state sequence by conventional inference techniques, followed by computing the functional mapping from this estimate, is not necessarily optimal; it may be more efficient to directly infer the final outcome from the observations. In particular, we consider the specific instantiation of the problem where the goal is to find the state trajectories without exact transition points; we derive a novel polynomial time inference algorithm that outperforms vanilla inference techniques. We show that this particular problem arises commonly in many disparate applications and present the results for experiments on three different applications: (1) Toy robot tracking; (2) Single stroke character recognition; (3) Handwritten word recognition.|
|Alternate format:||The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog|
|Type of Material:||Academic dissertations (Ph.D.)|
|Appears in Collections:||Electrical Engineering|
Files in This Item:
|Polatkan_princeton_0181D_10374.pdf||12.59 MB||Adobe PDF||View/Download|
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.