Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp015d86p330d
Title: Multi-Source Text Generation and Beyond using Reinforcement Learning
Authors: Cho, Woon Sang
Advisors: Wang, Mengdi
Contributors: Operations Research and Financial Engineering Department
Subjects: Artificial intelligence
Issue Date: 2021
Publisher: Princeton, NJ : Princeton University
Abstract: Generating texts that resemble human-written natural texts has long been a research challenge. Given some initial text as a context to generate what should come next, many current text generation systems produce a continuation that often exhibits only a loose connection to the preceding text, resulting in a lack of local connectivity between adjacent sentences, let alone coherence as a whole. Few efforts have explicitly attempted to improve text generation systems from the perspectives of coherence and cohesion. Therefore, a mechanism to reinforce the soundness and seamless connection of the combined text, that is, the initial human-written context and the system-generated text put together, is desirable. In this thesis, we propose two neural discriminators that provide coherence and cohesion reward signals to a neural language model. Next, we address another interesting challenge motivated by the following observation: ambiguous user queries in search engines result in the retrieval of documents that often span multiple topics. One potential solution is for the search engine to generate multiple refined or clarification queries for the user who initially entered the ambiguous query, such that each of the refined queries relates to a subset of the documents spanning the same topic. A preliminary step towards this goal is to generate a question that captures common concepts of multiple documents. To this end, we propose a new task of generating a common question from multiple documents and present a simple variant of an existing multi-source encoder-decoder framework, Multi-Source Question Generator (MSQG).
However, this simple class of models uses only the targeted ("positive") multi-document set, and may generate generic questions that cover a larger scope than delineated by the document set. To address this challenge, we introduce a contrastive learning strategy in which, given "positive" and "negative" sets of documents, we generate a question that is closely related to the "positive" set but far away from the "negative" set. We also propose an effective auxiliary objective, Set-induced Contrastive Regularization (SCR), to develop a Multi-Source Coordinated Question Generator (MSCQG).
URI: http://arks.princeton.edu/ark:/88435/dsp015d86p330d
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Operations Research and Financial Engineering
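The abstract does not specify the exact form of SCR. As a rough, hypothetical illustration of the set-contrastive idea it describes, one could penalize a generated question whose average log-likelihood under the "positive" document set fails to exceed its average log-likelihood under the "negative" set by some margin. Every name and the hinge formulation below are illustrative assumptions, not the thesis's actual objective.

```python
def set_contrastive_penalty(logp_pos, logp_neg, margin=1.0):
    """Hinge-style penalty (illustrative only, not the thesis's SCR):
    the question's mean log-likelihood given the "positive" documents
    should exceed that of the "negative" documents by `margin`."""
    gap = sum(logp_pos) / len(logp_pos) - sum(logp_neg) / len(logp_neg)
    return max(0.0, margin - gap)

# Toy per-document log-likelihoods of one candidate question:
pos = [-1.0, -1.2]   # question fits the "positive" documents well
neg = [-3.0, -3.5]   # and fits the "negative" documents poorly
penalty = set_contrastive_penalty(pos, neg)  # gap large enough: no penalty
```

A margin-based term like this is only one of several common contrastive formulations (softmax/InfoNCE-style losses are another); the thesis itself should be consulted for the actual SCR definition.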