Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01zs25xc84q
Title: Long-Context Language Modeling with Parallel Context Encoding
Authors: Yen, Howard
Advisors: Chen, Danqi
Department: Computer Science
Class Year: 2024
Publisher: Princeton, NJ : Princeton University
Abstract: Extending large language models (LLMs) to process longer inputs is crucial for numerous applications. However, the considerable computational cost of transformers, coupled with the limited generalization of positional encoding, restricts the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLM to extend its context window. CEPE adopts a small encoder to process long inputs chunk by chunk and enables the frozen decoder to leverage additional contexts via cross-attention. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, CEPE extends the context window of LLAMA-2 to 128K tokens, offering 10x the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. CEPE also excels in retrieval-augmented applications, while existing long-context models degenerate with retrieved contexts. We further introduce a CEPE variant that can extend the context window of instruction-tuned models with only unlabeled data, and showcase its effectiveness on LLAMA-2-CHAT, leading to a strong instruction-following model that can leverage very long contexts on downstream tasks.
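
The abstract describes the mechanism only at a high level: a small encoder processes the long input chunk by chunk, and the frozen decoder reads the resulting representations through cross-attention. The minimal PyTorch sketch below illustrates just that general idea; it is not the thesis implementation, and every module name, dimension, and the chunk size are illustrative assumptions made for this example.

    # Illustrative sketch only -- not the CEPE code from the thesis.
    import torch
    import torch.nn as nn

    class ParallelContextSketch(nn.Module):
        def __init__(self, vocab_size=32000, d_model=256, chunk_size=128):
            super().__init__()
            self.chunk_size = chunk_size
            self.embed = nn.Embedding(vocab_size, d_model)
            # A small encoder applied independently to each chunk of the long input.
            enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
            # Cross-attention lets decoder hidden states read the encoded context.
            self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

        def encode_context(self, context_ids):
            # Split the long context into fixed-size chunks and encode each one
            # separately, so no single forward pass exceeds the chunk length.
            chunks = context_ids.split(self.chunk_size, dim=1)
            encoded = [self.encoder(self.embed(c)) for c in chunks]
            return torch.cat(encoded, dim=1)  # (batch, long_len, d_model)

        def forward(self, decoder_hidden, context_ids):
            memory = self.encode_context(context_ids)
            out, _ = self.cross_attn(decoder_hidden, memory, memory)
            return decoder_hidden + out  # residual cross-attention

    # Tiny smoke test with random token ids.
    model = ParallelContextSketch()
    ctx = torch.randint(0, 32000, (1, 1024))  # long context tokens
    dec = torch.randn(1, 16, 256)             # decoder hidden states
    print(model(dec, ctx).shape)              # torch.Size([1, 16, 256])

The appeal of encoding chunk by chunk, as the abstract suggests, is that each chunk is processed independently: positional encodings never have to extrapolate beyond the chunk length, and the chunks can be encoded in parallel, while cross-attention gives the decoder access to the full long context.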
URI: http://arks.princeton.edu/ark:/88435/dsp01zs25xc84q
Type of Material: Academic dissertations (M.S.E.)
Language: en
Appears in Collections: Computer Science, 2023

Files in This Item:
File: Yen_princeton_0181G_15045.pdf
Size: 600.6 kB
Format: Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.