Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01zs25xc84q
Full metadata record
DC Field: Value
dc.contributor.advisor: Chen, Danqi
dc.contributor.author: Yen, Howard
dc.contributor.other: Computer Science Department
dc.date.accessioned: 2024-08-08T18:23:28Z
dc.date.available: 2024-08-08T18:23:28Z
dc.date.created: 2024-01-01
dc.date.issued: 2024
dc.identifier.uri: http://arks.princeton.edu/ark:/88435/dsp01zs25xc84q
dc.description.abstract: Extending large language models (LLMs) to process longer inputs is crucial for numerous applications. However, the considerable computational cost of transformers, coupled with the limited generalization of positional encodings, restricts the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLM to extend its context window. CEPE adopts a small encoder to process long inputs chunk by chunk and enables the frozen decoder to leverage additional contexts via cross-attention. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, CEPE extends the context window of LLAMA-2 to 128K tokens, offering 10x the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. CEPE also excels in retrieval-augmented applications, while existing long-context models degenerate with retrieved contexts. We further introduce a CEPE variant that can extend the context window of instruction-tuned models with only unlabeled data, and showcase its effectiveness on LLAMA-2-CHAT, leading to a strong instruction-following model that can leverage very long contexts on downstream tasks.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.subject: Information retrieval
dc.subject: Large language models
dc.subject: Long-context language models
dc.subject.classification: Computer science
dc.subject.classification: Artificial intelligence
dc.title: Long-Context Language Modeling with Parallel Context Encoding
dc.type: Academic dissertations (M.S.E.)
pu.date.classyear: 2024
pu.department: Computer Science
Appears in Collections: Computer Science, 2023
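
The abstract describes CEPE's core mechanism: a small encoder processes the long input chunk by chunk, and a frozen decoder-only LLM attends to the resulting chunk representations through cross-attention. Below is a minimal PyTorch sketch of that idea, not the thesis's implementation; the module name, layer sizes, and the generic Transformer encoder standing in for CEPE's small encoder are all illustrative assumptions.

    # Minimal sketch (illustrative, not the authors' code) of parallel
    # context encoding with cross-attention, as described in the abstract.
    import torch
    import torch.nn as nn

    class ParallelContextSketch(nn.Module):
        def __init__(self, d_model=256, n_heads=4, chunk_len=128):
            super().__init__()
            self.chunk_len = chunk_len
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            # A generic 2-layer encoder stands in for CEPE's "small encoder".
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, context, hidden):
            # context: (batch, ctx_len, d_model) embeddings of the long input
            # hidden:  (batch, tgt_len, d_model) decoder hidden states
            b, n, d = context.shape
            pad = (-n) % self.chunk_len  # pad so chunks divide evenly
            if pad:
                context = torch.cat([context, context.new_zeros(b, pad, d)], dim=1)
            # Encode each chunk independently by folding chunks into the
            # batch dimension: this is the "parallel" in parallel encoding.
            chunks = context.reshape(-1, self.chunk_len, d)
            encoded = self.encoder(chunks).reshape(b, -1, d)[:, :n]
            # The decoder queries all chunk representations via cross-attention.
            out, _ = self.cross_attn(hidden, encoded, encoded)
            return out

    # Example: a 1000-token context is padded and encoded as 128-token chunks.
    model = ParallelContextSketch()
    ctx = torch.randn(2, 1000, 256)
    dec = torch.randn(2, 16, 256)
    print(model(ctx, dec).shape)  # torch.Size([2, 16, 256])

Folding chunks into the batch dimension is what keeps this cheap: self-attention is quadratic only within each chunk, so encoding cost grows linearly with total context length, consistent with the throughput and memory savings the abstract reports.
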

Files in This Item:
File: Yen_princeton_0181G_15045.pdf
Size: 600.6 kB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.