Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01m326m501q
Title: Addressing Language Barriers in Healthcare: Building Chinese-English Neural Machine Translation Models for Doctor-Patient Dialogues
Authors: Teng, Ashley
Advisors: Chen, Danqi
Department: Computer Science
Class Year: 2023
Abstract: The accuracy of neural machine translation (NMT) for doctor-patient medical dialogues is crucial in real-life scenarios where professional medical advice is provided. To effectively evaluate the performance of strong NMT models on medical datasets, we study the high-resource languages Mandarin Chinese and English. In response to a lack of sufficient medical conversational data, we introduce a new medical dialogue dataset: Medical Score (MScore). We also conduct a study on the translation accuracy of Google Translate for MScore, in which certified interpreters report 93.6% accuracy. To analyze the performance of strong NMT models on new and existing medical datasets, we select the best among high-performing NMT models, the supervised system OPUS-MT, which is based on Marian-NMT, and conduct experiments that train the model on various combinations of medical datasets containing articles and dialogues. We evaluate the resulting OPUS-MT models on target-domain data, CSMS\(_{DIALOG}\) (which combines MScore with another medical dialogue set), with the highest-performing model achieving a BLEU score of 46.9, surpassing Google Translate's supervised system. The experiments show that, for translating medical dialogues, training on medical datasets that contain noisy conversational sentences is more effective than training on clean biomedical articles. We also observe that Chinese-English models have difficulty handling Chinese medical terms with multiple meanings and translating Chinese sentences with dropped pronouns into English with the correct pronouns. With the recent release of ChatGPT, a system trained with both supervised and unsupervised learning, we also study ChatGPT's conversational strengths and limitations, with an emphasis on its ability to translate Chinese medical dialogues with dropped pronouns.
URI: http://arks.princeton.edu/ark:/88435/dsp01m326m501q
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024

Files in This Item:
File: TENG-ASHLEY-THESIS.pdf
Size: 7.36 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.