Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp013f462884t
Title: Designing Efficient Clinical Randomized Controlled Trials with Limited Data using Artificial Intelligence
Authors: Lala, Sayeri
Advisors: Jha, Niraj K
Contributors: Electrical and Computer Engineering Department
Keywords: artificial intelligence; clinical randomized controlled trials; efficiency
Subjects: Electrical engineering; Public health; Artificial intelligence
Issue Date: 2024
Publisher: Princeton, NJ : Princeton University
Abstract: Approximately 10K diseases affect the population worldwide, yet only a small fraction of these diseases has some form of treatment, resulting in poor quality of life and shorter lifespans for tens to hundreds of millions of people across the globe. In response, biopharmaceutical companies have made substantial investments in research and development for drugs over the past decade, with total expenditures ranging from US$10B to US$100B per year; however, challenges with drug development greatly impede success, with only approximately 10% of developed drugs entering the market. While challenges emerge during all stages of drug development, which include discovery in the lab, pre-clinical testing, and clinical testing, addressing challenges with clinical testing, specifically Phase-3 Randomized Controlled Trials (RCTs), remains paramount, given their critical role in obtaining market approval, combined with their high failure rates (33-44%) and high failure costs (US$0.5B-1.2B per drug) due to large sample size requirements. In this thesis, we make significant progress towards improving the efficiency of clinical RCTs by increasing their success rates and reducing their expenses with artificial intelligence using no or limited amounts of data collected before the RCT. First, we introduce SECRETS, a framework that reduces the sample size required for an RCT to demonstrate treatment effectiveness with high statistical accuracy. Specifically, SECRETS simulates the cross-over trial, an RCT design that measures Individual Treatment Effects (ITEs) to reduce intersubject variance and thereby improve sample efficiency but that is otherwise impractical for many settings. SECRETS uses a state-of-the-art counterfactual estimation algorithm to estimate the ITEs from only the collected RCT data and then applies a novel, suitable hypothesis testing strategy.
Evaluations across several real-world RCT datasets demonstrate the effectiveness of SECRETS, which reduces sample size requirements by 25-76% (tens to thousands of subjects). SECRETS is useful for regaining power under settings where it is difficult to meet the required sample size due to lagging recruitment or large sample size requirements. To further increase trial success rates, we introduce TAD-SIE, a framework for estimating the sample size required to establish the effectiveness of a treatment in the presence of poor sample size estimates stemming from insufficient prior data. TAD-SIE implements a novel trend-adaptive design tailored to SECRETS that adjusts the sample size estimate based on accrued RCT data while leveraging a sample-efficient hypothesis testing strategy to increase the likelihood of reaching the target statistical operating point. Specifically, TAD-SIE increases the sample size according to interim estimates of treatment effect parameters under SECRETS while controlling for type-1 error with futility stopping. After sample size adjustments, it performs hypothesis testing with SECRETS. Given its iterative nature, TAD-SIE accommodates different use cases, yielding solutions that trade off sample and time efficiency. On a sample RCT dataset, TAD-SIE can appropriately power an RCT at typical operating points (e.g., 80% or 90% power and 5% significance level) under either sample- or time-efficient modes, in contrast to prior baseline approaches. Finally, we present METRIK, a framework that reduces the total number of measurements that need to be collected per subject throughout the trial, thereby lowering trial expenses. To achieve this, METRIK learns a Planned Missing Design (PMD) and an associated imputation model using prior data collected from a small internal pilot study to identify correlations over time and across metrics. 
Specifically, METRIK models the PMD-imputer pair using a differentiable weight mask prepended to a transformer-based imputation framework and then generates a diverse set of PMD-imputer pairs by sweeping over relevant hyperparameters and choosing candidate pairs that have higher sampling efficiency and imputation performance compared to baseline (random) designs. METRIK chooses the final PMD-imputer pair based on the designer’s objectives, that is, whether to maximize sampling efficiency or maximize imputation performance. Across several RCT datasets, METRIK generates PMD-imputer pairs that substantially boost efficiency and/or imputation performance over random baseline strategies (e.g., efficiency increases by a median of 38%, interquartile range: [30%, 44%]).
URI: http://arks.princeton.edu/ark:/88435/dsp013f462884t
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Electrical Engineering

Files in This Item:
File: Lala_princeton_0181D_15264.pdf
Size: 3.8 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.