Curriculum Vitae — David Xia

Research Interests

Theoretical ML, reinforcement learning, online learning, MDP function approximation, sample complexity.

Education

University of Illinois Urbana-Champaign Aug 2023 – May 2027

Champaign, Illinois · GPA 3.99 / 4.00

B.S. in Computer Science
B.S. in Liberal Arts & Sciences, Mathematics (Data Optimization concentration)
B.S. in Liberal Arts & Sciences, Statistics

Papers

2025

RSK linear operators and the Vershik–Kerov–Logan–Shepp curve

Duy Phan, David Xia

TO APPEAR Electronic Journal of Combinatorics

arXiv

2025

EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations

Jung-Woo Chang, Ke Sun, David Xia, Xinyu Zhang, Farinaz Koushanfar

PUBLISHED IEEE Symposium on Security and Privacy 2025

arXiv

Workshop Papers, Manuscripts & Notes

2026

Frozen Policy Iteration for MDPs with Stochastic Transitions under Linear Q^π Realizability

David Xia, Ruizhong Qiu, Hanghang Tong

IN PREP Manuscript in preparation

PDF

2026

Predictability in Major League Sports: Betting Odds versus Mathematical Models

David Xia, AJ Hildebrand

SUBMITTED Mathematics and Sports (MAS) Journal

PDF

2026

Revitalizing Local Democracy: A Human-Centered Audit of LLMs in City Council Journalism

David Xia, Chris Maury

WORKSHOP HEAL Workshop at CHI 2026

PDF

Research Experience

Northwestern University May 2026 – present

Research Intern · advised by Prof. Zhaoran Wang

Developed a closed-loop LLM meta-orchestrator agent for aircraft wing inverse design, where the agent diagnoses its own surrogate failures across iterative versions and emits corrective source code for the next optimization layer.
Identified and systematically addressed seven structural failure modes of surrogate-based optimization — including surrogate exploitation, Gaussian trust collapse, family-specific exploitation, and single-objective specification gaming — building a unified taxonomy and corresponding defenses.
Applied reinforcement-learning-style reasoning to the outer loop: the LLM agent classifies failure modes and upgrades its decision layer, achieving relative robustness gains validated against public RANS datasets (AirfRANS, SuperWing, ONERA CRM).

iDEA-iSAIL Lab Feb 2026 – present

Research Intern · advised by Prof. Hanghang Tong

Conducted research in theoretical reinforcement learning, studying sample- and computation-efficient algorithms for online learning in finite-horizon Markov Decision Processes under function approximation and stochastic dynamics.
Used concentrability to develop the first sample- and computation-efficient algorithm for online reinforcement learning in stochastic-transition MDPs under the linear Q^π realizability assumption, resolving an open problem left by prior work restricted to deterministic dynamics.
Established a PAC sample complexity guarantee of Õ(d²H⁷C^*/ε³) by combining techniques from offline and online reinforcement learning theory, dynamic programming, concentration analysis, and linear function approximation.
Manuscript in preparation: Frozen Policy Iteration for MDPs with Stochastic Transitions under Linear Q^π Realizability.

Illinois Combinatorics Lab for Undergraduate Experience (ICLUE) Aug 2024 – present

Research Intern · advised by Prof. Alexander Yong

Proved a new asymptotic result: for a uniformly random permutation in S_n, the probability that the Schensted insertion algorithm exhibits a special "bumping" interaction converges to 1 as n approaches infinity.
Derived the result from rigorous probability bounds on the RSK linear operator, using the Vershik–Kerov–Logan–Shepp limit shape curve as the central analytic tool.
Developed proofs spanning real analysis (uniform convergence, limit arguments), probability theory (concentration inequalities, convergence in probability), and asymptotic analysis (bounding combinatorial quantities in the large-n regime).
Paper to appear in the Electronic Journal of Combinatorics.

Illinois Mathematics Lab Jan 2024 – May 2026

Research Intern · advised by Prof. AJ Hildebrand

Studied the predictability of sports match outcomes from a data-science perspective, comparing prediction accuracies across polls, the betting market, Elo ratings, and mathematical models such as Bradley-Terry.
Presented findings at the Rose-Hulman Undergraduate Math Conference 2024, UIUC Undergraduate Research Symposium 2024, and Joint Mathematics Meetings 2025.
Paper in submission to the Mathematics and Sports (MAS) Journal.

Carnegie Mellon University, HCII May 2025 – Feb 2026

Research Intern · advised by Prof. Jeff Bigham

Designed and evaluated an automated journalism pipeline using state-of-the-art LLMs to replicate end-to-end editorial workflows, from transcript segmentation to headline generation and topic prioritization.
Orchestrated a large-scale crowdsourced study and demonstrated that LLM-based headline quality and topic prioritization can meet or exceed professional standards.
Paper in the CHI 2026 HEAL workshop.

National University of Singapore, SERIUS REU May 2024 – Aug 2024

Research Intern · advised by Prof. Kelvin Fong Xuanyao

Designed the GARDEN (Generalized Anomaly Recognition and Detection for Enhanced Nurturing) pipeline for early disease detection in farm plants, combining segmentation, generative, and classification models end-to-end.
Built a leaf segmentation model to isolate regions of interest from complex backgrounds, enabling downstream anomaly detection to focus on plant tissue only.
Implemented a conditional GAN (cGAN) to synthesize realistic diseased and healthy leaf images, augmenting training data and improving classifier generalization.
Achieved robust disease classification by ensembling deep learning and traditional ML models, exploiting complementary strengths to improve detection reliability on real farm images.

University of California San Diego Jun 2023 – Dec 2024

Research Intern · advised by Prof. Xinyu Zhang

Developed EveGuard, a software-based defense framework to protect voice privacy from vibrometry-based side channels, using a perturbation generator model (PGM) to suppress sensor-based eavesdropping while preserving high audio quality.
Implemented Eve-GAN, a novel domain-translation task for inferring eavesdropped signals, enabling end-to-end training of the PGM and using few-shot learning to reduce data-collection overhead.
Achieved a protection rate of over 97% against audio classifiers and hindered reconstruction of eavesdropped audio.
Paper published in the IEEE Symposium on Security and Privacy 2025.

Grants & Awards

2025 CMU HCII REU · NSF 2349558
2025 ICLUE RTG · NSF 1937241

Relevant Coursework

CS: Statistical Reinforcement Learning, Machine Learning, NLP, Deep Learning, Algorithms, Numerical Methods, Numerical Analysis

Math: Honors Real Analysis, Graph Theory, Combinatorics, Abstract Linear Algebra, Abstract Algebra I–II, Algebraic Combinatorics, Linear Programming

Stats: Stochastic Processes, Applied Bayesian Analysis, Survival Analysis

Languages

English: Native

Mandarin Chinese: Fluent, HSK6