Curriculum Vitae

Theoretical machine learning & reinforcement learning · University of Illinois Urbana-Champaign

Research Interests

Theoretical ML, reinforcement learning, online learning, MDP function approximation, sample complexity.

Education

University of Illinois Urbana-Champaign
Champaign, Illinois · GPA 3.99 / 4.00
  • B.S. in Computer Science
  • B.S. in Liberal Arts & Sciences, Mathematics (Data Optimization concentration)
  • B.S. in Liberal Arts & Sciences, Statistics

Papers

2025
RSK linear operators and the Vershik–Kerov–Logan–Shepp curve
Duy Phan, David Xia
TO APPEAR Electronic Journal of Combinatorics
2025
EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations
Jung-Woo Chang, Ke Sun, David Xia, Xinyu Zhang, Farinaz Koushanfar
PUBLISHED IEEE Symposium on Security and Privacy 2025

Workshop Papers, Manuscripts & Notes

2026
Frozen Policy Iteration for MDPs with Stochastic Transitions under Linear Qπ Realizability
David Xia, Ruizhong Qiu, Hanghang Tong
IN PREP Manuscript in preparation
2026
Predictability in Major League Sports: Betting Odds versus Mathematical Models
David Xia, AJ Hildebrand
SUBMITTED Mathematics and Sports (MAS) Journal
2026
Revitalizing Local Democracy: A Human-Centered Audit of LLMs in City Council Journalism
David Xia, Chris Maury
WORKSHOP HEAL Workshop at CHI 2026

Research Experience

Northwestern University
Research Intern · advised by Prof. Zhaoran Wang
  • Developed a closed-loop LLM meta-orchestrator agent for aircraft wing inverse design, where the agent diagnoses its own surrogate failures across iterative versions and emits corrective source code for the next optimization layer.
  • Identified and systematically addressed seven structural failure modes of surrogate-based optimization — including surrogate exploitation, Gaussian trust collapse, family-specific exploitation, and single-objective specification gaming — building a unified taxonomy and corresponding defenses.
  • Applied reinforcement-learning-style reasoning to the outer loop: the LLM agent classifies failure modes and upgrades its decision layer, achieving relative robustness gains validated against public RANS datasets (AirfRANS, SuperWing, ONERA CRM).
iDEA-iSAIL Lab
Research Intern · advised by Prof. Hanghang Tong
  • Conducted research in theoretical reinforcement learning, studying sample- and computation-efficient algorithms for online learning in finite-horizon Markov Decision Processes under function approximation and stochastic dynamics.
  • Used concentrability to develop the first sample- and computation-efficient algorithm for online reinforcement learning in stochastic-transition MDPs under the linear Qπ realizability assumption, resolving an open problem left by prior work restricted to deterministic dynamics.
  • Established a PAC sample complexity guarantee of Õ(d² H⁷ (C*)³) by combining techniques from offline and online reinforcement learning theory, dynamic programming, concentration analysis, and linear function approximation.
  • Manuscript in preparation: Frozen Policy Iteration for MDPs with Stochastic Transitions under Linear Qπ Realizability.
Illinois Combinatorics Lab for Undergraduate Experience (ICLUE)
Research Intern · advised by Prof. Alexander Yong
  • Proved a new asymptotic result: the probability that the Schensted insertion algorithm, applied to a uniformly random permutation in S_n, exhibits a special "bumping" interaction converges to 1 as n approaches infinity.
  • Derived the result using rigorous probability bounds on the RSK linear operator by leveraging the Vershik–Kerov–Logan–Shepp limit shape curve as the central analytic tool.
  • Developed proofs spanning real analysis (uniform convergence, limit arguments), probability theory (concentration inequalities, convergence in probability), and asymptotic analysis (bounding combinatorial quantities in the large-n regime).
  • Paper to appear in the Electronic Journal of Combinatorics.
Illinois Mathematics Lab
Research Intern · advised by Prof. AJ Hildebrand
  • Studied the predictability of sports outcomes from a data-science perspective, comparing prediction accuracy across polls, the betting market, Elo ratings, and mathematical models such as Bradley-Terry.
  • Presented findings at the Rose-Hulman Undergraduate Math Conference 2024, UIUC Undergraduate Research Symposium 2024, and Joint Mathematics Meetings 2025.
  • Paper in submission to the Mathematics and Sports (MAS) Journal.
Carnegie Mellon University, HCII
Research Intern · advised by Prof. Jeff Bigham
  • Designed and evaluated an automated journalism pipeline using state-of-the-art LLMs to replicate end-to-end editorial workflows, from transcript segmentation to headline generation and topic prioritization.
  • Orchestrated a large-scale crowdsourced study and demonstrated that LLM-based headline quality and topic prioritization can meet or exceed professional standards.
  • Paper in the CHI 2026 HEAL workshop.
National University of Singapore, SERIUS REU
Research Intern · advised by Prof. Kelvin Fong Xuanyao
  • Designed the GARDEN (Generalized Anomaly Recognition and Detection for Enhanced Nurturing) pipeline for early disease detection in farm plants, combining segmentation, generative, and classification models end-to-end.
  • Built a leaf segmentation model to isolate regions of interest from complex backgrounds, enabling downstream anomaly detection to focus on plant tissue only.
  • Implemented a conditional GAN (cGAN) to synthesize realistic diseased and healthy leaf images, augmenting training data and improving classifier generalization.
  • Achieved robust disease classification by ensembling deep learning and traditional ML models, exploiting complementary strengths to improve detection reliability on real farm images.
University of California San Diego
Research Intern · advised by Prof. Xinyu Zhang
  • Developed EveGuard, a software-based defense framework to protect voice privacy from vibrometry-based side channels, using a perturbation generator model (PGM) to suppress sensor-based eavesdropping while preserving high audio quality.
  • Implemented Eve-GAN, a novel domain-translation task for inferring eavesdropped signals, enabling end-to-end training of the PGM and using few-shot learning to reduce data-collection overhead.
  • Achieved a protection rate of over 97% against audio classifiers, hindering reconstruction of eavesdropped audio.
  • Paper published in the IEEE Symposium on Security and Privacy 2025.

Relevant Coursework

CS: Statistical Reinforcement Learning, Machine Learning, NLP, Deep Learning, Algorithms, Numerical Methods, Numerical Analysis

Math: Honors Real Analysis, Graph Theory, Combinatorics, Abstract Linear Algebra, Abstract Algebra I–II, Algebraic Combinatorics, Linear Programming

Stats: Stochastic Processes, Applied Bayesian Analysis, Survival Analysis

Languages

English: Native

Mandarin Chinese: Fluent, HSK6