David Xia

Curriculum Vitae

PDF Version

Research Interests

Theoretical ML, reinforcement learning, online learning, MDP function approximation, sample complexity


Education


Papers


Workshop Papers, Manuscripts, Notes


Research Experience

iDEA-iSAIL Lab

Research Intern, Feb 2026 - present. Advised by Prof. Hanghang Tong.
  • Conducted research in theoretical reinforcement learning, studying sample- and computation-efficient algorithms for online learning in finite-horizon Markov Decision Processes under function approximation and stochastic dynamics.
  • Used concentrability to develop the first sample- and computation-efficient algorithm for online reinforcement learning in stochastic-transition MDPs under the linear Q^π realizability assumption, resolving an open problem left by prior work restricted to deterministic dynamics.
  • Established a PAC sample complexity guarantee of Õ(d^2 H^7 (C*)^3) by combining techniques from offline and online reinforcement learning theory, dynamic programming, concentration analysis, and linear function approximation.
  • Manuscript in preparation: Frozen Policy Iteration for MDPs with Stochastic Transitions under Linear Q^π Realizability.

Illinois Combinatorics Lab for Undergraduate Experience (ICLUE)

Research Intern, Aug 2024 - present. Advised by Prof. Alexander Yong.
  • Proved a new asymptotic result: the probability that the Schensted insertion algorithm, applied to a uniformly random permutation in S_n, exhibits a special "bumping" interaction converges to 1 as n approaches infinity.
  • Derived the result using rigorous probability bounds on the RSK linear operator by leveraging the Vershik-Kerov-Logan-Shepp limit shape curve as the central analytic tool.
  • Developed proofs requiring techniques spanning real analysis (uniform convergence, limit arguments), probability theory (concentration inequalities, convergence in probability), and asymptotic analysis (bounding combinatorial quantities in the large-n regime).
  • Paper in submission to the Electronic Journal of Combinatorics.

University of California San Diego

Research Intern, Jun 2023 - Dec 2024. Advised by Prof. Xinyu Zhang.
  • Developed EveGuard, a software-based defense framework that protects voice privacy from vibrometry-based side channels with adversarial audio: a perturbation generator model (PGM) suppresses sensor-based eavesdropping while preserving high audio quality.
  • Implemented Eve-GAN, which frames the inference of eavesdropped signals as a novel domain translation task, enabling end-to-end training of the PGM and using few-shot learning techniques to reduce data collection overhead.
  • Achieved a protection rate of over 97% against audio classifiers, hindering reconstruction of eavesdropped audio.
  • Paper published in the IEEE Symposium on Security and Privacy 2025.

Illinois Mathematics Lab (formerly Illinois Geometry Lab)

Research Intern, Jan 2024 - present. Advised by Prof. AJ Hildebrand.
  • Studied sports match outcomes from a data science perspective, analyzing their predictability by comparing prediction accuracies across polls, the betting market, Elo ratings, and mathematical models such as Bradley-Terry.
  • Presented findings at the Rose-Hulman Undergraduate Math Conference 2024, the UIUC Undergraduate Research Symposium 2024, and the Joint Mathematics Meetings 2025.
  • Paper in submission to the Mathematics and Sports (MAS) Journal.

Carnegie Mellon University Human-Computer Interaction Institute (HCII)

Research Intern, May 2025 - Feb 2026. Advised by Prof. Jeff Bigham.
  • Designed and evaluated an automated journalism pipeline using state-of-the-art LLMs to replicate end-to-end editorial workflows, from transcript segmentation to headline generation and topic prioritization.
  • Orchestrated a large-scale crowdsourced study and demonstrated that LLM-based headline quality and topic prioritization can meet and exceed professional standards.
  • Paper in the Human-centered Evaluation and Auditing of Language Models (HEAL) Workshop at CHI 2026.

National University of Singapore SEEDER Group

Research Intern, May 2024 - Aug 2024. Advised by Prof. Xuanyao Fong.
  • Developed a machine learning pipeline to detect, classify, and analyze diseases in plant leaves using U-Net segmentation and CNN classification models, achieving a 90% segmented classification rate.
  • Stacked deep learning models with traditional ML models and augmented training data with cGAN-generated synthetic images, achieving over 95% detection accuracy.

Presentations

  • [2025] Carnegie Mellon University Human-Computer Interaction Institute REU Symposium

    Title: LLMs for Extracting Agenda Items from City Council Meetings and Generating Newsworthy Headlines
    Authors: David Xia
    Program: CMU HCII REU
    Presentation: Poster

  • [2025] Algebra, Geometry and Combinatorics Day (joint work presented by Duy Phan)

    Title: RSK linear operators and the Vershik-Kerov-Logan-Shepp curve
    Authors: Duy Phan, David Xia
    Program: Algecom
    Presentation: Poster

  • [2025] Joint Mathematics Meetings

    Title: Predictability in Major Sports Leagues: Betting Market vs Model Based Predictions
    Authors: David Xia
    Program: Joint Mathematics Meetings
    Presentation: Poster

  • [2024] Illinois Mathematics Lab Research Symposium

    Title: Predictability in College Sports
    Authors: Yihan Gao, Samuel Lam, and David Xia
    Presentation: Poster

  • [2024] Rose-Hulman Undergraduate Math Conference

    Title: Predictability in College Sports: Comparing the Accuracies of Prediction Models for College Football and Basketball Games
    Authors: Yihan Gao, Samuel Lam, and David Xia
    Program: Rose-Hulman Undergraduate Math Conference
    Presentation: Oral Talk

  • [2024] University of Illinois Urbana-Champaign Undergraduate Research Symposium

    Title: Predictability and Competitiveness in Sports Leagues
    Authors: Yihan Gao, Samuel Lam, and David Xia
    Program: UIUC Undergraduate Research Symposium
    Presentation: Poster


Grants and Awards

  • [2025] CMU HCII REU: NSF 2349558

  • [2025] ICLUE RTG: NSF 1937241


Relevant Coursework

  • CS: Machine Learning, NLP, Deep Learning, Algorithms, Numerical Methods, Numerical Analysis
  • Math: Honors Real Analysis, Graph Theory, Combinatorics, Abstract Linear Algebra, Abstract Algebra I-II, Algebraic Combinatorics, Linear Programming
  • Stats: Stochastic Processes, Applied Bayesian Analysis, Survival Analysis

Languages

  • English: Native
  • Mandarin Chinese: Fluent, HSK6
