Research Interests
Theoretical foundations of machine learning, reinforcement learning, multi-armed bandits, optimization, high-dimensional probability
Education
- University of Illinois Urbana-Champaign
Bachelor of Science in Computer Science
Bachelor of Science in Liberal Arts & Sciences, Major in Mathematics, Data Optimization Concentration
Bachelor of Science in Liberal Arts & Sciences, Major in Statistics
GPA: 3.98
2023 - 2027
Papers
- [2025] RSK linear operators and the Vershik-Kerov-Logan-Shepp curve. Submitted to Electronic Journal of Combinatorics
- [2024] EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations
Workshop Papers, Manuscripts, Notes
- [2026] Algebraic Combinatorics Scribe Notes: The Cauchy Identity
- [2026] Algebraic Combinatorics Scribe Notes: Schur Polynomials
- [2026] Revitalizing Local Democracy: A Human-Centered Audit of LLMs in City Council Journalism
Research Experience
Illinois Combinatorics Lab for Undergraduate Experience (ICLUE)
Research Mentee, Aug 2024 - present. Advised by Prof. Alexander Yong.
- Proved a new asymptotic result: the probability that the Schensted insertion algorithm, applied to a uniformly random permutation in S_n, exhibits a special "bumping" interaction converges to 1 as n approaches infinity.
- Derived the result from rigorous probability bounds on the RSK linear operator, using the Vershik-Kerov-Logan-Shepp limit shape curve as the central analytic tool.
- Developed proofs requiring techniques spanning real analysis (uniform convergence, limit arguments), probability theory (concentration inequalities, convergence in probability), and asymptotic analysis (bounding combinatorial quantities in the large-n regime).
- Paper in submission to the Electronic Journal of Combinatorics.
University of California San Diego
Research Intern, Jun 2023 - Dec 2024. Advised by Prof. Xinyu Zhang.
- Developed EveGuard, a software-based defense framework that protects voice privacy from vibrometry-based side channels with adversarial audio; its perturbation generator model (PGM) suppresses sensor-based eavesdropping while preserving high audio quality.
- Implemented Eve-GAN, a novel domain translation model that infers eavesdropped signals, enabling end-to-end training of the PGM and using few-shot learning to reduce data collection overhead.
- Achieved a protection rate of over 97% against audio classifiers, hindering reconstruction of eavesdropped audio.
- Paper published in the IEEE Symposium on Security and Privacy 2025.
Illinois Mathematics Lab (formerly Illinois Geometry Lab)
Student Scholar, Jan 2024 - present. Advised by Prof. AJ Hildebrand.
- Studied sports match outcomes from a data science perspective, analyzing their predictability by comparing prediction accuracy across polls, the betting market, Elo ratings, and mathematical models such as Bradley-Terry.
- Presented findings at the Rose-Hulman Undergraduate Math Conference 2024, the UIUC Undergraduate Research Symposium 2024, and the Joint Mathematics Meetings 2025.
- Paper in preparation.
Carnegie Mellon University Human-Computer Interaction Institute (HCII)
Research Intern, May 2025 - Feb 2026. Advised by Prof. Jeff Bigham.
- Designed and evaluated an automated journalism pipeline using state-of-the-art LLMs to replicate end-to-end editorial workflows, from transcript segmentation to headline generation and topic prioritization.
- Orchestrated a large-scale crowdsourced study demonstrating that LLM-generated headlines and topic prioritization can meet or exceed professional standards.
- Paper to appear at the Human-centered Evaluation and Auditing of Language Models (HEAL) Workshop at CHI 2026.
National University of Singapore SEEDER Group
Research Intern, May 2024 - Aug 2024. Advised by Prof. Xuanyao Fong.
- Developed a machine learning pipeline to detect, classify, and analyze diseases in plant leaves using U-Net segmentation and CNN classification models, achieving a 90% segmented classification rate.
- Stacked deep learning models with traditional ML models and augmented training data with cGAN-generated synthetic images, achieving over 95% detection accuracy.
Presentations
- [2025] Carnegie Mellon University Human-Computer Interaction Institute REU Symposium
Title: LLMs for Extracting Agenda Items from City Council Meetings and Generating Newsworthy Headlines
Authors: David Xia
Program: CMU HCII REU
Presentation: Poster
- [2025] Algebra, Geometry and Combinatorics Day (joint work presented by Duy Phan)
Title: RSK linear operators and the Vershik-Kerov-Logan-Shepp curve
Authors: Duy Phan, David Xia
Program: Algecom
Presentation: Poster
- [2025] Joint Mathematics Meetings
Title: Predictability in Major Sports Leagues: Betting Market vs Model Based Predictions
Authors: David Xia
Program: Joint Mathematics Meetings
Presentation: Poster
- [2024] Illinois Mathematics Lab Research Symposium
Title: Predictability in College Sports
Authors: Yihan Gao, Samuel Lam, and David Xia
Presentation: Poster
- [2024] Rose-Hulman Undergraduate Math Conference
Title: Predictability in College Sports: Comparing the Accuracies of Prediction Models for College Football and Basketball Games
Authors: Yihan Gao, Samuel Lam, and David Xia
Program: Rose-Hulman Undergraduate Math Conference
Presentation: Oral Talk
- [2024] University of Illinois Urbana-Champaign Undergraduate Research Symposium
Title: Predictability and Competitiveness in Sports Leagues
Authors: Yihan Gao, Samuel Lam, and David Xia
Program: UIUC Undergraduate Research Symposium
Presentation: Poster
Grants and Awards
[2025] CMU HCII REU: NSF 2349558
[2025] ICLUE RTG: NSF 1937241
Relevant Coursework
- CS: Machine Learning, NLP, Deep Learning, Algorithms, Numerical Methods, Numerical Analysis
- Math: Honors Real Analysis, Graph Theory, Combinatorics, Abstract Linear Algebra, Abstract Algebra I--II, Algebraic Combinatorics, Linear Programming
- Stats: Stochastic Processes, Applied Bayesian Analysis, Survival Analysis
Languages
- English: Native
- Mandarin Chinese: Fluent, HSK6