Experience
I work at the boundary of model research, training systems, and evaluation, with a focus on making AI systems hold up under real-world constraints.
Professional Work
-
To improve the serving performance of 7-8B VLMs under on-device constraints, I explored how far these models could be compressed without retraining or major architectural changes.
I found that VLMs were more sensitive to depth pruning than LLMs, and that vision tokens were integrated at very different depths across model families.
Based on those observations, I developed a pruning strategy that preserved benchmark performance better than LLM-oriented baselines.
In practice, however, open-ended conversation quality still degraded too much, making it clear that deployment-quality VLM compression would require more than training-free depth pruning alone; the layer-scoring idea is sketched below.
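A minimal sketch of the scoring idea, assuming per-block hidden states are collected from a small calibration set via forward hooks. The cosine-similarity criterion here is a common training-free depth-pruning heuristic; the strategy I actually used also accounted for where vision tokens are integrated, which this sketch omits:

```python
import torch

def block_importance(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> float:
    """Score a transformer block by how much it transforms its input.

    Blocks whose output stays close to their input (high cosine
    similarity) are the cheapest to remove."""
    cos = torch.nn.functional.cosine_similarity(
        hidden_in.flatten(1), hidden_out.flatten(1), dim=-1
    )
    return (1.0 - cos).mean().item()  # low score = near-identity block

def layers_to_keep(scores: list[float], keep: int) -> list[int]:
    """Keep the `keep` highest-impact layers, preserving their order."""
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return sorted(ranked[:keep])
```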
-
Most vision-token pruning methods were designed to reduce redundancy, but they were rarely evaluated inside deployment inference stacks such as vLLM.
I extended vLLM to support these methods, making deep systems-level changes so they could be tested in realistic serving environments.
That enabled a fairer comparison between paper claims and deployment behavior.
In many cases, the overhead of deciding which tokens to prune outweighed the latency gains, highlighting the gap between paper-level efficiency and real-world usefulness; the measurement pattern is sketched after this item.
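The comparison that mattered in practice is easy to state: time the pruning decision itself against the decode time it saves. A sketch of that measurement, where `select_tokens` is a hypothetical stand-in for whatever policy a paper proposes:

```python
import time
import torch

def sync():
    if torch.cuda.is_available():
        torch.cuda.synchronize()

def timed(fn, *args, warmup: int = 3, iters: int = 20) -> float:
    """Mean wall-clock seconds per call, synchronized so we are not
    just timing kernel launches."""
    for _ in range(warmup):
        fn(*args)
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    sync()
    return (time.perf_counter() - start) / iters

def select_tokens(scores: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Hypothetical policy: keep the highest-scoring vision tokens."""
    k = max(1, int(scores.shape[-1] * keep_ratio))
    return scores.topk(k, dim=-1).indices
```

If `timed(select_tokens, scores)` is on the order of the per-request latency the dropped tokens would have cost, the method cannot win inside a serving stack, whatever its FLOP count says.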
-
As training became increasingly important for solving the core product problem, the team was still constrained by a slow, dependency-heavy TRL-based RL stack that limited resource efficiency.
I reviewed RL training frameworks for foundation models, selected verl for our scale, and led the migration by refactoring the surrounding training logic.
To make the new stack stable in practice, I traced and resolved multiple sources of training instability, including critical interactions with inference-engine initialization, which required digging into backend details such as FSDP and Megatron.
I documented the migration and stability work so the system would be easier for the team to maintain and extend.
-
The team lacked a benchmark and evaluation loop rigorous enough for the problems we were actually trying to solve.
I established that loop through source-data collection, processing, human labeling, performance evaluation, and iterative benchmark refinement, then extended it into an agent-based data pipeline that filled a missing data-engineering role.
Using a medallion-style architecture, I defined clear stage boundaries, terminal states, and agent instructions so datasets and error-analysis workflows could be generated, traced, backtracked, and refined through a more automated process.
This gave the team a repeatable evaluation pipeline instead of one-off dataset work; the stage-transition machinery is sketched below.
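A minimal sketch of that machinery, with invented names (`Stage`, `Record`); the real pipeline tracked far more metadata, but the shape (explicit boundaries, terminal states, full transition history for backtracking) is the same:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Stage(Enum):
    BRONZE = auto()    # raw source data, as collected
    SILVER = auto()    # cleaned and normalized, schema enforced
    GOLD = auto()      # labeled and benchmark-ready
    REJECTED = auto()  # terminal: failed validation, kept for error analysis
    RELEASED = auto()  # terminal: shipped into the benchmark

TERMINAL = {Stage.REJECTED, Stage.RELEASED}
ALLOWED = {
    Stage.BRONZE: {Stage.SILVER, Stage.REJECTED},
    Stage.SILVER: {Stage.GOLD, Stage.REJECTED},
    Stage.GOLD: {Stage.RELEASED, Stage.REJECTED},
}

@dataclass
class Record:
    payload: dict
    stage: Stage = Stage.BRONZE
    history: list[Stage] = field(default_factory=lambda: [Stage.BRONZE])

    def advance(self, to: Stage) -> None:
        """Enforce stage boundaries; `history` is what lets an agent
        trace and backtrack a dataset item."""
        if self.stage in TERMINAL or to not in ALLOWED[self.stage]:
            raise ValueError(f"illegal transition {self.stage} -> {to}")
        self.stage = to
        self.history.append(to)
```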
-
Protein-ligand interaction models were becoming biased toward one modality because protein and ligand inputs had very different structures.
I identified this issue through error analysis and addressed it with a multimodal mixing approach that combined existing feature-mixing methods.
The resulting model showed improved prediction performance; one possible form of such mixing is sketched below.
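For illustration only (the module and its gating details are invented here, not the production design), one way to combine existing mixing ingredients so that neither modality dominates purely through its input statistics:

```python
import torch
import torch.nn as nn

class GatedCrossMix(nn.Module):
    """Cross-attend each modality to the other, then gate how much
    mixed signal enters each stream."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.p2l = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.l2p = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate_p = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.gate_l = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, prot: torch.Tensor, lig: torch.Tensor):
        p_mix, _ = self.p2l(prot, lig, lig)   # protein attends to ligand
        l_mix, _ = self.l2p(lig, prot, prot)  # ligand attends to protein
        g_p = self.gate_p(torch.cat([prot, p_mix], dim=-1))
        g_l = self.gate_l(torch.cat([lig, l_mix], dim=-1))
        return prot + g_p * p_mix, lig + g_l * l_mix
```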
-
The existing molecular generation pipeline relied on reinforcement learning, but it tended to produce similar molecules without responding meaningfully to the target protein.
I introduced a diffusion-based generation approach conditioned on protein properties, betting that diffusion's strong conditioning behavior could make target-aware generation more effective even before it became common in small-molecule modeling.
The work produced promising benchmark results, but also made the gap between academic generation metrics and downstream drug-discovery value impossible to ignore. A sketch of the conditioned training step follows.
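A compact sketch of what property-conditioned diffusion training looks like. The `model(x_t, t, cond)` interface and the condition dropout (classifier-free guidance) are standard ingredients assumed for illustration, not the exact recipe I used:

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, cond, alphas_cumprod, p_uncond=0.1):
    """One DDPM-style training step: noise molecules `x0`, condition
    on protein properties `cond`, and predict the noise."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise
    # Occasionally drop the condition so the model also learns an
    # unconditional score, enabling classifier-free guidance at sampling.
    drop = torch.rand(b, device=x0.device) < p_uncond
    cond = torch.where(drop.view(b, 1), torch.zeros_like(cond), cond)
    return F.mse_loss(model(x_t, t, cond), noise)
```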
-
As I addressed both problems, a more fundamental limitation became clear: small molecules and proteins were still being modeled without a strong shared representation across scales.
I tried to address this by building a representation learning framework that aligned ligand and protein embeddings in a shared space through contrastive learning.
Drawing on my mathematical background, I also incorporated SO(3)-equivariant design to capture more universal geometric structure.
The approach showed promising results on small-molecule property prediction, while also making clear that broader progress in this area would depend on much larger data and compute. The contrastive objective is sketched below.
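The alignment objective itself is the standard symmetric InfoNCE (CLIP-style) loss over paired embeddings; the encoders, including the SO(3)-equivariant parts, are omitted here:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(lig_emb: torch.Tensor, prot_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Pull matched ligand/protein pairs together in the shared space
    and push mismatched pairs apart, symmetrically."""
    lig = F.normalize(lig_emb, dim=-1)
    prot = F.normalize(prot_emb, dim=-1)
    logits = lig @ prot.T / temperature
    targets = torch.arange(len(lig), device=lig.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))
```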
Research Foundation
My research foundation grew out of undergraduate work in optimization theory and later graduate research in deep learning theory and generative modeling.
M.S. Research
Sep 2020 - Aug 2022
- Worked on Neural Tangent Kernels and related questions about the training dynamics of overparameterized neural networks (the kernel is written out after this list).
- Researched generative modeling through diffusion and Schrödinger Bridge methods, culminating in my master's thesis.
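For reference, the textbook objects those questions revolve around, written in the standard form (nothing here is specific to my thesis):

```latex
% Empirical NTK of a network f_\theta
\Theta_\theta(x, x') = \nabla_\theta f_\theta(x)^{\top}\, \nabla_\theta f_\theta(x')

% Under gradient flow on squared loss over training data (X, y),
% predictions follow kernel dynamics; in the infinite-width limit
% \Theta stays at its initialization value, so training reduces to
% kernel regression.
\frac{d f_t(x)}{dt} = -\eta\, \Theta_{\theta_t}(x, X)\,\bigl(f_t(X) - y\bigr)
```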
B.S. Research
Mar 2016 - Aug 2020
- Studied the theoretical foundations of optimizer convergence in deep learning through an undergraduate research program.
- Strengthened my mathematical grounding in machine learning and reinforcement learning.
Teaching
Elementary Mathematical Analysis
Spring 2022
Linear Algebra
Spring 2022, Fall 2021
Supported students through office hours on concepts, proofs, assignments, and exam preparation.
Calculus
Fall 2021, Spring 2021, Fall 2020
Led problem sessions centered on core exercises and problem-solving practice.