I lead research at the intersection of AI and Software Engineering, with a particular focus on AI for Code. My work explores how neurosymbolic techniques—blending the power of machine learning with the rigor of formal reasoning—can transform software reliability, security, and developer productivity. By building new language models, agents, and abstractions, we aim to make code not just easier to write, but easier to trust. I also serve as an Amazon Scholar, collaborating with the Amazon Agentic AI team.
Learn more about our work at the ARiSE Lab, where we try to make your Code Smarter, Safer, and More Collaborative — with AI.
News
- Aug'25: Visiting NUS to work on coding agents.
- June'25: Gave a talk at the FSE'25 Doctoral Symposium on mentoring students.
- May'25: My PhD student, Yangruibo Ding, accepted a tenure-track faculty offer at UCLA. Congrats!!
- ICML'25: Two papers accepted.
- NAACL'25: Oral presentation.
Selected Publications
Model Training
- SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning (NeurIPS 2024)
- CYCLE: Learning to Self-Refine Code Generation (OOPSLA 2024)
- EditLord: Learning Code Transformation Rules for Code Editing (ICML 2025)
Agents
- UTFix: Change Aware Unit Test Repairing Using LLM (OOPSLA 2025)
- C2SaferRust: Transforming C Projects into Safer Rust with Neuro-Symbolic Techniques
- FaultLine: Automated Proof-of-Vulnerability Generation Using LLM Agents
Benchmarking
- PrimeVul: Vulnerability Detection with Code Language Models: How Far Are We? (ICSE 2025)
- kGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution (NeurIPS 2024)
- LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation (NAACL 2025)
- CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance
- CodeSense: A Real-World Benchmark and Dataset for Code Semantic Reasoning
Model Testing
- Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination (ICML 2025)
- DeepTest: Automated Testing of Deep-Neural-Network-Driven Autonomous Cars (ICSE 2018)
Check out the Publications page for more details.
Awards
Honors:
- IEEE CS TCSE Rising Star Award
- Most Influential Paper Award, IEEE International Conference on Software Maintenance and Evolution (ICSME), 2023
- VMware Early Career Faculty Award, 2020
- IBM Faculty Award, 2019
- NSF CAREER Award, 2019
Paper Recognition:
- Oral Presentation, NAACL 2025
- ACM SIGSOFT Distinguished Paper Award, ISSTA 2023
- ACM SIGSOFT Distinguished Paper Award, ASE 2023
- Oral Presentation, CoRL 2023
- EAPLS (European Association of Programming Languages and Systems) FASE Best Paper Award, 2020
- ACM Distinguished Paper Award, FSE 2017
- ACM Distinguished Paper Award, MSR 2017
- Best Student Paper Award, IEEE S&P (Oakland) 2014
Funding:
- Government: National Science Foundation (NSF)
- Corporate: Google, IBM, Amazon, Capital One, Red Hat
- Provost’s Grants Program for Junior Faculty