
Kun Wang, Xindi Wu, Sanghyuk Chun, Olga Russakovsky, Esin Tureci
Coming Soon 2026
Vision-language models often fail on images that violate their training priors—a deficit usually read as missing visual capability—yet we show that controlled visual in-context learning lets them overcome these priors, recovering large gains (up to +30.3%) on prior-conflicting examples while leaving real accuracy unchanged and revealing grounding abilities that standard evaluations overlook.

Bohan Lyu, Yucheng Yang, Siqiao Huang, Jiaru Zhang, Qixin Xu, Xinghan Li, Xinyang Han, Yicheng Zhang, Huaqing Zhang, Runhan Huang, Kaicheng Yang, Zitao Chen, Wentao Guo, Junlin Yang, Xinyue Ai, Wenhao Chai, Yadi Cao, Ziran Yang, Kun Wang, Dapeng Jiang, Huan-ang Gao, Shange Tang, Chengshuai Shi, Simon S. Du, Max Simchowitz, Jiantao Jiao, Dawn Song, Chi Jin
arXiv Preprint 2026
We introduce MLS-Bench, a benchmark of 140 tasks across 12 domains testing whether AI systems can invent generalizable, scalable ML methods rather than only apply existing ones—and find that current agents remain far from surpassing human-designed methods, bottlenecked by the scientific insight needed to plan and validate claims rather than by more search or compute.

Kunal Gupta, Ishit Mehta, Kun Wang, Nicholas Chua, Yan Deng, Abhimanyu Krishna, Ravi Ramamoorthi, Manmohan Chandraker
International Conference on 3D Vision (3DV) 2026
We propose InteriorAgent, an LLM-agent-driven framework for text-to-3D indoor scene generation that encodes interior design principles through a novel scene description language and synthesis and optimization tools, producing scenes that users strongly favor over prior state-of-the-art methods.

Kun Wang*, Sumanth Varambally*, Duncan Watson-Parris, Yian Ma, Rose Yu (* equal contribution)
International Conference on Machine Learning (ICML) 2025 | Oral Presentation at NeurIPS 2024 Causal Representation Learning Workshop
This paper presents a novel approach to discovering latent causal structures from spatio-temporal data, addressing the challenge of identifying causal relationships in complex dynamical systems.