
Zirui Wang

王子瑞

Apple AI/ML

Email: ziruiw [at] apple (dot) com

Google Scholar

LinkedIn / Twitter

I currently lead the post-training effort for Apple Foundation Models. Before that, I was a Research Scientist at Google Brain. I work on language modeling, multimodal models, and generative models.

I received my PhD from the Language Technologies Institute at Carnegie Mellon University, advised by Professor Jaime Carbonell. Sadly, Jaime passed away in 2020, and I worked with Professor Yulia Tsvetkov and Professor Emma Strubell thereafter. Jaime will always be my advisor and is deeply missed. Prior to my graduate studies, I obtained my Bachelor's degree in Computer Science from Carnegie Mellon University.

Publications

* indicates co-first author.

Apple Intelligence Foundation Language Models.
Apple Foundation Models Team: Zirui Wang, post-training lead
[Tech Report] [Powering Apple Intelligence] [Apple ML Blog]

CoCa: Contrastive Captioners are Image-Text Foundation Models.
Jiahui Yu*, Zirui Wang*, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu.
TMLR 2022. [arxiv] [Google AI Blog]

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision.
Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao.
ICLR 2022. [arxiv] [Google AI Blog]

MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains.
Guoli Yin*, Haoping Bai*, Shuang Ma*, Feng Nan, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang, Dian Ang Yap, Yizhe Zhang, Karsten Ahnert, Vik Kamath, Mathias Berglund, Dominic Walsh, Tobias Gindele, Juergen Wiest, Zhengfeng Lai, Xiaoming Wang, Jiulong Shan, Meng Cao, Ruoming Pang, Zirui Wang.
arXiv 2024. [arxiv] [MMAU Data & Code]

ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities.
Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang.
arXiv 2024. [arxiv] [code]

Understanding Alignment in Multimodal LLMs: A Comprehensive Study.
Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch.
arXiv 2024. [arxiv]

Ferret: Refer and Ground Anything Anywhere at Any Granularity.
Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang.
ICLR 2024. [arxiv] [code]

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training.
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang.
arXiv 2024. [arxiv]

Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training.
Xianzhi Du, Tom Gunter, Xiang Kong, Mark Lee, Zirui Wang, Aonan Zhang, Nan Du, Ruoming Pang.
arXiv 2024. [arxiv]

REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory.
Ziniu Hu, Ahmet Iscen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A. Ross, Alireza Fathi.
CVPR 2023. [arxiv] [Google AI Blog]

Guiding Image Captioning Models Toward More Specific Captions.
Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen.
CVPR 2023. [arxiv]

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation.
Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu.
TMLR 2022. [arxiv] [Parti Website]

VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners.
Shen Yan, Tao Zhu, Zirui Wang, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu.
arXiv 2022. [arxiv]

Exploiting Category Names for Few-Shot Classification with Vision-Language Models.
Taihong Xiao, Zirui Wang, Liangliang Cao, Jiahui Yu, Shengyang Dai, Ming-Hsuan Yang.
arXiv 2022. [arxiv]

Towards Zero-Label Language Learning.
Zirui Wang, Adams Wei Yu, Orhan Firat, Yuan Cao.
arXiv 2021. [arxiv]

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models.
Zirui Wang, Yulia Tsvetkov, Orhan Firat, Yuan Cao.
ICLR 2021 (Spotlight). [arxiv]

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment.
Zirui Wang, Zachary C Lipton, Yulia Tsvetkov.
EMNLP 2020. [arxiv] [code]

Efficient Meta Lifelong-Learning with Limited Memory.
Zirui Wang*, Sanket Vaibhav Mehta*, Barnabás Póczos, Jaime Carbonell.
EMNLP 2020. [arxiv] [code]

Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework.
Zirui Wang*, Jiateng Xie*, Ruochen Xu, Yiming Yang, Graham Neubig, Jaime Carbonell.
ICLR 2020. [arxiv] [code] [presentation]

Characterizing and Avoiding Negative Transfer.
Zirui Wang, Zihang Dai, Barnabás Póczos, Jaime Carbonell.
CVPR 2019. [arxiv]

Towards more Reliable Transfer Learning.
Zirui Wang, Jaime Carbonell.
ECML-PKDD 2018. [arxiv] [sup] [slides]