Fanqi Wan's homepage

Fanqi Wan (万凡琦)

Hello! My name is Fanqi Wan, and I am a final-year MS student at Sun Yat-sen University, advised by Prof. Xiaojun Quan. Before this, I received my Bachelor's degree from Xi'an Jiaotong University. Previously, I conducted my internship at Tencent AI Lab (2023.03-2024.05), ByteDance Seed (2024.06-2025.02), and Alibaba Tongyi (2025.03-now).

Email / CV / Google Scholar / Semantic Scholar / GitHub / HF / X

Research

My main research interests focused on deep learning for natural language generation. Previously, my work primarily focused on dialogue systems. After the emergence of large language models (LLMs), my research direction shifted towards LLM alignment (e.g., domain-specific LLMs, factual LLMs, self-improving LLMs, long-context LLMs, reasoning LLMs) and knowledge fusion (e.g., combining the strengths of LLMs with diverse structures/scales/functionarities). Representative papers are highlighted.

Knowledge Fusion
	FuseO1-Preview: System-II Reasoning Fusion of LLMs Fanqi Wan, Longguang Zhong, Ziyi Yang, Weizhou Shen, Xinting Huang Tech Report, 2025 [GitHub] / [HF] / [Blog] / [r/LocalLLaMA] / [Mergekit Support] FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. The resulted FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview achieves a Pass@1 accuracy of 74.0 on AIME24, demonstrating significant performance improvements compared to the OpenAI o1-preview (44.6) and OpenAI o1-mini (63.4), even approaching OpenAI o1 (79.2).
	FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion Ziyi Yang, Fanqi Wan, Longguang Zhong, Canbing Huang, Guosheng Liang, Xiaojun Quan ICLR SCI-FM Workshop, 2025 [GitHub] / [HF] / [Paper] / [HF Daily Papers] / [r/LocalLLaMA] / [魔搭社区] / [Blog] / [Website] We introduce FuseChat-3.0, a suite of large language models (LLMs) developed by integrating the strengths of heterogeneous source LLMs into more compact target LLMs. Using Llama-3.1-8B-Instruct as the target model, it demonstrates remarkable gains of 37.1 points and 30.1 points on AlpacaEval-2 and Arena-Hard, respectively.
	FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion Longguang Zhong, Fanqi Wan, Ziyi Yang, Guosheng Liang, Tianyuan Shi, Xiaojun Quan Tech Report, 2025 [Paper] We propose FuseRL, a novel two-stage framework comprising FuseSFT and FusePO to maximize the utilization of source LLMs. Using Llama-3.1-8B-Instruct as the target model, our approach achieves state-of-the-art performance among 8B LLMs on AlpacaEval-2 and Arena-Hard.
	Weighted-Reward Preference Optimization for Implicit Model Fusion Ziyi Yang, Fanqi Wan, Longguang Zhong, Tianyuan Shi, Xiaojun Quan ICLR, 2025 [GitHub] / [HF] / [Paper] / [HF Daily Papers] We propose Weighted-Reward Preference Optimization (WRPO), which leverages preference optimization between the source LLMs and the target LLM to transfer their capabilities effectively. WRPO achieves a LC Win Rate of 55.9% against GPT-4-Preview-1106 on AlpacaEval-2 and a Win Rate of 46.2% against GPT-4-0314 on Arena-Hard.
	FuseChat: Knowledge Fusion of Chat Models Fanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen, Xiaojun Quan Tech Report, 2024 [GitHub] / [HF] / [Paper] / [Featured by AK] / [HF Daily Papers] / [机器之心] We propose FuseChat, an extended framework of FuseLLM to integrate the collective knowledge and individual strengths of multiple structure- and scale-varied chat LLMs into a more powerful chat LLM. FuseChat-7B is comparable to the larger Mixtral-8x7B-Instruct and and approaches GPT-3.5-Turbo-1106 on MT-Bench.
	ProFuser: Progressive Fusion of Large Language Models Tianyuan Shi, Fanqi Wan, Canbin Huang, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang Tech Report, 2024 [GitHub] / [HF] / [Paper] We introduce a novel approach that enhances the fusion process by incorporating both the training and inference modes from multiple structurally diverse LLMs.
	Knowledge Fusion of Large Language Models Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi ICLR, 2024 [GitHub] / [HF] / [Paper] / [Featured by elvis] / [Featured by AIDB] / [机器之心] We propose FuseLLM to create a unified model that combines the distinctive strengths of multiple structurally diverse LLMs. FuseLLM-7B surpasses Llama-2-7B on 12 benchmarks, including commonsense, reasoning, question-answering, and code generation.
LLM Alignment
	QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Fanqi Wan, Weizhou Shen, Shengyi Liao, Yingcheng Shi, Chenliang Li, Ziyi Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan Tech Report, 2025 [GitHub] / [HF] / [Paper] / [r/LocalLLaMA] / [HF Daily Papers] We propose QwenLong-L1, a framework that adapts short-context LRMs to long-context scenarios via progressive context scaling. QwenLong-L1-32B outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B, achieving performance on par with Claude-3.7-Sonnet-Thinking
	SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization Huashan Sun, Shengyi Liao, Yansen Han, Yu Bai, Yang Gao, Cheng Fu, Weizhou Shen, Fanqi Wan, Ming Yan, Ji Zhang, Fei Huang Tech Report, 2025 [Paper] We propose a framework named Short-to-Long Preference Optimization (SoLoPO), decoupling long-context preference optimization (PO) into two components: short-context PO and short-to-long reward alignment (SoLo-RA), supported by both theoretical and empirical evidence.
	Mutual-Taught for Co-adapting Policy and Reward Models Tianyuan Shi, Canbin Huang, Fanqi Wan, Longguang Zhong, Ziyi Yang, Weizhou Shen, Xiaojun Quan, Ming Yan ACL, 2025 [Paper] we propose Mutual-Taught, a self-training method that iteratively improves both the policy model and reward model without requiring additional human annotation.
	Let Large Languague Models Find the Data to Train Themselves Fanqi Wan, Deng Cai, Shijue Huang, Xiaojun Quan, Mingxuan Wang Tech Report, 2025 [Paper] We propose ADS, a self-improving framework where models can invoke APIs to crawl and/or generate tailored datasets from various resources and environments, and retrain themselves.
	Advantage-Guided Distillation for Preference Alignment in Small Language Models Shiping Gao, Fanqi Wan, Jiajian Guo, Xiaojun Quan, Qifan Wang ICLR Spotlight, 2025 [GitHub] / [Paper] / [Paper Weekly] We propose Advantage-Guided Distillation for Preference Alignment (ADPA), which leverages an advantage function from the aligned teacher to deliver more nuanced, distribution-level reward signals for the student's alignment.
	Empowering Self-Learning of LLMs: Inner Knowledge Explicitation as a Catalyst Shijue Huang, Wanjun Zhong, Deng Cai, Fanqi Wan, Chengyi Wang, Mingxuan Wang, Mu Qiao, Ruifeng Xu AAAI, 2025 [Paper] We introduces a Self Knowledge Explicitation Learning (SKE-Learn) framework, which equips the LLMs with meta-skills to explicitly extract, verify and utilize inner knowledge for reasoning.
	Self-Evolution Fine-Tuning for Policy Optimization Ruijun Chen, Jiehao Liang, Shiping Gao, Fanqi Wan, Xiaojun Quan EMNLP Findings, 2024 [Paper] We introduce self-evolution fine-tuning for policy optimization, which eliminates the need for annotated data samples during alignment.
	Knowledge Verification to Nip Hallucination in the Bud Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi EMNLP, 2024 [GitHub] / [HF] / [Paper] We introduce Knowledge Consistent Alignment to verify and minimize the knowledge inconsistency between external knowledge in the alignment data and the intrinsic knowledge embedded in foundation LLMs, thus mitigating hallucinations before alignment.
	Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi EMNLP, 2023 [GitHub] / [HF] / [Paper] / [Paper Weekly] We propose a novel approach to enhance the domain-specific instruction coverage by utilizing LLMs to explore the domain space from both breadth and depth automatically. Explore-Instruct outperforms Self-Instruct in three specific domains.
Dialogue Systems
	Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System Weizhou Shen, Yingqi Gao, Canbin Huang, Fanqi Wan, Xiaojun Quan, Wei Bi EMNLP, 2023 [GitHub] / [Paper] We introduce maximal marginal likelihood for retriever training to address the retrieval-generation misalignment in end-to-end task-oriented dialogue systems.
	Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog Fanqi Wan, Weizhou Shen, Ke Yang, Xiaojun Quan, Wei Bi ACL, 2023 [GitHub] / [HF] / [Paper] We propose a multi-grained knowledge retriever and introduce a novel distillation objective for retriever training. MAKER achieves SOTA performance on MultiWOZ 2.1 and CamRest with both condensed KB and full KB.
Misc.
	QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization Weizhou Shen, Chenliang Li, Fanqi Wan, Shengyi Liao, Shaopeng Lai, Bo Zhang, Yingcheng Shi, Yuning Wu, Gang Fu, Zhansheng Li, Bin Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan Tech Report, 2025 [GitHub] / [HF] / [Paper] We propose QwenLong-CPRS, which implemented through a novel dynamic context optimization mechanism to enable multi-granularity context compression guided by natural language instructions, achieving both efficiency gains and improved performance.
	BlockPruner: Fine-grained Pruning for Large Language Models Longguang Zhong, Fanqi Wan, Ruijun Chen, Xiaojun Quan, Liangzhi Li ACL Findings, 2025 [GitHub] / [Paper] We propose a training-free structured pruning approach for fine-grained LLMs pruning.
	PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection Tao Yang, Tianyuan Shi, Fanqi Wan, Xiaojun Quan, Qifan Wang, Bingzhe Wu, Jiaxiang Wu EMNLP Findings, 2023 [GitHub] / [Paper] We propose a novel method for zero-shot personality detection in a multi-turn dialogue manner.
	Clustering-Aware Negative Sampling for Unsupervised Sentence Representation Jinghao Deng, Fanqi Wan, Tao Yang, Xiaojun Quan, Rui Wang ACL Findings, 2023 [GitHub] / [Paper] We propose a novel method that incorporates cluster information into contrastive learning for unsupervised sentence representation learning.

Education

MS Student in Computer Science, Sun Yat-sen University (2022.9-now).

Bachelor of Automation, Xi'an Jiaotong University (2018.9-2022.6).

Experience

Research Intern at NLP, Alibaba Tongyi, supervised by Dr. Ming Yan. (2025.3-now).

Research Intern at Horizon, ByteDance Seed, supervised by Dr. Deng Cai. (2024.6-2025.2).

Research Intern at NLPC, Tencent AI Lab, supervised by Dr. Xinting Huang and Dr. Wei Bi. (2023.3-2024.5).

Commercial Projects on E-Commerce Platforms, Vipshop, supervised by Dr. Rui Wang. (2022.4-2023.1).

Academic Competitions

2nd Prize on 2023 Xingzhi Cup Deep Learning Model Interpretability Task. (Team Leader) [Task]

2nd Prize on 2022 IFLYTEK AI Developer Competition Paper Abstracts Classification Task. (Team Leader) [Task]

3nd Prize on 2022 Ali Lingjie E-commerce Search Algorithm Competition. (Team Leader) [Task]

Academic Services

NeurIPS Reviewer, 2024-.

ICLR Reviewer, 2025-.

AISTATS Reviewer, 2025-.

ACL ARR Reviewer, 2025-.

Selected Awards

National Scholarship, Sun Yat-sen University, 2023.

Outstanding Award for Tencent AI Lab Rhino-Bird Focused Research Program, Tencent, 2023.

Excellent Graduate Students, Xi'an Jiaotong University, 2022.

National Scholarship, Xi'an Jiaotong University, 2018.

Website's code is from Jon Barron.