Publications

Towards Understanding How Transformers Perform Multi-step Reasoning with Matching Operation

Zhiwei Wang, Yunji Wang, Zhongwang Zhang, Zhangchen Zhou, Hui Jin, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Yaoyu Zhang, Zhi-Qin John Xu, 2025.

Submitted to The Forty-Second International Conference on Machine Learning (ICML 2025).

We propose a buffer mechanism and find evidence that language models employ such a mechanism during reasoning. We also propose a method to enhance the model’s reasoning capability, significantly improving data utilization efficiency on logical reasoning datasets.

Download [pdf].

Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers

Brian K Chen, Tianyang Hu, Hui Jin, Hwee Kuan Lee, Kenji Kawaguchi, 2024.

Published in The Forty-First International Conference on Machine Learning (ICML 2024).

We show how to convert in-context prompts exactly into model weights by introducing an extra bias term into the attention module.

Download [pdf].

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen, 2025.

Submitted to The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025).

We analyze how LLMs learn new knowledge through the lens of knowledge circuit evolution, identifying computational subgraphs that facilitate knowledge storage and processing.

Download [pdf].