CLUE: Conflict-guided Localization for LLM Unlearning Framework
LLM unlearning aims to eliminate the influence of undesirable data without affecting causally unrelated information. This process typically involves using a forget set to …
(he/him)
PhD student at Xi'an Jiaotong University
PhD Computer Science
2021-09-01
2026-03-30
Xi'an Jiaotong University
BS Computer Science
2016-09-01
2020-06-30
Xi'an Jiaotong University
My research spans a broad spectrum of language models, centered on how these models construct and utilize causal mechanisms. In my earlier work, I investigated methods to endow LLM representations with causal discrimination and explored the phenomenon of causal emergence within these complex architectures. Currently, my focus has shifted toward Mechanistic Interpretability (MI). I am particularly interested in the intersection of MI and parameter updating (e.g., supervised fine-tuning, SFT). My goal is to leverage mechanistic insights—identifying specific functional circuits—to guide more precise, surgical, and interpretable modifications to model behavior. By bridging these two fields, I aim to transform LLMs from “black boxes” into transparent systems that can be reliably controlled and updated for trusted applications. 😃
Circuit discovery has gradually become a prominent method in mechanistic interpretability, and research on circuit completeness has also garnered increasing attention. …
The cross-fertilization of deep learning and causal discovery has given rise to broader forms of causal data, including multi-structured datasets such as Netsim, and complex …