Shanghai / Remote

Wenhao Chen

Trustworthy AI · AI Alignment

I am an undergraduate at the PKU School of EECS. My research focuses on trustworthy AI and AI alignment. This site collects research notes, essays, projects, and public updates.

Research

News: Our work has been accepted as a spotlight paper at ICML 2026.

ICML 2026 Spotlight

Toward Stable Value Alignment: Introducing Independent Modules for Consistent Value Guidance

Wenhao Chen, Sirui Sun, Shengyuan Bai, Guojie Song

We propose SVGT, a plug-and-play module that achieves stable LLM alignment by decoupling value modeling from the backbone's dynamic residual stream and steering generation via latent Bridge Tokens.

SVGT introduces an independent value module with dedicated value representations and explicit behavioral guidance. Latent Bridge Tokens act as dynamic value anchors, steering generation without disrupting the backbone's internal representations. Across multiple backbones and safety benchmarks, SVGT reduces harmfulness scores by more than 70% while maintaining generation fluency.

PDF (soon) · arXiv (soon) · Code (soon)

Blog

Recent writing

Creations

Other work

Projects

Public work that supports collaboration.