Shanghai / Remote

Wenhao Chen

Trustworthy AI · AI Alignment

Undergraduate at the PKU School of EECS. My research focuses on trustworthy AI and AI alignment. This site collects research notes, essays, projects, and public updates.

Research

News: Our work has been accepted as a spotlight paper at ICML 2026.

ICML 2026 Spotlight · Published 01 May 2026

Toward Stable Value Alignment: Introducing Independent Modules for Consistent Value Guidance

Wenhao Chen, Sirui Sun, Shengyuan Bai, Guojie Song

We propose SVGT, a plug-and-play module that achieves stable LLM alignment by decoupling value modeling from the backbone's dynamic residual stream and steering generation via latent Bridge Tokens.

SVGT introduces an independent value module with dedicated value representations and explicit behavioral guidance. Latent Bridge Tokens act as dynamic value anchors, steering generation without disrupting the backbone's internal representations. Across multiple backbones and safety benchmarks, SVGT reduces harmful scores by over 70% while maintaining generation fluency.

PDF (coming soon) · arXiv (coming soon) · Code (coming soon)

Blog

Recent writing

Creations

Other work

Projects

Public work that supports collaboration.

Research notes on trustworthy AI and AI alignment
Blog essays on alignment theory and AI systems
Creative writing, visual experiments, and other public artifacts
Public research summaries, slides, and project artifacts