Hsiang Hsu

Research Scientist · Frontier Model Evals, Alignment Robustness & Trustworthy AI

I build evaluation and intervention methods for knowledge control in ML systems, with a focus on machine unlearning, predictive multiplicity, and inference-time alignment.

Research

My research studies how ML systems store, expose, forget, and use knowledge. I build evaluation and intervention methods that reveal hidden failure modes: residual knowledge after unlearning, decision and explanation inconsistencies across equally valid models, and unreliable inference-time steering with external knowledge, context, or rewards. Across these settings, I aim to make model behavior more auditable, robust, and interpretable, building on my earlier work in privacy, fairness, and information-theoretic ML.

Knowledge Removal: Robust Forgetting

Machine unlearning and residual-knowledge auditing: removing unwanted information from models and testing whether deletion remains robust under perturbations, attacks, and distribution shift.

View research portfolio
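
To make the auditing idea concrete, here is a minimal, illustrative sketch, not the method from any paper below: query a supposedly-unlearned classifier on forget-set samples and on small perturbations of them, and flag cases where confidence in the forgotten labels re-emerges. The function name and the Gaussian noise model are assumptions chosen for illustration.

```python
import torch

def audit_residual_knowledge(model, forget_x, forget_y, noise_scale=0.05, n_probes=8):
    """Illustrative residual-knowledge probe (hypothetical helper): a model
    that truly forgot a sample should stay unconfident on its forgotten
    label even under small input perturbations."""
    model.eval()
    with torch.no_grad():
        # Confidence assigned to the forgotten label on the clean query.
        clean = torch.softmax(model(forget_x), dim=-1) \
                     .gather(1, forget_y.unsqueeze(1)).squeeze(1)
        worst = clean.clone()
        for _ in range(n_probes):
            # Re-query under Gaussian perturbations; keep the worst case per sample.
            noisy = forget_x + noise_scale * torch.randn_like(forget_x)
            conf = torch.softmax(model(noisy), dim=-1) \
                        .gather(1, forget_y.unsqueeze(1)).squeeze(1)
            worst = torch.maximum(worst, conf)
    # A large gap means forgetting held on clean queries but not under perturbation.
    return {"clean_conf": clean.mean().item(),
            "perturbed_worst_conf": worst.mean().item()}
```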

Knowledge Uncertainty: Hidden Multiplicity

Rashomon-effect research on predictive multiplicity: understanding when equally accurate models, decisions, or circuits disagree, and why this hidden instability matters for fairness, privacy, and interpretability.

View research portfolio
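
As a toy illustration of the Rashomon effect, the sketch below trains several near-equally-accurate gradient boosting models (same learner, different random seeds, on a synthetic scikit-learn dataset) and measures how often they disagree on individual test points. The simple ambiguity rate here is a stand-in for the more careful measures, such as Rashomon Capacity, developed in the papers below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same learner, different seeds: a crude stand-in for a Rashomon set of
# similarly accurate models (row subsampling makes the seed matter).
models = [GradientBoostingClassifier(subsample=0.7, random_state=s).fit(X_tr, y_tr)
          for s in range(5)]
preds = np.stack([m.predict(X_te) for m in models])   # shape: (n_models, n_test)
accs = np.array([(p == y_te).mean() for p in preds])

# Ambiguity: fraction of test points on which the models disagree,
# even though their aggregate accuracies are nearly identical.
ambiguity = (preds.min(axis=0) != preds.max(axis=0)).mean()
print(f"accuracies: {accs.round(3)}  ambiguity: {ambiguity:.3f}")
```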

Knowledge Steering: Inference-Time Control

Inference-time alignment and knowledge control: steering which knowledge a model uses through reward-tail-aware selection, sparse feature interventions, and future memory-unit representations.

View research portfolio
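
The baseline behind much of this area is best-of-n selection: sample several candidate generations, score each with a reward model, and return the argmax. The sketch below shows that optimistic baseline; it is the step that reward-tail-aware selection refines, since an imperfect reward model's upper tail is where over-optimization concentrates. Here generate and reward_model are hypothetical callables, not an API from the papers below.

```python
import numpy as np

def best_of_n(prompt, generate, reward_model, n=16):
    """Optimistic inference-time alignment baseline: trust the reward
    argmax outright. Tail-aware variants would temper this choice when the
    top rewards look like reward-model noise rather than real quality."""
    candidates = [generate(prompt) for _ in range(n)]
    rewards = np.array([reward_model(prompt, c) for c in candidates])
    return candidates[int(rewards.argmax())]
```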

Selected Papers

Knowledge Removal: Machine Unlearning and Residual Knowledge

  1. The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples · NeurIPS 2025

    Shows that models can pass standard forgetting checks while retaining residual knowledge under perturbed deletion queries.

  2. Machine Unlearning for Image-to-Image Generative Models · ICLR 2024

    Develops unlearning methods for image-to-image generative models, including diffusion models, VQ-GAN, and masked autoencoders.

  3. Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models · CVPR 2024

    Improves the scalability of NTK-based machine unlearning with parameter-efficient updates for large-scale models.

Knowledge Uncertainty: Rashomon Effect and Multiplicity

  1. RashomonGB: Analyzing the Rashomon Effect and Mitigating Predictive Multiplicity in Gradient Boosting · NeurIPS 2024

    Analyzes and mitigates conflicting individual predictions among similarly accurate gradient boosting models.

  2. Dropout-Based Rashomon Set Exploration for Efficient Predictive Multiplicity Estimation · ICLR 2024

    Uses dropout-based exploration to estimate predictive multiplicity efficiently without repeatedly retraining neural networks.

  3. Individual Arbitrariness and Group Fairness · NeurIPS 2023 (Spotlight)

    Shows that group fairness interventions can improve aggregate fairness while increasing individual-level arbitrariness.

  4. Arbitrary Decisions are a Hidden Cost of Differentially Private Training · FAccT 2023

    Reveals predictive multiplicity as a hidden cost of privacy-preserving randomized training procedures.

  5. Rashomon Capacity: A Metric for Predictive Multiplicity in Classification · NeurIPS 2022

    Introduces Rashomon Capacity to quantify score-level disagreement among near-optimal probabilistic classifiers.

  6. Data as a Lever: A Neighbouring Datasets Perspective on Predictive Multiplicity · arXiv preprint

    Studies predictive multiplicity through neighboring datasets and the sensitivity of model decisions to data perturbations.

Inference-Time Knowledge Steering and Generative Model Evaluation

  1. Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment · arXiv preprint

    Uses reward-tail behavior to adaptively balance optimistic and pessimistic selection in inference-time alignment.

  2. HeavyWater and SimplexWater: Distortion-Free LLM Watermarks for Low-Entropy Next-Token Predictions · NeurIPS 2025

    Studies watermarking in low-entropy generation regimes where naïve watermarking can be brittle or distortive.

  3. PaLD: Detection of Text Partially Written by Large Language Models · ICLR 2025

    Detects hybrid human/LLM-written text rather than assuming documents are fully human- or machine-written.

  4. Probing LLM Hallucination from Within: Perturbation-Driven Approach via Internal Knowledge · IEEE BigData 2025

    Probes hallucination through perturbation-driven analysis of internal knowledge and model behavior.

Foundations: Privacy, Fairness, and Information-Theoretic ML

  1. PASS: Private Attributes Protection with Stochastic Data Substitution · ICML 2025 (Spotlight)

    Protects private attributes while preserving downstream utility through stochastic data substitution.

  2. MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation · ICML 2024

    Uses information-theoretic transformations to suppress multiple protected attributes while retaining utility.

  3. Beyond Adult and COMPAS: Fair Multi-Class Prediction via Information Projection · NeurIPS 2022 (Oral)

    Develops fairness interventions for multi-class prediction using information projection.

  4. Generalizing Correspondence Analysis for Applications in Machine Learning · TPAMI 2021

    Extends correspondence analysis for representation learning and machine-learning applications.

  5. CPR: Classifier-Projection Regularization for Continual Learning · ICLR 2021

    Regularizes classifier projections to improve continual learning under sequential tasks.

  6. Information-Theoretic Privacy Watchdogs · ISIT 2019

    Develops information-theoretic tools for auditing and constraining privacy leakage.

Full publication and patent list on Google Scholar

Experience

Research Lead · Applied Intelligence Lab

JPMorgan Chase & Co. · New York · 2025–Present

Leading work on adversarial evaluation, unlearning verification, reward and decision instability, and model safety governance.

Research Scientist · Applied Intelligence Lab

JPMorgan Chase & Co. · New York · 2023–2025

Developed evaluation methods for residual knowledge, LLM watermarking, partial-LLM text detection, hallucination probing, and predictive multiplicity.

Research Assistant

Harvard University · 2017–2023

Worked on information-theoretic tools for fairness, privacy, representation learning, and decision uncertainty.

Recognition & Service

Education

Harvard University

PhD & MS in Computer Science · 2017–2023

Thesis: Information-Theoretic Tools for Machine Learning Beyond Accuracy — fairness, privacy, and decision uncertainty.

National Taiwan University

MS in Electrical Engineering · BS in Electrical Engineering and Mathematics

Best Master’s Thesis Award and National Young Scholar Best Paper Award.

Selected Media