Hsiang Hsu

Research Scientist · Frontier Model Evals, Alignment Robustness & Trustworthy AI

I work on stress-testing model behavior beyond average-case evaluation: residual knowledge after unlearning, prompt-specific reward tail risk, predictive multiplicity, and hidden failure modes in generative systems. My goal is to turn uncertainty, privacy, and robustness failures into operational evals for model behavior, alignment, and intervention verification.


Research

I care about building reliable evaluation methods for frontier models and AI systems: tests that expose where models are brittle, unstable, or unsafe even when aggregate metrics look strong. Current and previous threads are reflected in the research artifacts and selected papers below.

Research Artifacts

AlignmentTailBench

Lightweight eval harness for ranking prompts by reward variance, CVaR, and lower-tail reward, and for surfacing failure examples; a minimal sketch of the tail statistics appears below.

Repository coming soon
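As a rough illustration of what the harness is meant to compute (not the released code; the function names and reward arrays below are hypothetical), this sketch scores each prompt's sampled completions with a reward model and summarizes the lower tail of that per-prompt reward distribution:

```python
import numpy as np

def tail_metrics(rewards, alpha: float = 0.05) -> dict:
    """Summarize the reward distribution of sampled completions for one prompt.

    rewards: 1-D array-like, one reward-model score per sampled completion.
    alpha:   tail mass; the CVaR here is the mean of the worst alpha fraction.
    """
    r = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(r))))          # size of the worst-case tail
    return {
        "mean": float(r.mean()),
        "variance": float(r.var()),
        "lower_tail_reward": float(r[0]),             # single worst observed reward
        "cvar": float(r[:k].mean()),                  # expected reward within the alpha-tail
        "failure_indices": np.argsort(np.asarray(rewards))[:k].tolist(),  # candidate failure examples
    }

def rank_prompts_by_tail_risk(per_prompt_rewards: dict, alpha: float = 0.05):
    """Order prompts from riskiest to safest by ascending CVaR."""
    scored = {p: tail_metrics(r, alpha) for p, r in per_prompt_rewards.items()}
    return sorted(scored.items(), key=lambda kv: kv[1]["cvar"])

# Toy usage with synthetic reward samples (two prompts, 200 completions each).
rng = np.random.default_rng(0)
per_prompt = {
    "benign prompt": rng.normal(1.0, 0.2, size=200),
    "edge-case prompt": np.concatenate(
        [rng.normal(1.0, 0.2, size=180), rng.normal(-2.0, 0.3, size=20)]
    ),
}
for prompt, metrics in rank_prompts_by_tail_risk(per_prompt, alpha=0.05):
    print(prompt, "cvar:", round(metrics["cvar"], 3), "variance:", round(metrics["variance"], 3))
```

Ranking by CVaR rather than mean reward is the point of the exercise: a prompt can look fine on average while its worst few completions are the ones that matter for safety review.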

ResidualKnowledgeProbe

Stress test for unlearning and model editing: finds cases where models pass forgetting checks but recover knowledge under perturbations (a minimal sketch of the probe loop follows below).

Repository coming soon
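A minimal sketch of the idea, assuming only a generic generate(prompt) -> text interface (hypothetical; the released probe may look different): run the standard forgetting check on the canonical deletion query, then count how often paraphrased or otherwise perturbed queries still recover the supposedly forgotten answer.

```python
from typing import Callable, Iterable

def residual_knowledge_rate(
    generate: Callable[[str], str],    # model under test: prompt -> completion
    canonical_query: str,              # query used by the standard forgetting check
    perturbed_queries: Iterable[str],  # paraphrases, translations, prefix attacks, ...
    forgotten_answer: str,             # string the model should no longer produce
) -> dict:
    """Run the standard check, then measure how often perturbations recover the answer."""
    queries = list(perturbed_queries)

    def leaks(text: str) -> bool:
        return forgotten_answer.lower() in text.lower()

    recovered = [q for q in queries if leaks(generate(q))]
    return {
        "passes_standard_check": not leaks(generate(canonical_query)),
        "recovery_rate": len(recovered) / max(1, len(queries)),
        "recovering_queries": recovered,
    }

# Toy model that "forgets" only the canonical phrasing and leaks under paraphrases.
def toy_model(prompt: str) -> str:
    return "I don't know." if "capital of France" in prompt else "The capital is Paris."

print(residual_knowledge_rate(
    toy_model,
    canonical_query="What is the capital of France?",
    perturbed_queries=["Name France's capital city.", "Quelle est la capitale de la France ?"],
    forgotten_answer="Paris",
))
```

A model that passes the standard check but shows a high recovery rate is exactly the residual-knowledge failure mode the probe targets.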

BehavioralMultiplicity

Demo for surfacing unstable individual decisions across near-equivalent models or checkpoints with similar aggregate performance; a small illustration follows below.

Repository coming soon
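A minimal sketch under the assumption that each checkpoint's hard predictions on a shared evaluation set are available as one row of an array (hypothetical interface; the demo itself is not released yet): aggregate accuracy can be identical across checkpoints while individual decisions flip.

```python
import numpy as np

def decision_instability(predictions, labels=None) -> dict:
    """predictions: (n_models, n_examples) array of hard labels,
    one row per checkpoint with similar aggregate performance."""
    preds = np.asarray(predictions)
    # An individual decision is unstable if any two checkpoints disagree on it.
    unstable = np.any(preds != preds[0], axis=0)
    report = {
        "ambiguity_rate": float(unstable.mean()),               # fraction of unstable decisions
        "unstable_indices": np.flatnonzero(unstable).tolist(),  # examples to inspect by hand
    }
    if labels is not None:
        report["per_model_accuracy"] = (preds == np.asarray(labels)).mean(axis=1).tolist()
    return report

# Toy example: three checkpoints with identical accuracy (5/6) but conflicting individual calls.
checkpoint_preds = np.array([
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 0, 1],
])
labels = np.array([1, 0, 1, 1, 0, 1])
print(decision_instability(checkpoint_preds, labels))
```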

Selected Papers

Frontier Model Evals, Unlearning, and Generative Model Safety

  1. The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples NeurIPS 2025

    Shows that models can appear to forget under standard tests while retaining residual knowledge under perturbed deletion queries.

  2. HeavyWater and SimplexWater: Distortion-Free LLM Watermarks for Low-Entropy Next-Token Predictions NeurIPS 2025

    Studies watermarking behavior in low-entropy LLM regimes where naive watermarking may be brittle or distortive.

  3. PaLD: Detection of Text Partially Written by Large Language Models ICLR 2025

    Detects hybrid human/LLM text rather than assuming documents are fully human- or machine-written.

  4. Machine Unlearning for Image-to-Image Generative Models ICLR 2024

    Develops unlearning methods for generative image-to-image models.

Predictive Multiplicity, Decision Uncertainty, and Alignment Robustness

  1. RashomonGB: Analyzing the Rashomon Effect and Mitigating Predictive Multiplicity in Gradient Boosting NeurIPS 2024

    Analyzes conflicting individual predictions among similarly accurate gradient boosting models.

  2. Dropout-Based Rashomon Set Exploration for Efficient Predictive Multiplicity Estimation ICLR 2024

    Uses dropout-based exploration to estimate model multiplicity more efficiently.

  3. Individual Arbitrariness and Group Fairness NeurIPS Spotlight 2023

    Studies how group-level fairness constraints can induce individual-level arbitrariness.

  4. Rashomon Capacity: A Metric for Predictive Multiplicity in Classification NeurIPS 2022

    Introduces a metric for quantifying how much near-optimal models disagree on individual decisions.

  5. Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment arXiv preprint

    Frames prompt-specific reward distributions and tail outcomes as signals for inference-time alignment risk.

Fairness, Privacy, and Information-Theoretic ML

  1. PASS: Private Attributes Protection with Stochastic Data Substitution ICML Spotlight 2025

    Protects private attributes while preserving downstream utility through stochastic data substitution.

  2. MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation ICML 2024

    Uses information-theoretic transformations to suppress multiple protected attributes while retaining utility.

  3. Beyond Adult and COMPAS: Fair Multi-Class Prediction via Information Projection NeurIPS Oral 2022

    Develops fairness interventions for multi-class prediction using information projection.

  4. Generalizing Correspondence Analysis for Applications in Machine Learning TPAMI 2021

    Extends correspondence analysis for representation and machine-learning applications.

Full publication list on Google Scholar

Experience

Research Lead · Applied Intelligence Lab

JPMorgan Chase & Co. · New York · 2025–Present

Leading work on adversarial evaluation, unlearning verification, reward and decision instability, and model safety governance.

Research Scientist · Applied Intelligence Lab

JPMorgan Chase & Co. · New York · 2023–2025

Developed evaluation methods for residual knowledge, LLM watermarking, partial-LLM text detection, hallucination probing, and predictive multiplicity.

Research Assistant

Harvard University · 2017–2023

Worked on information-theoretic tools for fairness, privacy, representation learning, and decision uncertainty.

Recognition & Service

Education

Harvard University

PhD & MS in Computer Science · 2017–2023

Thesis: Information-Theoretic Tools for Machine Learning Beyond Accuracy — fairness, privacy, and decision uncertainty.

National Taiwan University

MS in Electrical Engineering · BS in Electrical Engineering and Mathematics

Received the Best Master’s Thesis Award and the National Young Scholar Best Paper Award.

Selected Media