I'm on the job market - looking to start summer 2024!

I'm a final-year Ph.D. student in AI at UC Berkeley, co-advised by Trevor Darrell and Joseph Gonzalez as part of the BAIR and Sky labs. I was a visiting researcher at FAIR (Meta) for two years during my Ph.D. Before coming to Berkeley, I obtained my B.S. in computer science at Cornell University.

I work on improving the reliability and safety of multimodal models, especially for text generation. My focus has been on localizing and reducing hallucinations in vision + language models, along with measuring and leveraging uncertainty and mitigating bias.

Being in tech, I stereotypically enjoy climbing, hiking, and woodworking. In recent years, I've survived a climbing accident, a nighttime bear encounter, and reviewer 2s. These have earned me metal screws in my ankle, a fear of Berkeley's mascot, and the papers below.

selected publications


  • Coming soon!
    Work on evaluating hallucinations in image captions and measuring their correlation with language priors.

  • CLAIR: Evaluating Image Captions with Large Language Models
    David M. Chan, Suzanne Petryk, Joseph E. Gonzalez, Trevor Darrell, John Canny
    EMNLP 2023
  • TL;DR: Ask an LLM to rate an image caption from 1 to 100: surprisingly, this score correlates strongly with human judgments.
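
A minimal sketch of this idea in Python, assuming a generic ask_llm callable rather than any particular LLM API (the prompt and score parsing are illustrative, not the paper's actual pipeline):

```python
# Minimal sketch of LLM-as-judge caption scoring. `ask_llm` is a hypothetical
# callable (str -> str) wrapping whatever LLM API you use.

def build_prompt(candidate: str, references: list[str]) -> str:
    refs = "\n".join(f"- {r}" for r in references)
    return (
        "Here are several reference captions for an image:\n"
        f"{refs}\n"
        f"Candidate caption: {candidate}\n"
        "On a scale of 1 to 100, how likely is it that the candidate describes "
        "the same image as the references? Reply with a single integer."
    )

def score_caption(candidate: str, references: list[str], ask_llm) -> int:
    reply = ask_llm(build_prompt(candidate, references))
    digits = "".join(ch for ch in reply if ch.isdigit())
    return min(100, max(1, int(digits))) if digits else 1  # clamp to [1, 100]
```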

  • Simple Token-Level Confidence Improves Caption Correctness
    Suzanne Petryk, Spencer Whitehead, Joseph E. Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach
    WACV 2024, ICCV 2023 CLVL Workshop (Oral)
  • TL;DR: A simple method to measure image-caption correctness and reduce hallucinations, using token confidences that are either off-the-shelf (e.g., the softmax score) or from a learned confidence estimator.
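
For the off-the-shelf variant, a minimal sketch in Python (my simplification; the aggregation and the learned estimator in the paper differ):

```python
import math

# Minimal sketch: turn per-token log-probabilities from a captioning model into
# a caption-level confidence. The mean/min aggregation here is illustrative.

def caption_confidence(token_logprobs: list[float], reduce: str = "min") -> float:
    probs = [math.exp(lp) for lp in token_logprobs]
    if reduce == "min":
        # a single low-confidence token (e.g., a hallucinated object word) flags the caption
        return min(probs)
    return sum(probs) / len(probs)

# Example: rerank candidate captions by confidence and keep the most confident one.
candidates = {
    "a dog on a couch": [-0.1, -0.3, -0.2, -0.1, -0.4],
    "a dog on a skateboard": [-0.1, -0.3, -0.2, -0.1, -2.6],
}
best = max(candidates, key=lambda c: caption_confidence(candidates[c]))
```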

  • Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly
    Spencer Whitehead*, Suzanne Petryk*, Vedaad Shakib, Joseph E. Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach
    ECCV 2022
  • TL;DR: Learn to say "I don't know" to VQA inputs that the model would've answered incorrectly.
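
A minimal sketch of the selective-prediction setup, with a single confidence score standing in for the learned multimodal correctness predictor used in the paper:

```python
from dataclasses import dataclass

# Minimal sketch of selective VQA: answer only when confidence clears a threshold,
# otherwise abstain. Confidence source and threshold selection are simplified.

@dataclass
class Prediction:
    answer: str
    confidence: float  # e.g., a softmax score or a learned correctness estimate

def answer_or_abstain(pred: Prediction, threshold: float) -> str:
    return pred.answer if pred.confidence >= threshold else "I don't know"

def pick_threshold(preds: list[Prediction], correct: list[bool],
                   max_risk: float = 0.05) -> float:
    """Lowest threshold whose answered subset has error rate <= max_risk (held-out data)."""
    for t in sorted({p.confidence for p in preds}):
        answered = [c for p, c in zip(preds, correct) if p.confidence >= t]
        if answered and 1 - sum(answered) / len(answered) <= max_risk:
            return t
    return float("inf")  # no threshold meets the risk target: abstain on everything
```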

  • On Guiding Visual Attention with Language Specification
    Suzanne Petryk*, Lisa Dunlap*, Keyan Nasseri, Joseph E. Gonzalez, Trevor Darrell, Anna Rohrbach
    CVPR 2022
  • TL;DR: Divert an image classifier's attention away from biases and towards task-relevant features by grounding human advice with CLIP.

  • Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting
    Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph E. Gonzalez, Marcus Rohrbach, Trevor Darrell
    ICLR 2021
  • TL;DR: Store model explanations in a memory bank during continual learning to reduce catastrophic forgetting.