Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in vision-language tasks. However, these models often infer and reveal sensitive biometric attributes such as race, gender, age, body weight, and eye color, even when such information is not explicitly requested. This raises critical concerns, particularly in real-world applications and socially sensitive domains. Despite increasing awareness, no publicly available dataset or benchmark exists to comprehensively evaluate or mitigate biometric leakage in MLLMs. To address this gap, we introduce PRISM (Privacy-aware Evaluation of Responses in Sensitive Modalities), a new benchmark that assesses MLLMs on two fronts: (1) refusal of explicit biometric-related queries and (2) implicit biometric leakage in general responses, while maintaining semantic faithfulness. Furthermore, we conduct a detailed audit of the widely used LLaVA datasets and uncover extensive biometric leakage across both the pretraining and instruction-tuning data. To address this, we present the Safe-LLaVA dataset, the first privacy-preserving MLLM training dataset, constructed by systematically removing explicit and implicit biometric information from the LLaVA dataset. Our evaluations on PRISM reveal biometric leakage across MLLMs for different attributes, highlighting detailed privacy violations. We also fine-tune a model on the Safe-LLaVA dataset and show that it substantially reduces biometric leakage. Together, Safe-LLaVA and PRISM set a new standard for privacy-aligned development and evaluation of MLLMs. The Safe-LLaVA dataset and PRISM benchmark are publicly available at https://huggingface.co/datasets/kyh9191/Safe-LLaVA, and the source code is available at https://github.com/Kimyounggun99/SafeLLaVA.git.
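For reference, the dataset repository linked above can be pulled with the standard Hugging Face Hub client. The snippet below is a minimal sketch assuming the `huggingface_hub` library; it only downloads the repository files and makes no assumption about their internal format or schema.

```python
# Minimal sketch: download the Safe-LLaVA dataset repository from the Hugging Face Hub.
# Assumes the standard `huggingface_hub` client; file layout inside the repo is not assumed here.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="kyh9191/Safe-LLaVA",  # dataset repository named above
    repo_type="dataset",           # required for dataset (rather than model) repos
)
print(f"Safe-LLaVA files downloaded to: {local_dir}")
```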
PRISM benchmark distribution and tasks.
PRISM benchmark data distribution across 5 attributes and 22 sub-categories. Sample questions from both the Implicit and Explicit (refusal) tasks are shown.
Safe-LLaVA vs LLaVA Dataset.
Comparison of ground-truth responses between the LLaVA and Safe-LLaVA datasets across different biometric categories. As shown, the LLaVA dataset includes explicit mentions of sensitive attributes such as gender, age, race, and weight. In contrast, Safe-LLaVA replaces or refuses such content to protect privacy while retaining the overall meaning of the response.
Bold indicates best performance, while red indicates worst.
Left: average refusal accuracy across various MLLMs. Right: implicit leakage protection score across MLLMs.
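As an illustration of the refusal side of the evaluation, the sketch below scores a set of model responses with a simple keyword-based refusal check. This is a hypothetical heuristic for exposition only; PRISM's actual scoring protocol is defined in the paper, and the example responses here are placeholders.

```python
# Hypothetical sketch: estimate refusal accuracy over explicit biometric queries.
# The keyword heuristic and example responses are illustrative assumptions,
# not PRISM's official scoring protocol.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "unable to determine", "won't speculate")

def is_refusal(response: str) -> bool:
    """Return True if the response looks like a refusal to reveal a biometric attribute."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_accuracy(responses: list[str]) -> float:
    """Fraction of responses to explicit biometric queries that are refusals."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)

# Placeholder usage with made-up responses:
example_responses = [
    "I'm sorry, but I can't infer personal attributes such as age from this image.",
    "The person appears to be in their mid-twenties.",
]
print(f"Refusal accuracy: {refusal_accuracy(example_responses):.2f}")
```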
@article{safellava,
  title={Safe-LLaVA: A Privacy-Preserving Vision-Language Dataset and Benchmark for Biometric Safety},
  author={Younggun Kim and Sirnam Swetha and Fazil Kagdi and Mubarak Shah},
  journal={arXiv preprint arXiv:2509.00192},
  year={2025}
}