Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement

University of Central Florida
ICCV 2025 🥳

Clustering on color histograms groups images/video frames of people wearing the same clothes (video frames taken from the CCVID dataset). This suggests that color can serve as a proxy for clothing labels.
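To make the proxy idea concrete, here is a small, hypothetical sketch of clustering frames by their RGB color histograms. The helper `color_histogram` and the random `frames` array are illustrative stand-ins, not the paper's code or data pipeline.

```python
# Hypothetical sketch: clustering frames by RGB color histograms.
# Names below are illustrative, not from the paper's codebase.
import numpy as np
from sklearn.cluster import KMeans

def color_histogram(image, bins=8):
    """Flattened 3D RGB histogram, L1-normalized. `image` is an HxWx3 uint8 array."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3),
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)),
    )
    hist = hist.flatten()
    return hist / hist.sum()

# `frames` stands in for decoded video frames (e.g., from CCVID).
frames = [np.random.randint(0, 256, (128, 64, 3), dtype=np.uint8) for _ in range(100)]
features = np.stack([color_histogram(f) for f in frames])

# Frames of the same person in the same clothes tend to fall in the same cluster.
labels = KMeans(n_clusters=10, n_init=10).fit_predict(features)
```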

Abstract 😏

Clothes-Changing Re-Identification (CC-ReID) aims to recognize individuals across different locations and times, irrespective of clothing. Existing methods often rely on additional models or annotations to learn robust, clothing-invariant features, making them resource-intensive. In contrast, we explore the use of color, specifically foreground and background colors, as a lightweight, annotation-free proxy for mitigating appearance bias in ReID models. We propose Colors See, Colors Ignore (CSCI), an RGB-only method that leverages color information directly from raw images or video frames. CSCI efficiently captures color-related appearance bias ('Color See') while disentangling it from identity-relevant ReID features ('Color Ignore'). To achieve this, we introduce S2A self-attention, a novel self-attention design that prevents information leakage between color and identity cues within the feature space. Our analysis shows a strong correspondence between learned color embeddings and clothing attributes, validating color as an effective proxy when explicit clothing labels are unavailable. We demonstrate the effectiveness of CSCI on both image and video ReID with extensive experiments on four CC-ReID datasets. We improve the baseline's Top-1 accuracy by 2.9% on LTCC and 5.0% on PRCC for image-based ReID, and by 1.0% on CCVID and 2.5% on MEVID for video-based ReID, without relying on additional supervision. Our results highlight the potential of color as a cost-effective solution for addressing appearance bias in CC-ReID.

Method 😎


Traditional transformer-based ReID models split the input into RGB spatial patches and pass them through transformer layers. The class token serves as the ReID Token at inference and is trained with a triplet loss and an identity-based classifier. We introduce one additional class token, the Color Token, which learns a color embedding via MSE (regression) on color histograms. We then disentangle the Color Token from the ReID Token using a cosine loss.
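Below is a minimal sketch of the training objective described above, assuming a ViT-style backbone that already returns the two class tokens. Names such as `csci_losses` and `color_head`, and the unit loss weights, are illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch of the losses described above; not the paper's exact code.
import torch
import torch.nn.functional as F

def csci_losses(reid_tok, color_tok, logits, labels, color_hist, color_head):
    # Identity supervision on the ReID Token (the paper also uses a triplet
    # loss, omitted here for brevity).
    loss_id = F.cross_entropy(logits, labels)

    # 'Color See': regress the color histogram from the Color Token via MSE.
    loss_color = F.mse_loss(color_head(color_tok), color_hist)

    # 'Color Ignore': a cosine loss pushing the two tokens apart so color
    # (appearance) information does not leak into the ReID Token.
    # Penalizing |cos| is one plausible form; the paper's exact loss may differ.
    loss_disent = F.cosine_similarity(reid_tok, color_tok, dim=-1).abs().mean()

    # Loss weights assumed to be 1 for illustration.
    return loss_id + loss_color + loss_disent

# Example shapes: reid_tok, color_tok: (B, 768); logits: (B, num_ids);
# color_hist: (B, 512) for an 8x8x8 RGB histogram; color_head: nn.Linear(768, 512).
```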

Alternatives 😱

► One alternative to our S2A self-attention is to use two separate networks, one for ReID and one for appearance bias. Many CC-ReID works deploy exactly this: two ResNets or two transformers, one for biometrics and the other for appearance bias (e.g., diffusion models for clothes, or LLMs for clothing descriptions). This is computationally impractical for deployment.
► Another alternative to S2A self-attention is to simply let information leak between biometrics and appearance bias by sharing the backbone, most famously done by CAL for CC-ReID. In transformers, this corresponds to traditional self-attention; see the masking sketch after this list for how attention between the two cues can instead be blocked.
► An alternative to color is "traditional" integer clothing annotations. However, colors are more expressive than integer clothing labels.
► Another alternative is LLM-based fine-grained clothing descriptions, which are computationally infeasible for video: a description must be generated per frame, since clothing may change across a video.
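As referenced above, here is an illustrative sketch of mask-based attention that blocks direct interaction between the two class tokens. The token layout, the helper `leak_blocking_mask`, and the use of `torch.nn.MultiheadAttention` are all assumptions for illustration; the actual S2A design in the paper may differ.

```python
# Illustrative sketch: blocking direct attention between the ReID and Color
# tokens with an attention mask. Not the paper's actual S2A implementation.
import torch

def leak_blocking_mask(num_patches: int) -> torch.Tensor:
    # Assumed token layout: [ReID token, Color token, patch tokens...]
    n = 2 + num_patches
    mask = torch.zeros(n, n)          # 0 = attend, -inf = blocked
    mask[0, 1] = float('-inf')        # ReID token cannot attend to Color token
    mask[1, 0] = float('-inf')        # Color token cannot attend to ReID token
    return mask

# The mask is usable as `attn_mask` in torch.nn.MultiheadAttention, e.g.:
# attn = torch.nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
# out, _ = attn(x, x, x, attn_mask=leak_blocking_mask(x.shape[1] - 2))
```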

Results 🧐

Numbers reported are an average of two runs; however, pretrained weights are provided for the best-performing models, which achieve much higher accuracy than the reported averages.

Ablation 🥸

BibTeX 🥹

@InProceedings{Pathak_2025_ICCV,
  author    = {Pathak, Priyank and Rawat, Yogesh S},
  title     = {Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
}