Wei Zhai (翟伟)
Associate Researcher at
University of Science and Technology of China (USTC)
I am currently an Associate Researcher at the Department of Automation, USTC. I obtained my Ph.D. degree from USTC in 2022, advised by Prof. Zheng-Jun Zha and Prof. Yang Cao. Prior to that, I received my B.S. degree from Southwest Jiaotong University in 2017. I was fortunate to receive the AAAI 2023 Distinguished Paper Award and the ACM MM 2025 MSMA Workshop Best Student Paper Award. My research interests lie primarily in Computer Vision, Embodied Intelligence, and Machine Learning. Specifically, I focus on building efficient computational frameworks inspired by brain mechanisms, developing egocentric perception for interaction anticipation, and endowing embodied agents with generalizable 2D/3D vision skills in complex real-world scenes.

News

11/2025
1 paper accepted by AAAI 2026.
10/2025
1 paper accepted by T-NNLS.
09/2025
3 papers accepted by NeurIPS 2025 (1 Spotlight).
09/2025
1 paper accepted by ACM MM MSMA Workshop (Best Student Paper).
07/2025
1 paper accepted by Journal of Intelligent Computing and Networking.
07/2025
1 paper accepted by T-ASE.
06/2025
4 papers accepted by ICCV 2025.
06/2025
Won the 1st Place in Efficient Event-based Eye-Tracking Challenge.
06/2025
Won the 1st Place in Body Contact Estimation Challenge (RHOBIN2025 CVPR).
05/2025
1 paper accepted by SCIENCE CHINA Information Sciences.
02/2025
5 papers accepted by CVPR 2025 (1 Highlight).
09/2024
1 paper accepted by NeurIPS 2024.
07/2024
1 paper accepted by ACM MM 2024.
07/2024
1 paper accepted by T-IP.
07/2024
1 paper accepted by ECCV 2024.
06/2024
Won the 2nd Place in 3D Contact Estimation Challenge (RHOBIN2024 CVPR).
04/2024
1 paper accepted by Optics Express.
04/2024
1 paper accepted by T-AI.
03/2024
Won the 2nd Place in Efficient Super-Resolution Challenge (NTIRE2024 CVPR).
03/2024
Won the 1st Place in Event-based Eye Tracking Task (AIS2024 CVPR).
02/2024
1 paper accepted by CVPR 2024.
12/2023
1 paper accepted by AAAI 2024.
11/2023
1 paper accepted by IJCV.
10/2023
1 paper accepted by T-PAMI.
09/2023
1 paper accepted by IJCV.
07/2023
1 paper accepted by T-NNLS.
07/2023
2 papers accepted by ICCV 2023.
03/2023
2 papers accepted by CVPR 2023.
01/2023
1 paper accepted by AAAI 2023 (Distinguished Paper).
09/2022
1 paper accepted by NeurIPS 2022.

Experience

Jul 2022 - Jun 2024
Postdoctoral Researcher
Department of Automation, USTC
Advisor: Prof. Zheng-Jun Zha and Prof. Yang Cao
Sep 2017 - Jun 2022
Ph.D. in Cyberspace Security
School of Cyberspace Security, USTC
Advisor: Prof. Zheng-Jun Zha and Prof. Yang Cao
Dec 2020 - Sep 2021
Research Intern, JD Explore Academy
Mentor: Prof. Dacheng Tao and Dr. Jing Zhang
Sep 2013 - Jun 2017
B.S. in Computer Science
Outstanding Graduate of Southwest Jiaotong University (2017)

Publications

2025
E-MaT
E-MaT: Event-oriented Mamba for Egocentric Point Tracking
Han Han, Wei Zhai*, Baocai Yin, Yang Cao, Bin Li, Zheng-Jun Zha.
In AAAI 2026
We propose a Mamba-based tracking framework that leverages event cameras to capture global motion trends, significantly enhancing egocentric point tracking robustness under fast motion and high dynamic range conditions.
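For readers unfamiliar with event cameras: each pixel fires asynchronous (x, y, t, polarity) events on brightness changes rather than producing frames. The sketch below shows one common, generic way to aggregate such a stream into a dense tensor (a per-pixel polarity count image); the function name and encoding are illustrative only and are not the representation used in E-MaT.

```python
import numpy as np

def events_to_count_image(events, height, width):
    # Accumulate (x, y, t, polarity) events into a 2-channel count image:
    # channel 0 counts negative-polarity events, channel 1 positive ones.
    img = np.zeros((2, height, width), dtype=np.int32)
    for x, y, _t, p in events:
        img[1 if p > 0 else 0, y, x] += 1
    return img

# Three synthetic events on a 4x4 sensor.
events = [(1, 2, 0.10, +1), (1, 2, 0.25, -1), (3, 0, 0.30, +1)]
img = events_to_count_image(events, height=4, width=4)
```

Dense grids like this are what let standard sequence or convolutional backbones consume inherently sparse, asynchronous event data.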
De-raining Generalization
Towards Better De-raining Generalization via Rainy Characteristics Memorization and Replay
Kunyu Wang, Xueyang Fu, Chengzhi Cao, Chengjie Ge, Wei Zhai, Zheng-Jun Zha.
In IEEE T-NNLS
We introduce a continuous learning framework inspired by the complementary learning system of the human brain, utilizing memory replay and knowledge distillation to enable de-raining networks to generalize across varied real-world scenarios.
EF-3DGS
EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting
Bohao Liao, Wei Zhai*, Zengyu Wan, Zhixin Cheng, Wenfei Yang, Yang Cao, Tianzhu Zhang, Zheng-Jun Zha.
In NeurIPS 2025 (Spotlight)
We propose EF-3DGS, the first event-aided framework to handle fast motion blur and high dynamic range scenes by fusing events and frames, achieving significantly higher PSNR and lower trajectory error in high-speed scenarios.
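EF-3DGS reports reconstruction quality in PSNR. As a quick reference, PSNR is the standard metric 10·log10(MAX²/MSE); the snippet below is a minimal textbook implementation, not code from the paper.

```python
import math

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio between two equal-length pixel sequences.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

# A uniform error of 0.25 gives MSE = 0.0625, i.e. 10*log10(16) ≈ 12.04 dB.
value = psnr([0.25] * 8, [0.0] * 8)
```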
ViewPoint
ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models
Zixun Fang, Kai Zhu, Zhiheng Liu, Yu Liu, Wei Zhai, Yang Cao, Zheng-Jun Zha.
In NeurIPS 2025
We propose a novel framework utilizing pretrained perspective diffusion models for generating panoramic videos via a new ViewPoint map representation, ensuring global spatial continuity and fine-grained visual details.
PAID
PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation
Kunyu Wang, Xueyang Fu, Yuanfei Bao, Chengjie Ge, Chengzhi Cao, Wei Zhai, Zheng-Jun Zha.
In NeurIPS 2025
We propose PAID, a prior-driven CTTA method that preserves the pairwise angular structure of pre-trained weights using Householder reflections, achieving consistent improvements in continual test-time adaptation.
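The property PAID relies on can be checked numerically: a Householder reflection H = I − 2vvᵀ/(vᵀv) is orthogonal, so applying it to a set of weight vectors leaves all pairwise angles unchanged. The following is a minimal linear-algebra illustration of that fact, not the paper's implementation.

```python
import math

def householder_apply(v, x):
    # Apply the Householder reflection H = I - 2 v v^T / (v^T v) to x.
    scale = 2.0 * sum(vi * xi for vi, xi in zip(v, x)) / sum(vi * vi for vi in v)
    return [xi - scale * vi for vi, xi in zip(v, x)]

def angle(a, b):
    # Angle between two vectors, in radians.
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return math.acos(dot / (na * nb))

v = [1.0, -2.0, 0.5]
x, y = [0.3, 1.1, -0.7], [2.0, 0.4, 0.9]
# Orthogonal maps preserve inner products and norms, hence angles.
before = angle(x, y)
after = angle(householder_apply(v, x), householder_apply(v, y))
```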
Affordance Ranking
Learning Object Affordance Ranking with Task Context
Haojie Huang, Hongchen Luo*, Wei Zhai*, Yang Cao, Zheng-Jun Zha.
In ACM MM 2025 MSMA Workshop (Best Student Paper)
We introduce a Context-embed Group Ranking Framework to learn object affordance ranking by deeply integrating task context, supported by a new large-scale task-oriented dataset.
SIGMAN
SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
Yuhang Yang, Fengqi Liu, Yixing Lu, Qin Zhao, Pingyu Wu, Wei Zhai*, Ran Yi, Yang Cao, Lizhuang Ma, Zheng-Jun Zha, Junting Dong*.
In ICCV 2025
We present SIGMAN, a latent space generation paradigm for 3D human digitization utilizing a UV-structured VAE and DiT, trained on a newly constructed dataset of 1 million 3D Gaussian assets.
HERO
HERO: Human Reaction Generation from Videos
Chengjun Yu, Wei Zhai*, Yuhang Yang, Yang Cao, Zheng-Jun Zha.
In ICCV 2025
We propose HERO, a framework for human reaction generation from RGB videos that extracts interaction intention and local visual cues, validated on a new Video-Motion dataset.
MATE
MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking
Han Han, Wei Zhai*, Yang Cao, Bin Li, Zheng-Jun Zha.
In ICCV 2025
We introduce MATE, an event-based point tracking framework that resolves spatial sparsity and motion blur through motion-augmented temporal consistency, achieving significantly faster processing and higher precision.
EMoTive
EMoTive: Event-guided Trajectory Modeling for 3D Motion Estimation
Zengyu Wan, Wei Zhai*, Yang Cao, Zheng-Jun Zha.
In ICCV 2025
We propose EMoTive, an event-based framework for 3D motion estimation that models spatio-temporal trajectories via Event Kymograph projection and non-uniform parametric curves.
PEAR
PEAR: Phrase-Based Hand-Object Interaction Anticipation
Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang.
In SCIENCE CHINA Information Sciences (SCIS)
We present PEAR, a model for hand-object interaction anticipation that jointly predicts intention and manipulation using phrase-based cross-alignment, supported by the EGO-HOIP dataset.
BRAT
BRAT: Bidirectional Relative Positional Attention Transformer for Event-based Eye Tracking
Yuliang Wu, Han Han, Jinze Chen, Wei Zhai*, Yang Cao, Zheng-Jun Zha.
In CVPR 2025 Workshop (1st Place Challenge)
We propose BRAT, a Bidirectional Relative Positional Attention Transformer for event-based eye tracking that fully exploits spatio-temporal sequences, winning 1st place in the Efficient Event-based Eye-Tracking Challenge.
CompreCap
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning
Fan Lu, Wei Wu, Kecheng Zheng*, Shuailei Ma, Biao Gong, Jiawei Liu, Wei Zhai*, Yang Cao, Yujun Shen, Zheng-Jun Zha.
In CVPR 2025
We introduce CompreCap, a benchmark for evaluating detailed image captioning in LVLMs using a directed scene graph to assess object coverage, attributes, and relationships comprehensively.
GREAT
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
Yawen Shao, Wei Zhai*, Yuhang Yang, Hongchen Luo, Yang Cao, Zheng-Jun Zha.
In CVPR 2025
We propose GREAT, a framework for open-vocabulary 3D object affordance grounding that combines geometry attributes with interaction intention reasoning, verified on the large-scale PIADv2 dataset.
IV-VAE
Improved Video VAE for Latent Video Diffusion Model
Pingyu Wu, Kai Zhu*, Yu Liu, Liming Zhao, Wei Zhai*, Yang Cao, Zheng-Jun Zha.
In CVPR 2025
We propose an Improved Video VAE (IV-VAE) featuring Keyframe-based Temporal Compression and Group Causal Convolution to resolve temporal-spatial conflicts in latent video diffusion models.
MMAR
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha.
In CVPR 2025
We introduce MMAR, a lossless multi-modal auto-regressive framework that uses continuous-valued image tokens and a lightweight diffusion head to unify image understanding and generation without information loss.
Efficient CTTA-OD
Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning
Kunyu Wang, Xueyang Fu, Xin Lu, Chengjie Ge, Chengzhi Cao, Wei Zhai, Zheng-Jun Zha.
In CVPR 2025 (Highlight)
We propose an efficient CTTA-OD method utilizing sensitivity-guided channel pruning to selectively suppress domain-sensitive channels, reducing computational overhead while maintaining adaptation performance.
VMAD
VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection
Huilin Deng, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang.
In IEEE T-ASE
We present VMAD, a Visual-enhanced MLLM for zero-shot anomaly detection that incorporates defect-sensitive structure learning and locality-enhanced token compression, benchmarked on the RIAD dataset.
LSA
Likelihood-Aware Semantic Alignment for Full-Spectrum Out-of-Distribution Detection
Fan Lu, Kai Zhu, Kecheng Zheng, Wei Zhai, Yang Cao, Zheng-Jun Zha.
In Journal of Intelligent Computing and Networking
We propose a Likelihood-Aware Semantic Alignment (LSA) framework for full-spectrum OOD detection, utilizing Gaussian sampling and bidirectional prompt customization to align image-text correspondence.
2024
EgoChoir
EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views
Yuhang Yang, Wei Zhai*, Chengfeng Wang, Chengjun Yu, Yang Cao, Zheng-Jun Zha.
In NeurIPS 2024
We propose EgoChoir to capture 3D interaction regions from egocentric views by harmonizing visual appearance, head motion, and 3D objects to jointly infer human contact and object affordance.
UniDense
UniDense: Unleashing Diffusion Models with Meta-Routers for Universal Few-Shot Dense Prediction
Lintao Dong, Wei Zhai*, Zheng-Jun Zha.
In ACM MM 2024
We introduce UniDense, a framework utilizing Meta-Routers to select task-relevant computation pathways within a frozen Stable Diffusion model for efficient universal few-shot dense prediction.
MV-Net
Event-based Optical Flow via Transforming into Motion-dependent View
Zengyu Wan, Yang Wang, Wei Zhai*, Ganchao Tan, Yang Cao, Zheng-Jun Zha*.
In IEEE T-IP
We propose MV-Net, which transforms the orthogonal view into a motion-dependent view using an Event View Transformation Module to enhance event-based motion representation for optical flow estimation.
BOT
Bidirectional Progressive Transformer for Interaction Intention Anticipation
Zichen Zhang, Hongchen Luo, Wei Zhai*, Yang Cao, Yu Kang.
In ECCV 2024
We present BOT, a Bidirectional Progressive Transformer that mutually corrects hand trajectories and interaction hotspots predictions to minimize error accumulation in interaction intention anticipation.
AsynHDR
Event-based Asynchronous HDR Imaging by Temporal Incident Light Modulation
Yuliang Wu, Ganchao Tan, Jinze Chen, Wei Zhai*, Yang Cao, Zheng-Jun Zha.
In Optics Express
We propose AsynHDR, a system integrating DVS with LCD panels for temporal incident light modulation, enabling pixel-asynchronous High Dynamic Range (HDR) imaging.
PLMNet
Prioritized Local Matching Network for Cross-Category Few-Shot Anomaly Detection
Huilin Deng, Hongchen Luo, Wei Zhai, Yang Cao, Yanming Guo, Yu Kang.
In IEEE T-AI
We propose PLMNet for Cross-Category Few-shot Anomaly Detection, utilizing a Local Perception Network and Defect-sensitive Weight Learner to establish fine-grained correspondence between query and normal samples.
LEMON
LEMON: Learning 3D Human-Object Interaction Relation from 2D Images
Yuhang Yang, Wei Zhai*, Hongchen Luo, Yang Cao, Zheng-Jun Zha.
In CVPR 2024
We present LEMON, a unified model that learns 3D human-object interaction relations from 2D images by mining interaction intentions and geometric correlations to jointly anticipate interaction elements.
MambaPupil
MambaPupil: Bidirectional Selective Recurrent Model for Event-based Eye Tracking
Zhong Wang, Zengyu Wan, Han Han, Bohao Liao, Yuliang Wu, Wei Zhai*, Yang Cao, Zheng-Jun Zha.
In CVPR 2024 Workshop (1st Place Challenge)
We propose MambaPupil, a bidirectional selective recurrent model for event-based eye tracking, utilizing a Linear Time-Varying State Space Module to handle diverse eye movement patterns.
HCE
Hypercorrelation Evolution for Video Class-Incremental Learning
Sen Liang, Kai Zhu*, Zhiheng Liu, Wei Zhai*, Yang Cao.
In AAAI 2024
We propose a hierarchical aggregation strategy and correlation refinement mechanism for Video Class-Incremental Learning, optimizing hierarchical matching matrices to alleviate catastrophic forgetting.
2023
Grounded Affordance
Grounded Affordance from Exocentric View
Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao.
In International Journal of Computer Vision (IJCV)
Journal version of "Learning Affordance Grounding from Exocentric Images" (CVPR 2022)
We propose a cross-view affordance knowledge transfer framework to ground affordance from exocentric views by transferring affordance-specific features to egocentric views, supported by the AGD20K dataset.
MPAP
On Exploring Multiplicity of Primitives and Attributes for Texture Recognition in the Wild
Wei Zhai, Yang Cao, Jing Zhang, Haiyong Xie, Dacheng Tao, Zheng-Jun Zha.
In IEEE T-PAMI
Journal version of MPAP (ICCV 2019) and DSR-Net (CVPR 2020)
We propose MPAP, a novel network for texture recognition that models the relation of bottom-up structure and top-down attributes in a multi-branch unified framework to capture multiple primitives and attributes.
BAS
Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation
Wei Zhai, Pingyu Wu, Kai Zhu, Yang Cao, Feng Wu, Zheng-Jun Zha.
In International Journal of Computer Vision (IJCV)
Journal version of BAS (CVPR 2022)
We introduce Background Activation Suppression (BAS) for weakly supervised object localization, using an Activation Map Constraint to facilitate generator learning by suppressing background activation.
HAG-Net
Learning Visual Affordance Grounding from Demonstration Videos
Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao.
In IEEE T-NNLS
We propose HAG-Net, a hand-aided network that leverages demonstration videos and a dual-branch structure to learn visual affordance grounding by transferring knowledge from video to object branches.
SAT
Spatial-Aware Token for Weakly Supervised Object Localization
Pingyu Wu, Wei Zhai*, Yang Cao, Jiebo Luo, Zheng-Jun Zha.
In ICCV 2023
We propose a Spatial-Aware Token (SAT) for weakly supervised object localization to resolve optimization conflicts in transformers by learning a task-specific token to condition localization.
IAG
Grounding 3D Object Affordance from 2D Interactions in Images
Yuhang Yang, Wei Zhai*, Hongchen Luo, Yang Cao, Jiebo Luo, Zheng-Jun Zha.
In ICCV 2023
We introduce a novel task of grounding 3D object affordance from 2D interactions using an Interaction-driven 3D Affordance Grounding Network (IAG) and the new PIAD dataset.
Robustness Benchmark
Robustness Benchmark for Unsupervised Anomaly Detection Models
Pei Wang, Wei Zhai, and Yang Cao.
In Journal of University of Science and Technology of China (JUSTC)
We propose MVTec-C, a dataset to evaluate the robustness of unsupervised anomaly detection models, and a Feature Alignment Module (FAM) to reduce feature drift caused by corruptions.
Interactive Affinity
Leverage Interactive Affinity for Affordance Learning
Hongchen Luo#, Wei Zhai#, Jing Zhang, Yang Cao, and Dacheng Tao.
In CVPR 2023
We propose to leverage interactive affinity for affordance learning, using a pose-aided framework and keypoint heuristic perception to transfer cues from human-object interactions to non-interactive objects.
SCOOD
Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection
Fan Lu, Kai Zhu, Wei Zhai, Kecheng Zheng, and Yang Cao.
In CVPR 2023
We propose an uncertainty-aware optimal transport scheme for Semantically Coherent OOD detection, utilizing an energy-based transport mechanism to discern outliers from intended data distributions.
Ventral Stream
Exploring Tuning Characteristics of Ventral Stream's Neurons for Few-Shot Image Classification
Lintao Dong, Wei Zhai, Zheng-Jun Zha.
In AAAI 2023 (Oral, Distinguished Paper)
We explore the tuning characteristics of ventral stream neurons for few-shot image classification, proposing hierarchical feature regularization to produce generic and robust features.
2022
FGA
Exploring Figure-Ground Assignment Mechanism in Perceptual Organization
Wei Zhai, Yang Cao, Jing Zhang, Zheng-Jun Zha.
In NeurIPS 2022
We explore the figure-ground assignment mechanism to empower CNNs for robust perceptual organization, utilizing a Figure-Ground-Aided (FGA) module to handle visual ambiguity.
CBCE-Net
Phrase-Based Affordance Detection via Cyclic Bilateral Interaction
Liangsheng Lu#, Wei Zhai#, Hongchen Luo, Kang Yu, Yang Cao.
In IEEE T-AI
We propose CBCE-Net for phrase-based affordance detection, utilizing a cyclic bilateral interaction module to align vision and language features, extended with the annotated PAD dataset.
OSAD-Net
One-Shot Affordance Detection in the Wild
Wei Zhai#, Hongchen Luo#, Jing Zhang, Yang Cao, Dacheng Tao.
In International Journal of Computer Vision (IJCV)
Journal version of "One-Shot Affordance Detection" (IJCAI 2021)
We propose OSAD-Net for one-shot affordance detection by transferring human action purpose to unseen scenarios, benchmarked on the large-scale PADv2 dataset.
DTC-Net
Deep Texton-Coherence Network for Camouflaged Object Detection
Wei Zhai, Yang Cao, Haiyong Xie, Zheng-Jun Zha.
In IEEE T-MM
We propose DTC-Net for camouflaged object detection, utilizing Local Bilinear modules and Spatial Coherence Organization to leverage spatial statistical properties of textons.
LCG-Net
Location-Free Camouflage Generation Network
Yangyang Li#, Wei Zhai#, Yang Cao, Zheng-Jun Zha.
In IEEE T-MM
We present LCG-Net, a location-free camouflage generation network that uses Position-aligned Structure Fusion (PSF) to efficiently generate camouflage in multi-appearance regions.
Exocentric Affordance
Learning Affordance Grounding from Exocentric Images
Hongchen Luo#, Wei Zhai#, Jing Zhang, Yang Cao, and Dacheng Tao.
In CVPR 2022
We propose a cross-view knowledge transfer framework for affordance grounding that extracts features from exocentric interactions to perceive affordance in egocentric views, introducing the AGD20K dataset.
BAS
Background Activation Suppression for Weakly Supervised Object Localization
Pingyu Wu#, Wei Zhai#, Yang Cao.
In CVPR 2022
We propose Background Activation Suppression (BAS) for Weakly Supervised Object Localization (WSOL), which uses an Activation Map Constraint (AMC) to suppress background activation and learn whole object regions.
SSRE
Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning
Kai Zhu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha.
In CVPR 2022
We propose a self-sustaining representation expansion scheme for non-exemplar class-incremental learning, featuring structure reorganization and main-branch distillation to maintain old features.
DANSE
Robust Object Detection via Adversarial Novel Style Exploration
Wen Wang, Jing Zhang, Wei Zhai, Yang Cao, Dacheng Tao.
In IEEE T-IP
We propose DANSE, a method for robust object detection that uses adversarial novel style exploration to discover diverse degradation styles and adapt models to open and compound degradation types.
2021
OS-AD
One-Shot Affordance Detection
Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao.
In IJCAI 2021 (Oral)
We propose a One-Shot Affordance Detection (OS-AD) network that estimates action purpose and transfers it to detect common affordances in unseen scenarios, utilizing collaboration learning.
TAM-GCN
A Tri-Attention Enhanced Graph Convolutional Network for Skeleton-Based Action Recognition
Xingming Li, Wei Zhai, Yang Cao.
In IET Computer Vision (IET-CV 2021)
We introduce a Tri-Attention Module (TAM) for skeleton-based action recognition to guide GCNs in perceiving significant variations across body poses, joint trajectories, and evolving projections.
SPPR
Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning
Kai Zhu, Yang Cao, Wei Zhai, Jie Cheng, Zheng-Jun Zha.
In CVPR 2021
We propose a Self-Promoted Prototype Refinement mechanism for few-shot class-incremental learning, utilizing random episode selection and dynamic relation projection to strengthen new class expression.
2020
Self-Supervised Tuning
Self-Supervised Tuning for Few-Shot Segmentation
Kai Zhu, Wei Zhai, Yang Cao.
In IJCAI 2020 (Oral)
We present an adaptive tuning framework for few-shot segmentation that uses a novel self-supervised inner-loop to dynamically adjust latent features and augment category-specific descriptors.
IR Method
Deep Inhomogeneous Regularization for Transfer Learning
Wen Wang, Wei Zhai, Yang Cao.
In ICIP 2020
We propose a novel Inhomogeneous Regularization (IR) method for transfer learning that imposes decaying averaged deviation penalties to tackle catastrophic forgetting and negative transfer.
DSR-Net
Deep Structure-Revealed Network for Texture Recognition
Wei Zhai, Yang Cao, Zheng-Jun Zha, Haiyong Xie, Feng Wu.
In CVPR 2020 (Oral)
We propose DSR-Net for texture recognition, leveraging a primitive capturing module and dependence learning module to reveal spatial dependency and structural representations.
OS-TR
One-Shot Texture Retrieval Using Global Grouping Metric
Kai Zhu, Yang Cao, Wei Zhai, Zheng-Jun Zha.
In IEEE T-MM 2020
Journal version of "One-Shot Texture Retrieval with Global Context Metric" (IJCAI 2019)
We propose an OS-TR network for one-shot texture retrieval that utilizes an adaptive directionality-aware module and a grouping-attention mechanism for robust generalization.
2019
MAP-Net
Deep Multiple-Attribute-Perceived Network for Real-World Texture Recognition
Wei Zhai, Yang Cao, Jing Zhang, Zheng-Jun Zha.
In ICCV 2019
We propose MAP-Net for texture recognition, which progressively learns visual texture attributes in a multi-branch architecture using deformable pooling and attribute transfer schemes.
OS-TR
One-Shot Texture Retrieval with Global Context Metric
Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao.
In IJCAI 2019 (Oral)
We tackle one-shot texture retrieval with an OS-TR network that includes a directionality-aware module and a self-gating mechanism to exploit global context information.
PixTextGAN
PixTextGAN: Structure Aware Text Image Synthesis for License Plate Recognition
Shilian Wu, Wei Zhai, Yang Cao.
In IET Image Processing (IET-IP 2019)
We propose PixTextGAN, a controllable architecture for synthesizing license plate images with a structure-aware loss, removing the need to collect vast amounts of labelled data.
2018
Unsupervised Inspection
A Generative Adversarial Network Based Framework for Unsupervised Visual Surface Inspection
Wei Zhai, Jiang Zhu, Yang Cao, Zengfu Wang.
In ICASSP 2018 (Oral)
We propose a GAN-based framework for unsupervised visual surface inspection, where the discriminator serves as a one-class classifier to detect abnormal regions using multi-scale fusion.
Depth Map SR
Co-Occurrent Structural Edge Detection for Color-Guided Depth Map Super-Resolution
Jiang Zhu, Wei Zhai, Yang Cao, Zheng-Jun Zha.
In MMM 2018 (Oral)
We propose a CNN-based method for color-guided depth map super-resolution that detects co-occurrent structural edges to effectively exploit structural correlations between depth and color images.

Pre-prints

TOUCH
TOUCH: Text-guided Controllable Generation of Free-Form Hand-Object Interactions
Guangyi Han, Wei Zhai, Yuhang Yang, Yang Cao, Zheng-Jun Zha.
arXiv
We introduce Free-Form HOI Generation and TOUCH, a framework leveraging a multi-level diffusion model and explicit contact modeling to generate diverse, physically plausible hand-object interactions from text.
VGPO
Value-Anchored Group Policy Optimization for Flow Models
Yawen Shao, Jie Xiao, Kai Zhu, Yu Liu, Wei Zhai, Yuhang Yang, Yang Cao, Zheng-Jun Zha.
arXiv
We propose Value-Anchored Group Policy Optimization (VGPO) for flow matching-based image generation, redefining value estimation with process-aware value estimates to enable precise credit assignment and stable optimization.
AliTok
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model
Pingyu Wu, Kai Zhu, Yu Liu, Longxiang Tang, Jian Yang, Yansong Peng, Wei Zhai, Yang Cao, Zheng-Jun Zha.
arXiv
We introduce AliTok, an Aligned Tokenizer using a causal decoder to establish unidirectional dependencies, aligning token modeling with autoregressive models for superior image generation performance.
VideoGen-Eval
VideoGen-Eval: Agent-based System for Video Generation Evaluation
Yuhang Yang, Shangkun Sun, Hongxiang Li, Ke Fan, Ailing Zeng, Feilin Han, Wei Zhai, Wei Liu, Yang Cao, Zheng-Jun Zha.
arXiv
We propose VideoGen-Eval, an agent-based dynamic evaluation system for video generation that integrates content structuring and multimodal judgment, validated against human preferences.
VanGogh
VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization
Zixun Fang, Zhiheng Liu, Kai Zhu, Yu Liu, Ka Leong Cheng, Wei Zhai, Yang Cao, Zheng-Jun Zha.
arXiv
We introduce VanGogh, a unified multimodal diffusion-based framework for video colorization that employs a Dual Qformer and depth-guided generation to achieve superior temporal consistency and color fidelity.
EDFilter
Event Signal Filtering via Probability Flux Estimation
Jinze Chen, Wei Zhai, Yang Cao, Bin Li, Zheng-Jun Zha.
arXiv
We introduce EDFilter, an event signal filtering framework that estimates probability flux from discrete events using nonparametric kernel smoothing, enhancing signal fidelity for downstream tasks.
VCR-Net
Visual-Geometric Collaborative Guidance for Affordance Learning
Hongchen Luo, Wei Zhai, Jiao Wang, Yang Cao, Zheng-Jun Zha.
arXiv
Journal version of "Leverage Interactive Affinity for Affordance Learning" (CVPR 2023)
We propose a visual-geometric collaborative guided affordance learning network that leverages interactive affinity to transfer knowledge from human-object interactions to non-interactive objects.
Ego-SAG
Grounding 3D Scene Affordance From Egocentric Interactions
Cuiyu Liu, Wei Zhai, Yuhang Yang, Hongchen Luo, Sen Liang, Yang Cao, Zheng-Jun Zha.
arXiv
We introduce Ego-SAG, a framework for grounding 3D scene affordance from egocentric interactions using interaction intent guidance and a bidirectional query decoder mechanism.
ViViD
ViViD: Video Virtual Try-on using Diffusion Models
Zixun Fang, Wei Zhai, Aimin Su, Hongliang Song, Kai Zhu, Mao Wang, Yu Chen, Zhiheng Liu, Yang Cao, Zheng-Jun Zha.
arXiv
We present ViViD, a framework using diffusion models for video virtual try-on, incorporating a Garment Encoder, Pose Encoder, and Temporal Modules to ensure spatial-temporal consistency.
IDE
Intention-driven Ego-to-Exo Video Generation
Hongchen Luo, Kai Zhu, Wei Zhai, Yang Cao.
arXiv
We propose IDE, an Intention-Driven Ego-to-Exo video generation framework that uses action intention and cross-view feature perception to generate consistent exocentric videos from egocentric inputs.

Professional Activities

Conference Reviewer
  • IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • IEEE International Conference on Computer Vision (ICCV)
  • European Conference on Computer Vision (ECCV)
  • Neural Information Processing Systems (NeurIPS)
  • International Conference on Learning Representations (ICLR)
  • International Conference on Machine Learning (ICML)
  • AAAI Conference on Artificial Intelligence (AAAI)
  • ACM Multimedia (ACM MM)
  • International Joint Conference on Artificial Intelligence (IJCAI)
Journal Reviewer
  • IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
  • International Journal of Computer Vision (IJCV)
  • IEEE Transactions on Image Processing (T-IP)
  • IEEE Transactions on Neural Networks and Learning Systems (T-NNLS)
  • IEEE Transactions on Multimedia (T-MM)
  • IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)
  • Pattern Recognition (PR)
  • ACM Transactions on Multimedia Computing, Communications, and Applications (ToMM)

Awards and Honors

2025
ACM MM MSMA Workshop Best Student Paper
2025
1st Place, Body Contact Estimation Challenge (RHOBIN2025 CVPR Workshop)
2025
1st Place, Efficient Event-based Eye-Tracking Challenge (CVPR Workshop)
2024
1st Place, Event-based Eye Tracking Task (AIS2024 CVPR Workshop)
2024
2nd Place, 3D Contact Estimation Challenge (RHOBIN2024 CVPR Workshop)
2024
2nd Place, NTIRE 2024 Efficient Super-Resolution Challenge
2023
AAAI Distinguished Paper Award
2021
Outstanding Internship at JD Explore Academy
2019
National Scholarship (University of Science and Technology of China)
2017
Outstanding Graduate of Southwest Jiaotong University
2016
National Scholarship (Southwest Jiaotong University)

Teaching

Autumn 2025
Computer Vision, USTC
Autumn 2024
Computer Vision, USTC

Teaching Assistant

Autumn 2020
Computer Vision, USTC
Autumn 2019
Image Processing, USTC