MENSA: Multi-Dataset Harmonized Pretraining for Semantic Segmentation
CLIP-Based Modality Compensation for Visible-Infrared Image Re-Identification
Focus Entirety and Perceive Environment for Arbitrary-Shaped Text Detection
DIP: Diffusion Learning of Inconsistency Pattern for General DeepFake Detection
VLAB: Enhancing Video Language Pretraining by Feature Adapting and Blending
Improving Vision Anomaly Detection With the Guidance of Language Modality
Adaptive Fusion Learning for Compositional Zero-Shot Recognition
Uncertainty-Guided Progressive Few-Shot Learning Perception for Aerial View Synt...
FER-Former: Multimodal Transformer for Facial Expression Recognition
FasterSal: Robust and Real-Time Single-Stream Architecture for RGB-D Salient Obje...
PMMTalk: Speech-Driven 3D Facial Animation From Complementary Pseudo Multi-Modal ...
StyleAM: Perception-Oriented Unsupervised Domain Adaption for No-Reference Image ...
VB-KGN: Variational Bayesian Kernel Generation Networks for Motion Image Deblurring
Dual-Stream Relation Learning Network for Image-Text Retrieval
Multi-Perspective Pseudo-Label Generation and Confidence-Weighted Training for S...
Auxiliary Representation-Guided Network for Visible-Infrared Person Re-Identific...