Visual Basic Videotutorial

Violent Video Recognition Based on Global-Local Visual and Audio Contrastive Learning

Abstract: The aim of the violent recognition task is to determine whether a video contains violent behaviors. Given that violent behavior often comes with visual and audio anomalies, multimodal ...

GitHub

VAR: a new visual generation method elevates GPT-style models beyond diffusion & Scaling laws observed

🕹️ Try and Play with VAR! We provide a demo website for you to play with VAR models and generate images interactively. Enjoy the fun of visual autoregressive modeling! We provide a demo website for ...

GitHub

AVF-MAE++ : Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning

Abstract: Affective Video Facial Analysis (AVFA) is important for advancing emotion-aware AI, yet the persistent data scarcity in AVFA presents challenges. Recently, the self-supervised learning (SSL) ...

IEEE

Multimodal Imbalance-Aware Gradient Modulation for Weakly-Supervised Audio-Visual Video Parsing

Abstract: Weakly-supervised audio-visual video parsing (WS-AVVP) aims to localize the temporal extents of audio, visual and audio-visual event instances as well as identify the corresponding event ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results