Abstract: The aim of the violent recognition task is to determine whether a video contains violent behaviors. Given that violent behavior often comes with visual and audio anomalies, multimodal ...
🕹️ Try and Play with VAR! We provide a demo website for you to play with VAR models and generate images interactively. Enjoy the fun of visual autoregressive modeling! We provide a demo website for ...
Abstract: Affective Video Facial Analysis (AVFA) is important for advancing emotion-aware AI, yet the persistent data scarcity in AVFA presents challenges. Recently, the self-supervised learning (SSL) ...
Abstract: Weakly-supervised audio-visual video parsing (WS-AVVP) aims to localize the temporal extents of audio, visual and audio-visual event instances as well as identify the corresponding event ...