[CVPR 2024] Question Aware Vision Transformer for Multimodal Reasoning

[CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Making Vision Transformers Truly Shift-Equivariant (CVPR 2024)

[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning

[CVPR 2024] TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Multimodal Token Fusion for Vision Transformers | CVPR 2022

[CVPR'24] MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Transformer for Vision | Multimodal Transformers for Video | Session 7 | CVPR 2022

Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring (CVPR 24)

[CVPR 2024] Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction

Dense Vision Transformer Compression with Few Samples | CVPR 2024

[CVPR'24] Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

[NeurIPS 2021] History-Aware Multimodal Transformer for Vision-and-Language Navigation

Efficient Test-Time Adaptation of Vision-Language Models [CVPR 2024]

[CVPR'24] On the Faithfulness of Vision Transformer Explanations

Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers (CVPR 2023)

[CVPR 2024] Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach, CVPR 2024
