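An AI agent that can watch videos with people and share empathy about the content through varied conversation is a promising and widely anticipated AI application. To do this, the agent must accurately perceive and recognize video content and hold natural multi-turn conversations with people based on its understanding of that content. Recently, research on text-to-video retrieval, video captioning, and video question answering (videoQA) has been actively pursued to improve video understanding intelligence. In addition, large-scale datasets have been built and made publicly available to facilitate research (Alamri et al. 2019; Lei et al. 2018, 2020; Choi et al. 2021). Studies using these datasets typically apply automatic evaluation metrics to measure the performance of AI agents. For the videoQA task, multiple-choice questions usually use overall accuracy, whereas open-ended question answering adopts evaluation metrics commonly used in natural language generation tasks (e.g., BLEU (Papineni et al. 2002), METEOR (Banerjee and Lavie 2005), CIDEr (Vedantam, Lawrence Zitnick, and Parikh 2015)). These automatic metrics are convenient to apply, but they have limitations. For example, overall accuracy is intuitive and easy to compute, but it does not account for the difficulty of a question or the cognitive components required to answer it. Moreover, scores from language-generation metrics cannot determine whether the generated content is actually the correct answer to the question.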
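The limitation of overall accuracy noted above can be illustrated with a minimal sketch. It assumes each multiple-choice question carries a difficulty label; the `level` field, the record layout, and the toy data are hypothetical and are not taken from any of the cited datasets.

```python
from collections import defaultdict


def overall_accuracy(records):
    """Fraction of questions answered correctly, ignoring difficulty."""
    correct = sum(r["pred"] == r["gold"] for r in records)
    return correct / len(records)


def accuracy_by_level(records):
    """Accuracy computed separately for each (hypothetical) difficulty level."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        totals[r["level"]] += 1
        hits[r["level"]] += int(r["pred"] == r["gold"])
    return {level: hits[level] / totals[level] for level in totals}


# Made-up records: three easy questions answered correctly, one hard one missed.
records = [
    {"level": "easy", "gold": "A", "pred": "A"},
    {"level": "easy", "gold": "B", "pred": "B"},
    {"level": "easy", "gold": "C", "pred": "C"},
    {"level": "hard", "gold": "D", "pred": "A"},
]

print(overall_accuracy(records))   # single aggregate score
print(accuracy_by_level(records))  # the same predictions, split by difficulty
```

On this toy data the aggregate score is 0.75, while the per-level breakdown shows that every easy question is answered correctly and the only hard question is missed. A single overall number cannot show that difference.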
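This work aims to understand the rapidly emerging field of VideoQA, specifically in the context of instructional programming videos. It also encourages the design of systems that can respond to programming-based natural-language questions. We introduce two datasets: CodeVidQA, with 2,104 question-answer pairs with timestamps and links taken from programming videos extracted using Stack Overflow, for the Programming Visual Answer Localization task; and CodeVidCL, with 4,291 videos (1,751 programming, 2,540 non-programming), for the Programming Video Classification task. In addition, we propose a framework that adapts BigBird and SVM for video classification. The proposed approach achieves a high accuracy of 99.61% for video classification.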
ICLR 2025. Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment. Dongping Chen, Ruoxi Chen, Shu Pu, Zhaoyi Liu, Yanru Wu, Caixi Chen, Benlin Liu, Yue Huang, Yao Wan, Pan Zhou, Ranjay Krishna.
ICLR 2025. AHA: a vision-language model for detecting and reasoning over failures [...].
[...] techniques from crowdsourcing workflows. Madeleine Grunde-McLaughlin, Michelle S. Lam, Ranjay Krishna, Daniel S. Weld, Jeffrey Heer. ACM Transactions on Computer-Human Interaction.
NeurIPS 2024. Task Me Anything. Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna. Advances in Neural Information Processing Systems, 2024.
NeurIPS 2024. Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models. Yushi Hu*, Weijia Shi*, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith*, Ranjay Krishna*. Advances in Neural Information Processing Systems, 2024.
NeurIPS 2024. Multilingual Diversity Improves Vision-Language Representations. Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna*. Advances in Neural Information Processing Systems, 2024. Spotlight Paper award (awarded to top 5%).
NeurIPS 2024. The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better. Scott Geng, Cheng-Yu Hsieh, Vivek Ramanujan, Matthew Wallingford, Chun-Liang Li, Pang Wei Koh*, Ranjay Krishna*. Advances in Neural Information Processing Systems, 2024.
NeurIPS 2024. ActionAtlas: A VideoQA Benchmark for Fine-grained Action Recognition. Mohammadreza Salehi, Jae Sung Park, Aditya Kusupati, Ranjay Krishna, Yejin Choi, Hannaneh Hajishirzi, Ali Farhadi. Advances in Neural Information Processing Systems, 2024.
NeurIPS 2024. NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples. Wenxuan Peng, Baiqi Li, Zhiqiu Lin, Jean de Dieu Nyandwi, Zixian Ma, Simran Khanuja, Deva Ramanan, Ranjay Krishna, Graham Neubig. Advances in Neural Information Processing Systems, 2024.
NeurIPS 2024. Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass. Ethan Shen, Alan Fan, Sarah M Pratt, Jae Sung Park, Matthew Wallingford, Sham M Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati. Advances in Neural Information Processing Systems, 2024.