Using Amazon Nova

In this post, we demonstrate how to use services such as Amazon Nova, Amazon Rekognition, and Amazon Polly to automate the creation of accessible audio descriptions for video content. This approach can significantly reduce the time and cost required to make video accessible to visually impaired audiences.

Source: Amazon Web Services – Machine Learning
According to the World Health Organization, more than 2.2 billion people worldwide have a vision impairment. To comply with disability legislation, such as the Americans with Disabilities Act (ADA) in the United States, media in visual formats such as television shows or movies must be made accessible to visually impaired people. This often takes the form of an audio description track that narrates the visual elements of the film or show. According to the International Documentary Association, creating audio descriptions costs $25 per minute (or more) when using a third party. Building audio descriptions in-house can require significant effort from media companies, involving content creators, audio description writers, description narrators, audio engineers, delivery vendors, and more.

This leads to a natural question: can you automate this process with the help of generative AI offerings in Amazon Web Services (AWS)?

Announced in December at re:Invent 2024, the Amazon Nova family of foundation models is available through Amazon Bedrock and includes three multimodal foundation models (FMs):

- Amazon Nova Lite (GA) – A low-cost multimodal model that's lightning-fast for processing image, video, and text inputs
- Amazon Nova Pro (GA) – A highly capable multimodal model with a balanced combination of accuracy, speed, and cost for a wide range of tasks
- Amazon Nova Premier (GA) – Our most capable model for complex tasks and a teacher for model distillation

In this post, we demonstrate how you can use services like Amazon Nova, Amazon Rekognition, and Amazon Polly to automate the creation of accessible audio descriptions for video content. This approach can greatly reduce the time and cost required to make video accessible to visually impaired audiences. Note, however, that this post does not provide a complete, deployment-ready solution; we share pseudocode snippets and guidance in sequence.