Amazon SageMaker HyperPod enhances ML infrastructure with scalability and customizability

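In this post, we introduce three features in SageMaker HyperPod that improve the scalability and customizability of ML infrastructure. Continuous provisioning offers flexible resource provisioning to help you start training and deploying models faster and manage clusters more efficiently. With custom AMIs, you can align your ML environment with your organization's security standards and software requirements.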

Source: Amazon Web Services, Machine Learning
Amazon SageMaker HyperPod is purpose-built infrastructure for optimizing foundation model (FM) training and inference at scale. SageMaker HyperPod removes the undifferentiated heavy lifting involved in building and optimizing machine learning (ML) infrastructure for training FMs, reducing training time by up to 40%.

SageMaker HyperPod offers persistent clusters with built-in resiliency, while also offering deep infrastructure control by allowing users to SSH into the underlying Amazon Elastic Compute Cloud (Amazon EC2) instances. It helps scale model development and deployment tasks such as training, fine-tuning, or inference efficiently across a cluster of hundreds or thousands of AI accelerators, while reducing the operational heavy lifting of managing such clusters.

As AI moves toward deployment across a multitude of domains and use cases, the need for flexibility and control is becoming more pertinent. Large enterprises want to make sure their GPU clusters follow organization-wide policies and security rules. Mission-critical AI/ML workloads often require specialized environments that align with the organization's software stack and operational standards.

SageMaker HyperPod supports Amazon Elastic Kubernetes Service (Amazon EKS) and offers two new features that enhance this control and flexibility to enable production deployment of large-scale ML workloads:

Continuous provisioning: SageMaker HyperPod now supports continuous provisioning, which enhances cluster scalability through partial provisioning, rolling updates, concurrent scaling operations, and continuous retries when launching and configuring your HyperPod cluster.

Custom AMIs: You can now use custom Amazon Machine Images (AMIs), which enable the preconfiguration of software stacks, security agents, and proprietary dependencies that would otherwise require post-launch bootstrapping. Customers can create a custom AMI using the HyperPod public AMI as a base, and can create Insta