Recurrent: Domain Information Retrieval - XiaoMi-AI

December 18, 2024 00:00

Accelerating LLM Inference on NVIDIA GPUs with ReDrafter

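Accelerating LLM inference is an important ML research problem: autoregressive token generation is computationally expensive and relatively slow, and improving inference efficiency reduces latency for users. In addition to ongoing efforts to accelerate inference on Apple silicon, we have recently made significant progress in accelerating LLM inference on NVIDIA GPUs, which are widely used for production applications across the industry. Earlier this year, we published and open-sourced Recurrent Drafter (ReDrafter), a novel approach to speculative decoding that achieves state-of-the-art…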

Apple Machine Learning Research

November 18, 2024 00:00

Recurrent Drafter for Fast Speculative Decoding in Large Language Models

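We present Recurrent Drafter (ReDrafter), an advanced speculative decoding approach that achieves state-of-the-art speedup for large language model (LLM) inference. The performance gains are driven by three key aspects: (1) using a recurrent neural network (RNN) as the draft model, conditioned on the LLM's hidden states, (2) applying a dynamic tree attention algorithm over beam search results to eliminate duplicated prefixes in candidate sequences, and (3) training through knowledge distillation from the LLM. With a PyTorch implementation, ReDrafter accelerates Vicuna inference on MT-Bench by up to 3.5x…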
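The prefix-deduplication idea in aspect (2) can be illustrated with a small sketch. This is not ReDrafter's implementation; the function names `build_prefix_tree` and `tree_attention_mask` and the sample token ids are hypothetical. The sketch assumes beam search has produced several candidate token sequences, merges their shared prefixes into one tree so each distinct prefix is stored once, and then builds a mask in which every node attends only to itself and its ancestors.

```python
def build_prefix_tree(candidates):
    """Merge candidate token sequences that share prefixes into one tree.

    Returns (tokens, parents): tokens[i] is the token stored at node i,
    and parents[i] is the index of its parent node (-1 for a first token).
    """
    tokens, parents = [], []
    index = {}  # (parent node index, token) -> node index
    for seq in candidates:
        parent = -1
        for tok in seq:
            key = (parent, tok)
            if key not in index:
                # First time this prefix is seen: create a new node.
                index[key] = len(tokens)
                tokens.append(tok)
                parents.append(parent)
            parent = index[key]
    return tokens, parents


def tree_attention_mask(parents):
    """mask[i][j] is True when node j is node i itself or one of its ancestors."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:
            mask[i][j] = True
            j = parents[j]
    return mask


# Three hypothetical beam candidates sharing the prefixes [5] and [5, 7].
beams = [[5, 7, 9], [5, 7, 2], [5, 3, 9]]
tokens, parents = build_prefix_tree(beams)
mask = tree_attention_mask(parents)
print(len(tokens), "tree nodes instead of", sum(len(b) for b in beams), "tokens")
```

In this toy case the nine candidate tokens collapse to six tree nodes, so a verification pass over the tree would process fewer positions than one over the raw beams. How much is saved in practice depends on how much the candidates overlap.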

Computational Intelligence

July 24, 2023 06:31

IEEE Transactions on Neural Networks and Learning Systems, Volume 35, Issue 6, June 2024

1) Editorial Special Issue on Explainable and Generalizable Deep Learning for Medical Imaging
Author(s): Tianming Liu, Dajiang Zhu, Fei Wang, Islem Rekik, Xia Hu, Dinggang Shen
Pages: 7271 - 7274

2) Adversarial Learning Based Node-Edge Graph Attention Networks for Autism Spectrum Disorder Identificatio
