Understanding Input Selectivity in Mamba
Source: Apple Machine Learning Research

State-Space Models (SSMs), and particularly Mamba, have recently emerged as a promising alternative to Transformers. Mamba introduces input selectivity to its SSM layer (S6) and incorporates convolution and gating into its block definition. While these modifications do improve Mamba's performance over its SSM predecessors, it remains largely unclear how Mamba leverages the additional functionalities provided by input selectivity, and how these interact with the other operations in the Mamba architecture. In this work, we demystify the role of input selectivity in Mamba, investigating its impact on function approximation power, long-term memorization, and associative recall capabilities. In particular: (i) we prove that Mamba's S6 layer can represent projections onto Haar wavelets, giving it an edge over its diagonal-SSM predecessor S4D in approximating the discontinuous functions that commonly arise in practice; (ii) we show how the S6 layer can dynamically counteract memory decay; (iii) we provide analytical solutions to the MQAR associative recall task for Mamba architectures with different mixers: Mamba, Mamba-2, and S4D. We demonstrate the tightness of our theoretical constructions with empirical results on concrete tasks. Our findings offer a mechanistic understanding of Mamba and reveal opportunities for improvement.
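To make "input selectivity" concrete, here is a minimal sketch of an S6-style selective scan in NumPy. It is an illustration under simplifying assumptions, not the paper's or Mamba's actual implementation: a single scalar input channel, zero-order-hold discretization of the state term, and no convolution or gating from the surrounding Mamba block. The names `W_delta`, `b_delta`, `W_B`, and `W_C` are illustrative parameters, not Mamba's real API. The key contrast, noted in the comments, is that the step size Delta and the projections B and C are recomputed from each input token, whereas in S4D they are fixed across time.

```python
import numpy as np

def selective_scan(x, A, W_delta, b_delta, W_B, W_C):
    """S6-style selective scan for a single scalar input channel (sketch).

    x       : (T,)  input sequence
    A       : (N,)  fixed diagonal state matrix (negative reals for stability)
    W_delta : scalar, b_delta : scalar -- parameters of the Delta projection
    W_B, W_C: (N,)  parameters of the input-dependent B and C projections

    The input dependence of Delta_t, B_t, C_t below is the "selectivity"
    that distinguishes S6 from S4D, where Delta, B, C are constant in t.
    """
    h = np.zeros(A.shape[0])
    y = np.empty(x.shape[0])
    for t, xt in enumerate(x):
        delta = np.log1p(np.exp(W_delta * xt + b_delta))  # softplus -> per-token step size
        B = W_B * xt                                      # input-dependent write direction
        C = W_C * xt                                      # input-dependent read-out
        A_bar = np.exp(delta * A)                         # ZOH-discretized decay factor
        h = A_bar * h + delta * B * xt                    # small delta keeps state; large delta overwrites it
        y[t] = C @ h
    return y

# Toy usage: a random sequence through a random (stable) layer.
rng = np.random.default_rng(0)
A = -np.exp(rng.standard_normal(8))  # Re(A) < 0 keeps |exp(delta * A)| < 1
out = selective_scan(rng.standard_normal(32), A,
                     W_delta=0.5, b_delta=0.0,
                     W_B=rng.standard_normal(8),
                     W_C=rng.standard_normal(8))
print(out.shape)  # (32,)
```

The decay factor exp(Delta_t * A) hints at why selectivity matters for memory, claim (ii) above: on an uninformative token the layer can drive Delta_t toward 0, making the decay factor approach 1 and leaving the state nearly unchanged, whereas a fixed Delta as in S4D forces geometric decay at every step.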
- • Work done at Apple. † Flatiron Institute. § Mila.