MINIMAL关键词检索结果

超越文本压缩:评估跨量表的引物

Beyond Text Compression: Evaluating Tokenizers Across Scales

令牌设计师的设计显着影响语言模型性能,但是评估令牌质量仍然具有挑战性。尽管文本压缩已成为一种常见的内在度量,但最近的工作质疑其作为质量指标的可靠性。 We investigate whether evaluating tokenizers on smaller models (350M parameters) reliably predicts their impact at larger scales (2.7B parameters).Through experiments with established tokenizers from widely-adopted language m