• Basic units of text/code an LLM uses to process or generate language
• Can be characters, words, subwords, or segments of text or code
• One token is roughly 4 characters of common English text, or about ¾ of a word (100 tokens ≈ 75 words)
• GPT models process text as tokens: common sequences of characters found in text
• The model learns the statistical relationships between these tokens
• Those relationships are used to predict the next token in a sequence (see the sketch below)
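
To make the character-to-token ratio concrete, here is a minimal sketch using OpenAI's tiktoken library (assuming it is installed); the sample sentence and the choice of the cl100k_base encoding are illustrative, not part of the original notes.

```python
# Minimal sketch: counting and inspecting GPT tokens with tiktoken.
import tiktoken

# cl100k_base is the encoding used by the GPT-3.5/GPT-4 model family.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens are the basic units a GPT model reads and writes."
token_ids = enc.encode(text)                    # text -> list of integer token IDs
pieces = [enc.decode([t]) for t in token_ids]   # the text fragment each token covers

print(f"{len(text)} characters -> {len(token_ids)} tokens")
print(pieces)  # common English words are often one token; rarer words split into subwords
```

Running this on typical English prose gives roughly one token per 4 characters, matching the ~¾-of-a-word rule of thumb above.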