Models that generate text one token at a time, using previous tokens to predict the next.
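Autoregressive models generate text sequentially: each new token is predicted from all previously generated tokens (and any prompt), then appended to the sequence before the next prediction is made. Because every prediction is conditioned on the full preceding sequence, the output can stay consistent with what has already been generated.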
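Written as a formula, this sequential dependence is the chain rule of probability applied to a token sequence. The notation below is an assumption introduced for illustration and is not from the original text: x_1, ..., x_T are the tokens of a sequence of length T.

```latex
P(x_1, x_2, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1})
```

Each factor on the right is one next-token prediction, and the first factor, with no preceding tokens, is simply P(x_1).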
Key characteristics:
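At each step, an autoregressive model produces a probability distribution over the vocabulary for the next token. The token is then selected from that distribution, often with techniques such as temperature sampling (which sharpens or flattens the distribution) and top-k sampling (which restricts the choice to the k most likely tokens) to control randomness.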
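The sampling step and the generation loop described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation: the function names `sample_next_token`, `generate`, and `toy_model` are hypothetical, and the "model" is a stand-in that returns hand-made scores (logits) rather than a trained network.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token index from raw scores using temperature and top-k."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Rank token indices by score, then keep only the k best (all if top_k is None).
    indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]
    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [logits[i] / temperature for i in indices]
    # Numerically stable softmax weights over the surviving candidates.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(indices, weights=weights, k=1)[0]


def generate(next_logits, prompt, max_new_tokens, **sampling):
    """Autoregressive loop: every sampled token is appended and fed back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(sample_next_token(next_logits(tokens), **sampling))
    return tokens


def toy_model(tokens):
    """Hypothetical stand-in for a model: strongly prefers (last token + 1) mod 3."""
    logits = [0.0, 0.0, 0.0]
    logits[(tokens[-1] + 1) % 3] = 5.0
    return logits


# With top_k=1 only the highest-scoring token survives, so decoding is greedy.
print(generate(toy_model, [0], 4, top_k=1))  # [0, 1, 2, 0, 1]
```

Setting `top_k=1` makes the choice deterministic (greedy decoding). Larger values of `top_k` or `temperature` admit more variation between runs.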