A neural network architecture that uses attention mechanisms to process relationships across sequences of data.
The transformer is the architecture behind most modern large language models. It allows the model to attend to different parts of the input simultaneously rather than processing text only one token at a time in a simple linear pass.
This architecture made major advances in language modeling possible because it handles context and long-range relationships more effectively than many earlier approaches. It is a foundational concept for understanding how contemporary LLMs work.
In glossary pages about attention, context windows, or large language models, transformer is the underlying technical concept connecting them.
Transformers commonly rely on stacked attention layers, token embeddings, and positional information to model how each token relates to the rest of the sequence.