BravenNow
Transformer (deep learning)
🌐 Entity

Neural network architecture for modelling sequential data

📊 Rating

3 news mentions

📌 Topics

  • Attention Mechanisms (2)
  • Computational Efficiency (2)
  • Machine Learning Optimization (1)
  • Machine Learning (1)
  • Sequence Modeling (1)
  • Multimodal AI (1)
  • Model optimization (1)

🏷️ Keywords

FlashAttention (2) · Sparse Attention (1) · Early Stopping (1) · Online Permutation (1) · Long-Context Inference (1) · Sequence Length (1) · Computational Efficiency (1) · Llama-3.1-8B (1) · HyperMLP (1) · Self-attention (1) · Sequence modeling (1) · MLP (1) · Transformer architecture (1) · Autoregressive attention (1) · Context history (1) · Hidden representation (1) · Multimodal Large Language Models (1) · Vision token reduction (1) · Attention-driven self-compression (1) · Computational cost (1)

📖 Key Information

In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechanism. Text is converted into numerical representations called tokens, and each token is mapped to a vector via lookup in a word embedding table. At each layer, each token is then contextualized against the other (unmasked) tokens in the context window through a parallel multi-head attention mechanism, which amplifies the signal from key tokens and diminishes less important ones. Because transformers have no recurrent units, they require less training time than earlier recurrent neural network (RNN) architectures such as long short-term memory (LSTM).
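The contextualization step described above can be sketched as scaled dot-product self-attention. The following is a minimal single-head illustration in NumPy; the function name, matrix shapes, and random weights are assumptions for the sketch, not part of any particular library's API.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X: (seq_len, d_model) token vectors looked up from an embedding table.
    Wq, Wk, Wv: learned projection matrices of shape (d_model, d_k).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each token attends to every (unmasked) token in the context window;
    # the softmax weights amplify key tokens and diminish less important ones.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # (seq_len, seq_len), rows sum to 1
    return weights @ V                  # contextualized token vectors

# Toy example with random embeddings and weights (hypothetical dimensions).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

A full transformer layer would run several such heads in parallel (multi-head attention), concatenate their outputs, and follow with a feed-forward (MLP) sublayer.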

📰 Related News (3)

🔗 Entity Intersection Graph

Entities most often co-mentioned with Transformer (deep learning):

  • Early stopping (1)
  • Computational resource (1)
  • MLP (1)

People and organizations frequently mentioned alongside Transformer (deep learning):

🔗 External Links