Learning Native Continuation for Action Chunking Flow Policies
| USA | technology | ✓ Verified - arxiv.org

#Vision Language Action #action chunking #Legato #real-time AI #smooth trajectories #VLA models #continuation method #multimodal switching

📌 Key Takeaways

  • Researchers propose Legato, a training-time continuation method for smoother VLA model execution
  • Action chunking lets VLA models run in real time, but naive chunked execution often shows discontinuities at chunk boundaries
  • Real-Time Chunking (RTC) alleviates the problem but is external to the policy, leading to spurious multimodal switching
  • Legato embeds continuation directly into training, aiming for intrinsically smooth trajectories

📖 Full Retelling

Researchers have proposed Legato, a training-time continuation method for action-chunked, flow-based Vision Language Action (VLA) policies, in a paper announced on arXiv on February 13, 2026 (arXiv:2602.12978v1). The abstract does not name the authors. Action chunking, in which the policy predicts a short sequence of actions at once, lets VLA models run in real time, but naive chunked execution often exhibits discontinuities at chunk boundaries. Real-Time Chunking (RTC), an existing remedy, alleviates this issue but is external to the policy, which the authors say leads to spurious multimodal switching and trajectories that are not intrinsically smooth. Legato instead builds continuation into the training process, so that smooth continuation across chunks is learned by the policy itself rather than added from outside it.
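
The boundary problem described above can be illustrated with a toy sketch. This is not code from the paper and implements neither Legato nor RTC: the function `toy_chunk`, the sine-shaped actions, the chunk length, and the hand-picked `modes` list are all hypothetical, chosen only to show how independently generated chunks can disagree where they meet when the policy has more than one valid behaviour.

```python
import math


def toy_chunk(start_step, horizon, mode):
    """Hypothetical stand-in for a chunked policy (an assumption, not a real VLA).

    Returns `horizon` one-dimensional actions. `mode` is +1 or -1 and selects
    one of two equally valid behaviours, mimicking a multimodal policy.
    """
    return [mode * math.sin(0.1 * (start_step + i)) for i in range(horizon)]


def naive_chunked_rollout(num_chunks, horizon, modes):
    """Execute each chunk to completion, then switch to the next one."""
    actions = []
    boundaries = []  # index of the first action of every chunk after the first
    for k in range(num_chunks):
        if k > 0:
            boundaries.append(len(actions))
        actions.extend(toy_chunk(k * horizon, horizon, modes[k]))
    return actions, boundaries


def step_sizes(actions):
    """Absolute change between consecutive actions."""
    return [abs(b - a) for a, b in zip(actions, actions[1:])]


# Assumed mode sequence: the third chunk happens to pick the other mode.
modes = [1, 1, -1, 1]
actions, boundaries = naive_chunked_rollout(num_chunks=4, horizon=8, modes=modes)

steps = step_sizes(actions)
# steps[b - 1] is the change from the last action of one chunk
# to the first action of the next.
boundary_jumps = [steps[b - 1] for b in boundaries]
interior = [s for i, s in enumerate(steps) if (i + 1) not in boundaries]

print("largest step inside a chunk:", round(max(interior), 3))
print("steps at chunk boundaries:  ", [round(j, 3) for j in boundary_jumps])
```

In this toy setup the steps inside a chunk stay small, while a boundary where the mode changes produces a jump more than ten times larger; a boundary where the mode happens to stay the same is as smooth as the chunk interior. That is the kind of switching the abstract attributes to naive chunked execution.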

🏷️ Themes

Artificial Intelligence, Computer Vision, Robotics

📚 Related People & Topics

Legato

Indicates that musical notes are played or sung smoothly and connected

In music performance and notation, legato ([leˈɡaːto]; Italian for "tied together"; French lié; German gebunden) indicates that musical notes are played or sung smoothly, such that the transition from note to note is made with no intervening silence. Legato technique is required for slurred performa...

Original Source
arXiv:2602.12978v1
Abstract: Action chunking enables Vision Language Action (VLA) models to run in real time, but naive chunked execution often exhibits discontinuities at chunk boundaries. Real-Time Chunking (RTC) alleviates this issue but is external to the policy, leading to spurious multimodal switching and trajectories that are not intrinsically smooth. We propose Legato, a training-time continuation method for action-chunked flow-based VLA policies. Specifically, Lega