Simplicity is the ultimate sophistication --Leonardo da Vinci

🔮 Future Receptive: World Model

A world model gives a robot an internal simulator. Show it the current view and the action you want to take, and it generates a realistic video of what would happen next, letting the robot "imagine" outcomes before moving a single motor in the real world.

Our model, IRASim, makes every generated frame line up precisely with the action behind it. The result is video that captures the subtle stuff robots actually struggle with (a bowl slipping from the gripper, a block being nudged, a drawer sliding shut) in high resolution and over long horizons.
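To make the "imagine before acting" idea concrete, here is a minimal sketch. It is not IRASim's code: `toy_world_model`, `imagine`, and `pick_plan` are hypothetical names, and a short list of floats stands in for a video frame. It only shows the shape of the loop: predict one frame per action, then compare imagined outcomes before committing to a plan.

```python
from typing import Callable, List, Sequence

Frame = List[float]   # stand-in for an image; a real frame would be an H x W x 3 array
Action = List[float]  # stand-in for a robot action, e.g. an end-effector delta


def toy_world_model(frame: Frame, action: Action) -> Frame:
    """Hypothetical one-step predictor: the next 'frame' is the current one shifted by the action."""
    return [f + a for f, a in zip(frame, action)]


def imagine(model: Callable[[Frame, Action], Frame],
            first_frame: Frame,
            actions: Sequence[Action]) -> List[Frame]:
    """Roll the model forward, producing exactly one predicted frame per action."""
    frames = [first_frame]
    for action in actions:
        frames.append(model(frames[-1], action))
    return frames[1:]


def pick_plan(model: Callable[[Frame, Action], Frame],
              first_frame: Frame,
              candidates: Sequence[Sequence[Action]],
              goal: Frame) -> Sequence[Action]:
    """Choose the (non-empty) candidate plan whose imagined final frame is closest to the goal."""
    def final_error(actions: Sequence[Action]) -> float:
        last = imagine(model, first_frame, actions)[-1]
        return sum((x - g) ** 2 for x, g in zip(last, goal))
    return min(candidates, key=final_error)
```

For example, `imagine(toy_world_model, [0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]])` yields two predicted frames, one per action, without any real-world motion.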

Why it matters:

More details can be found at: https://gen-irasim.github.io/

See the demos below 👇

[Demo videos: World Model Prediction vs. Ground Truth, for short and long trajectories on RT-1, Bridge, and Language-Table]

🤖 From Imagination to Skill: Learning Inside the World Model

If a world model can faithfully imagine what happens next, then a robot can do more than just watch: it can practice inside its own head.

That's the idea behind our follow-up work, WMPO (World-Model-based Policy Optimization). Instead of sending the robot back into the lab to collect thousands of new trials, we let it train entirely inside the imagined world. The robot tries, fails, and tries again, all in pixels generated by our world model, and gradually becomes better at the real task.
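As a rough illustration of "training inside the imagined world", the toy below runs on-policy updates using only a simulated step function. It is a sketch under heavy assumptions, not WMPO's algorithm: the world model is a hypothetical one-line function, the policy is a single Gaussian mean, and the update is plain REINFORCE with a batch-mean baseline, swapped in for simplicity.

```python
import random


def toy_world_model(position: float, action: float) -> float:
    """Hypothetical learned simulator: predicts the next position after an action."""
    return position + action


def imagined_reward(position: float, goal: float) -> float:
    """Reward computed from the imagined outcome only (closer to the goal is better)."""
    return -(position - goal) ** 2


def train_in_imagination(goal: float = 1.0, iterations: int = 200, batch: int = 64,
                         lr: float = 0.05, sigma: float = 0.5, seed: int = 0) -> float:
    rng = random.Random(seed)
    theta = 0.0  # policy parameter: mean of a Gaussian over actions
    for _ in range(iterations):
        # On-policy: sample fresh actions from the current policy every iteration.
        actions = [rng.gauss(theta, sigma) for _ in range(batch)]
        # Every rollout happens inside the world model; no real robot is involved.
        rewards = [imagined_reward(toy_world_model(0.0, a), goal) for a in actions]
        baseline = sum(rewards) / batch  # subtracting the mean reduces gradient variance
        # REINFORCE: d/d(theta) of log N(a; theta, sigma) is (a - theta) / sigma^2.
        grad = sum((r - baseline) * (a - theta) / sigma ** 2
                   for r, a in zip(rewards, actions)) / batch
        theta += lr * grad
    return theta
```

Calling `train_in_imagination()` moves the policy mean from 0 toward the goal action of 1. The point of the sketch is that every sample comes from the simulator, so each update can use fresh on-policy data at no real-world cost.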

What this unlocks:

More details can be found at: https://wm-po.github.io/

Three different VLA training paradigms: (a) Imitation learning learns from human demonstrations but lacks the ability to learn from failures and self-correct; (b) Real-world RL improves the policy through direct interaction but suffers from high sampling costs and difficulty in achieving on-policy RL; (c) WMPO pretrains a world model on large-scale robotic trajectories and fine-tunes it with limited policy behavior data, enabling sample-efficient on-policy RL for VLA without real-world interaction.
Behavior analysis of the Square task (inserting the square into the stick) shows that, compared with the base policy, WMPO demonstrates the ability to self-correct.

🧠 HALO: Thinking, Imagining, and Acting, All in One Mind

What if the robot didn't need a separate world model running alongside it? What if imagining the future became part of how it thinks?

That's the leap behind HALO. Given a goal like "arrange the blocks in red-green-blue order," it works the way people do:

Why it matters:

Our code is released at: https://github.com/qshou-coder/HALO

See HALO in action below 👇

The pipeline converts raw robotic trajectories into EM-CoT data in three phases: (1) action primitives are extracted from robot proprioception via rule-based matching; (2) a VLM acts as an annotator to generate task plans, decompose trajectories into subtasks, and align each subtask with explicit textual reasoning; and (3) the terminal frame of each subtask is selected as a visual subgoal image, producing structured embodied multimodal chain-of-thought supervision.
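Phases (1) and (3) can be sketched in a few lines. This is an illustrative guess at what rule-based matching might look like, not the released HALO code: the state layout `(x, y, z, gripper_width)`, the thresholds, and the primitive names are all assumptions. The VLM annotation of phase (2) is left out.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

State = Tuple[float, float, float, float]  # assumed layout: (x, y, z, gripper_width)


def label_step(prev: State, curr: State,
               move_eps: float = 0.01, grip_eps: float = 0.005) -> str:
    """Label one transition with a primitive. The thresholds are illustrative guesses."""
    dx, dy, dz, dg = (c - p for c, p in zip(curr, prev))
    if dg < -grip_eps:
        return "close_gripper"
    if dg > grip_eps:
        return "open_gripper"
    if max(abs(dx), abs(dy), abs(dz)) > move_eps:
        return "move"
    return "idle"


def segment(states: Sequence[State]) -> List[Tuple[str, int, int]]:
    """Merge consecutive identical labels into (primitive, start_frame, end_frame) segments."""
    labels = [label_step(p, c) for p, c in zip(states, states[1:])]
    segments = []
    start = 0
    for name, run in groupby(labels):
        length = len(list(run))
        segments.append((name, start, start + length))
        start += length
    return segments


def subgoal_frames(segments: Sequence[Tuple[str, int, int]]) -> List[int]:
    """Phase (3): the terminal frame index of each segment serves as its visual subgoal."""
    return [end for _, _, end in segments]
```

Here a segment's `end_frame` is the frame reached after its last transition, so `subgoal_frames` returns one frame index per segment.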

🪞 RedFlow: Learning From Mistakes, One Move at a Time

Robots fail a lot. Most methods treat a failed attempt as one big "bad example", but inside a failed try, most of the actions were fine. The mistake is usually one specific moment that sent everything off course.

RedFlow zooms in on those moments. For each failure, it pinpoints the exact step that went wrong, finds a similar moment from a successful attempt, and uses it as a concrete example of what should have happened. Instead of just "don't do that," it shows the robot "do this instead."
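The retrieval step described above can be pictured as a nearest-neighbor lookup. The sketch below is not RedFlow's implementation: it assumes the faulty step index is already known (how RedFlow locates it is not covered here), represents states as small float tuples, and uses Euclidean distance as a stand-in similarity measure. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Step = Tuple[Tuple[float, ...], str]  # (state features, action label)


def nearest_success_step(failed_state: Sequence[float],
                         success_trajectories: Sequence[Sequence[Step]]) -> Step:
    """Return the (state, action) pair from successful data whose state is closest."""
    best_step = None
    best_dist = math.inf
    for trajectory in success_trajectories:
        for state, action in trajectory:
            dist = math.dist(failed_state, state)
            if dist < best_dist:
                best_step, best_dist = (state, action), dist
    if best_step is None:
        raise ValueError("no successful steps to retrieve from")
    return best_step


def build_correction(failed_trajectory: Sequence[Step], error_step: int,
                     success_trajectories: Sequence[Sequence[Step]]) -> Dict[str, object]:
    """Pair the faulty action with the action taken at a similar moment in a success."""
    state, bad_action = failed_trajectory[error_step]
    _, good_action = nearest_success_step(state, success_trajectories)
    return {"state": state, "rejected": bad_action, "preferred": good_action}
```

The returned record carries both halves of the lesson: the action to avoid and the "do this instead" example taken from a comparable state.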

Why it matters:

See RedFlow recover from its own mistakes below 👇

Qualitative correction behavior on clothes folding. The base policy fails when the T-shirt falls out of the right arm's reach, while RedFlow recovers by using the left arm to pull the cloth back and continue folding.

🛠️ RTC-Anything: Bringing It All to a Real Robot

Research is one thing. Getting it to actually run on a robot in front of you, with synchronized cameras, smooth motion, and safe execution, is another.

RTC-Anything is the deployment framework we built to close that gap. It takes any of the policies above and runs them on real Agilex robots, with real-time action chunking that keeps motion fluid instead of jerky. The framework is model-agnostic (swap in your own policy backend) and task-agnostic (clothes folding, sweeping, and cleaning all share the same runtime).
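To see why chunk handover matters for smoothness, consider a simple stand-in. This sketch does not reproduce RTC-Anything's method: it uses a plain linear cross-fade between the unexecuted tail of the old action chunk and the start of the new one, with scalar actions (say, one joint target), and `blend_chunks` is a hypothetical helper.

```python
from typing import List, Sequence


def blend_chunks(old_tail: Sequence[float], new_chunk: Sequence[float]) -> List[float]:
    """Cross-fade from the unexecuted tail of the old chunk into the new chunk.

    Over the overlapping steps, the weight on the new chunk grows linearly, so the
    command never jumps abruptly from one prediction to the other.
    """
    overlap = min(len(old_tail), len(new_chunk))
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight on the new chunk, strictly between 0 and 1
        blended.append((1 - w) * old_tail[i] + w * new_chunk[i])
    # Past the overlap, follow the new chunk as predicted.
    return blended + list(new_chunk[overlap:])
```

With an old tail of `[0.0, 0.0, 0.0]` and a new chunk of `[4.0, 4.0, 4.0, 4.0]`, the blended command ramps through 1.0, 2.0, 3.0 before reaching 4.0, rather than stepping from 0 to 4 in one control tick.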

What's included:

You can find RTC-Anything at: https://github.com/PEILAB-PhysAI/RTC-Anything

Everything is open-source: grab the code, plug in your robot, and start deploying 👇