What I cannot create, I do not understand. --Richard Feynman

🎬 CRPO: Counterfactual Motion Sense for Video Reasoners

A video reasoning model can look accurate on benchmarks yet still rely on static shortcuts. The real test is spatiotemporal sensitivity: if motion direction, temporal order, or event dynamics change, the answer should change for the right reason.

Our method follows a counterfactual RL view: we train on paired original / transformed videos and enforce cross-branch relational consistency with CRR. This makes shortcut policies much harder to optimize and pushes the model toward motion-grounded reasoning.

Why it matters:

More details can be found at: https://ddz16.github.io/crpo.github.io/

When a clip is temporally reversed or its motion direction changes, shortcut-heavy models tend to keep the same answer. This exposes weak temporal grounding and motivates explicit counterfactual consistency training.
CRPO uses dual-branch RL over original and counterfactual videos, with CRR constraints that require answers to change for dynamic questions and stay the same for static ones, improving spatiotemporal sensitivity.
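
To make the consistency rule concrete, here is a minimal sketch of a pairwise reward that pays out when the answer changes on a dynamic question and when it stays the same on a static one. This is an illustration only, not the CRR formulation used in CRPO: the `PairedSample` type, the `"dynamic"` / `"static"` labels, the binary reward values, and the string-equality comparison are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PairedSample:
    """Answers from the two branches for one question (hypothetical container)."""
    question_type: str          # "dynamic" or "static" (assumed labels)
    answer_original: str        # answer given on the original video
    answer_counterfactual: str  # answer given on the transformed video


def relational_consistency_reward(sample: PairedSample) -> float:
    """Return 1.0 if the answer pair behaves as the question type requires, else 0.0."""
    # Compare answers after light normalization (an assumption for this sketch).
    original = sample.answer_original.strip().lower()
    counterfactual = sample.answer_counterfactual.strip().lower()
    changed = original != counterfactual

    if sample.question_type == "dynamic":
        # Motion-dependent question: the answer should change with the video.
        return 1.0 if changed else 0.0
    if sample.question_type == "static":
        # Motion-independent question: the answer should stay the same.
        return 0.0 if changed else 1.0
    raise ValueError(f"unknown question type: {sample.question_type!r}")
```

A policy that ignores motion gives the same answer on both branches, so under this toy reward it scores zero on every dynamic question. That is the sense in which shortcut policies become harder to optimize.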

🤖 QuadGPT: Native Quad Meshes, Autoregressively Built

In production 3D workflows, quality depends not only on geometric fidelity but also on editability, deformation stability, and artist-friendly edge flow. That is why direct, native quad generation matters.

Our method predicts mixed triangle/quad face sequences directly with an autoregressive model, then refines topology quality via tDPO preference optimization rather than fragile post-hoc conversion.

What this unlocks:

More details can be found at: https://hitcslj.github.io/QuadGPT/

See QuadGPT below:

QuadGPT directly generates mixed triangle/quad face sequences in an end-to-end autoregressive manner, replacing triangle-first conversion. The pipeline emphasizes topology-aware tokenization and RL-style refinement (tDPO) for cleaner edge flow and more artist-friendly quad structure.
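
As a rough illustration of what a mixed triangle/quad face sequence can look like, the sketch below flattens a list of faces into one token stream and reads it back. It is not QuadGPT's tokenizer: the `<tri>` and `<quad>` marker tokens, the use of raw vertex indices as tokens, and both function names are assumptions chosen to keep the example small.

```python
TRI = "<tri>"    # hypothetical marker token for a 3-vertex face
QUAD = "<quad>"  # hypothetical marker token for a 4-vertex face


def encode_faces(faces):
    """Flatten faces (tuples of 3 or 4 vertex indices) into one token list."""
    tokens = []
    for face in faces:
        if len(face) == 3:
            tokens.append(TRI)
        elif len(face) == 4:
            tokens.append(QUAD)
        else:
            raise ValueError(f"only triangles and quads are supported, got {face!r}")
        tokens.extend(face)
    return tokens


def decode_faces(tokens):
    """Rebuild the face list from a token list produced by encode_faces."""
    faces = []
    i = 0
    while i < len(tokens):
        marker = tokens[i]
        if marker == TRI:
            size = 3
        elif marker == QUAD:
            size = 4
        else:
            raise ValueError(f"expected a face marker at position {i}, got {marker!r}")
        face = tuple(tokens[i + 1 : i + 1 + size])
        if len(face) != size:
            raise ValueError("token sequence ends in the middle of a face")
        faces.append(face)
        i += 1 + size
    return faces


# Two quads and one triangle sharing vertices.
mesh_faces = [(0, 1, 2, 3), (1, 4, 5, 2), (4, 6, 5)]
sequence = encode_faces(mesh_faces)
```

Because the face type is carried in the sequence itself, an autoregressive model trained on such a stream can emit quads directly rather than producing triangles that are merged afterwards.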

🧠 VisionCreator: Understand, Plan, and Create

High-quality visual creation is not one-shot generation. It needs a system that can understand intent, reason over constraints, plan multi-step actions, and execute reliably.

Our method treats this as a native UTPC loop (Understanding–Thinking–Planning–Creation), then scales capability with PST and VRL in simulated environments for long-horizon tasks.

Why our method matters:

More details can be found at: https://layjins.github.io/visioncreator/

VisionCreator unifies UTPC (Understanding, Thinking, Planning, Creation) in one native agentic model, trained with structured trajectory data and strengthened by Progressive Specialization Training (PST) plus Virtual Reinforcement Learning (VRL) for long-horizon visual creation tasks.

🕹️ WorldCraft: Object-Level Interaction Beyond Camera Control

Camera navigation alone is not enough for a truly interactive world model. Real interaction is object-centric: click an object, sketch a path, and generate coherent future frames under moving viewpoints.

Our method decomposes this into three key pieces: camera-invariant trajectory representation, non-destructive control injection, and persistent state memory for long autoregressive rollouts.

Why it matters:

You can find WorldCraft at: https://nevsnev.github.io/WorldCraft/

WorldCraft extends interactive world models from camera-only control to object-level trajectory actions. Its core mechanisms include NWT for camera-invariant trajectory representation, SP-LoRA for non-destructive control injection, and TASP for persistent object state after off-camera movement.

🌦️ STCast: Adaptive Global-Regional Forecasting

High-resolution regional weather forecasting is hard to scale if we ignore Earth-wide dependencies. Neighbor-only boundary assumptions often miss long-range interactions, while direct high-resolution global modeling is computationally prohibitive.

Our method, STCast, addresses this with two coordinated components: SAA (Spatial-Aligned Attention) for adaptive global-regional boundary coupling, and TMoE (Temporal Mixture-of-Experts) for month-aware temporal specialization. Together, they form a unified framework evaluated on global forecasting, regional forecasting, extreme event prediction, and ensemble forecasting.

Why it matters:

You can find STCast at: https://github.com/chenhao-zju/STCast

Overview of STCast across four weather tasks. The figure contrasts prior regional strategies (neighbor cropping or direct regional training) with STCast’s Earth-aware coupling pipeline, where SAA dynamically aligns global and regional distributions, and TMoE allocates month-conditioned inputs to specialized experts. It also illustrates downstream extensions to typhoon-track-related prediction and probabilistic long-range ensemble rollout.
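
The idea of routing month-conditioned inputs to specialized experts can be sketched with a small soft gate. This is a toy under stated assumptions, not the TMoE layer in STCast: the per-month logit table, the scalar inputs, the softmax gate, and the names `softmax` and `month_gated_mixture` are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def month_gated_mixture(x, month, gate_logits, experts):
    """Blend expert outputs using gate weights looked up by calendar month.

    x           -- input value passed to every expert (a scalar in this sketch)
    month       -- calendar month, 1 through 12
    gate_logits -- 12 rows, one per month, each with one logit per expert
    experts     -- list of callables, one per expert
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    weights = softmax(gate_logits[month - 1])
    return sum(weight * expert(x) for weight, expert in zip(weights, experts))
```

In this toy version the gate depends only on the month, so inputs from the same month always see the same expert mix. A learned gate could also condition on the input itself.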

More PEI-Lab advances in climate analysis are highlighted at:

https://www.linkedin.com/feed/update/urn:li:activity:7462197494102372352/

https://www.linkedin.com/posts/a-breakthrough-in-climate-data-accessibility-share-7462179480783192065-7U5_/