SEQUENT ROBOTICS
Sheet 01 / 04 · Rev 2026.06
Humanoid loco-manipulation · research lab

A humanoid that knows when it can't.

Sequent runs long-horizon tasks as verifiable procedures, checking every step against the robot's own balance, reach, and contact, and re-planning the moment a step is physically impossible.

DWG · G1-PLATFORM · Scale 1:4
Fig 01 · Unitree G1 humanoid platform · 23 DOF · support polygon
Unitree G1 · Whole-body humanoid
MuJoCo → MJX · Scaling to GPU sim
Typed SOPs · Verifiable steps
Feasibility · Balance, reach
01 · The frontier

A plan can say "pick up the box." Only the body knows whether that means a squat, a step, or a fall.

02 · The stack

Three layers, one verified loop.

Planning and control stay distinct. Each layer speaks a typed, checkable contract to the one below.

L3 · Planner

Procedures & typed SOPs

Reasoning

Tasks are grounded into ordered, typed steps with explicit pre- and postconditions.

Typed step ↓
L2 · Skills

Verified against the body

Contribution

Every postcondition is checked against whole-body feasibility: balance, support polygon, and reach.

Joint targets ↓
L1 · Control

Frozen whole-body control

Embodiment

A frozen low-level policy keeps the humanoid balanced at high frequency.
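Here is a rough illustration of what a "typed, checkable contract" between layers could look like. The sketch models one step as a skill name, its arguments, and named pre- and postconditions that are evaluated against measured state. `TypedStep`, the state keys, and the 0.10 m precondition distance are hypothetical. Only the 0.05 m lift threshold echoes the verifier log further down.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, float]            # hypothetical: named measurements from the simulator
Predicate = Callable[[State], bool]


@dataclass
class TypedStep:
    """One procedure step, e.g. pick(obj=screwdriver). Hypothetical schema."""
    skill: str
    args: Dict[str, str]
    pre: Dict[str, Predicate] = field(default_factory=dict)
    post: Dict[str, Predicate] = field(default_factory=dict)

    @staticmethod
    def failing(checks: Dict[str, Predicate], state: State) -> List[str]:
        """Names of the checks that do not hold in `state`."""
        return [name for name, holds in checks.items() if not holds(state)]


pick = TypedStep(
    skill="pick",
    args={"obj": "screwdriver"},
    # 0.10 m is an illustrative value; 0.05 m mirrors the lift threshold in the log.
    pre={"hand_near_object": lambda s: s["hand_to_object_m"] < 0.10},
    post={"object_lifted": lambda s: s["object_lift_m"] >= 0.05},
)

state = {"hand_to_object_m": 0.04, "object_lift_m": 0.018}
print(TypedStep.failing(pick.pre, state))   # []
print(TypedStep.failing(pick.post, state))  # ['object_lifted']
```

Because every predicate has a name, a failure can be reported as a specific condition rather than a bare "failed".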

The check is the contribution.

Free-text planners ask a model whether a step worked. Sequent asks the body. A postcondition is a predicate over real balance, contact, and reach. When a step is infeasible, the system re-plans at the right layer.
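A common way to turn "balance" into a number is a static support-polygon margin. This is the signed distance from the ground projection of the center of mass (CoM) to the nearest edge of the convex support polygon. The sketch below assumes that definition, counter-clockwise polygon vertices, and a straight-line reach test with an arm length supplied by the caller. The page does not say exactly how Sequent computes its margin, so treat this as an illustration rather than their formula.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def balance_margin(com_xy: Point, polygon: Sequence[Point]) -> float:
    """Signed margin (m) of the projected CoM w.r.t. a convex, CCW support polygon.

    Inside the polygon this is the exact distance to the nearest edge. Outside it
    is negative: a proxy for how far the CoM has left, not an exact distance.
    """
    px, py = com_xy
    margin = math.inf
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        ex, ey = bx - ax, by - ay
        length = math.hypot(ex, ey)
        # The left-hand normal of a CCW edge points into the polygon.
        nx, ny = -ey / length, ex / length
        margin = min(margin, (px - ax) * nx + (py - ay) * ny)
    return margin


def within_reach(shoulder: Sequence[float], target: Sequence[float],
                 max_reach_m: float) -> bool:
    """Crude reach check: straight-line shoulder-to-target distance vs arm length."""
    return math.dist(shoulder, target) <= max_reach_m


# Toy double-support polygon (m), CCW, x forward: 0.2 m long, 0.3 m wide.
feet = [(-0.1, -0.15), (0.1, -0.15), (0.1, 0.15), (-0.1, 0.15)]
print(round(balance_margin((0.0, 0.0), feet), 3))   # 0.1 (CoM centred)
print(round(balance_margin((0.15, 0.0), feet), 3))  # -0.05 (CoM ahead of the toes)
```

Thresholding a margin like this yields a yes/no feasibility predicate that can sit among a step's postconditions.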

procedure → typed step → feasibility check → motion → re-plan
Pass · Re-plan
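The loop can be sketched as two parts. A planner proposes refinements of a goal in preference order, and an executor accepts only one the body can actually carry out. This echoes the squat/step example from the frontier section. The candidate table and the `feasible` oracle are hypothetical placeholders for the real planner and whole-body check.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical refinements of one goal, in the planner's order of preference.
CANDIDATES: Dict[str, List[str]] = {
    "pick_up_box": ["reach", "squat_then_reach", "step_then_reach"],
}


def choose_step(goal: str, feasible: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate the body can execute, or None to escalate.

    `feasible` stands in for the whole-body check (balance margin, reach);
    a candidate is never executed on the planner's word alone.
    """
    for step in CANDIDATES.get(goal, []):
        if feasible(step):
            return step
    return None  # nothing feasible: hand the problem back up instead of pushing on


# Toy oracle: the box sits on the floor, so a standing reach is infeasible.
toy_checks = {"reach": False, "squat_then_reach": True, "step_then_reach": True}
print(choose_step("pick_up_box", lambda s: toy_checks[s]))  # squat_then_reach
```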
03 · Findings — sim proof of concept

The check works.

Everything below is the real AMO-controlled G1 in MuJoCo: a calibrated balance-margin threshold, a failure predictor built on it, and the closed loop that prevents a fall.

DWG · margin predicts failure · n = 24 rollouts
Balance margin predicts controller failure
Fig 01 · 100% separation at threshold −0.066 m

Each dot is one rollout of the real controller across a payload × reach grid. A single balance-margin threshold separates every configuration where the robot fell from every one where it stood.
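Finding such a threshold is a one-dimensional separability check: the highest margin among falls must sit below the lowest margin among stands. The sketch below uses toy margins, not the 24 real rollouts, and places the threshold at the midpoint of the gap. Any value inside the gap separates the data equally well, so the midpoint is only one convention.

```python
from typing import Optional, Sequence


def separating_threshold(margins: Sequence[float], fell: Sequence[bool]) -> Optional[float]:
    """Midpoint t with (margin < t) == fell for every rollout, or None if impossible."""
    fell_m = [m for m, f in zip(margins, fell) if f]
    stood_m = [m for m, f in zip(margins, fell) if not f]
    if not fell_m or not stood_m:
        return None  # both outcomes are needed to place a boundary
    highest_fall, lowest_stand = max(fell_m), min(stood_m)
    if highest_fall >= lowest_stand:
        return None  # classes overlap: no single threshold separates them
    return (highest_fall + lowest_stand) / 2


def accuracy(margins: Sequence[float], fell: Sequence[bool], t: float) -> float:
    """Fraction of rollouts where 'margin below t' matches 'robot fell'."""
    return sum((m < t) == f for m, f in zip(margins, fell)) / len(margins)


# Toy values for illustration only (not the 24 real rollouts).
margins = [-0.12, -0.09, -0.07, -0.05, 0.02, 0.10]
fell = [True, True, True, False, False, False]
t = separating_threshold(margins, fell)
print(round(t, 3), accuracy(margins, fell, t))  # -0.06 1.0
```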

DWG · closed loop · live controller
Fig 02 · no check → falls · with check → stands
DWG · typed SOP planner · L3
Fig 03 · naive step fails postcondition → re-planned → succeeds

Simulation (MuJoCo) · static feasibility check · proof-of-concept grid · not yet hardware

04 · How we work

Four commitments.

A / 04

Procedures, not prompts

Typed SOPs with explicit pre- and postconditions, not free text.

B / 04

Verified against physics

Postconditions checked against balance, support polygon, and reach.

C / 04

Frozen foundations

Low-level whole-body control as a frozen, reusable base.

D / 04

Closed-loop by feasibility

When a step is impossible, re-plan instead of pushing on.

05 · The system

At a glance.

100% · Failure prediction (n = 24 rollouts)
−0.066 m · Calibrated margin threshold
3 · Verified layers
G1 · Unitree humanoid
06 · Skill learning — live training log

Learning to reach and grasp.

One RL policy learning to pick a screwdriver off the workbench, captured from the moment training started through each milestone since.

Live · MuJoCo · now training on a 32-core Azure rig at 1,500 steps/s (28× faster) · updating as it learns
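For scale, here is the arithmetic implied by the quoted throughput, applied to a 10M-step run such as the one in Step 04. It assumes the 1,500 steps/s rate holds for the entire run and that "28× faster" compares against the previous setup's rate. The page does not state how long any run actually took.

```latex
\[
T_{10\text{M}} = \frac{10^{7}\ \text{steps}}{1500\ \text{steps/s}} \approx 6.7 \times 10^{3}\ \text{s} \approx 1.9\ \text{h},
\qquad
r_{\text{prev}} \approx \frac{1500}{28} \approx 54\ \text{steps/s} \;\Rightarrow\; 28\,T_{10\text{M}} \approx 52\ \text{h}.
\]
```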
Step 01 · Just started · ~25k steps
DWG · untrained policy · exploration
Closest reach 0.199 m · no grasp

The arm searches near the bench but never closes on the tool. Episode return ≈ 60.

Step 02 · First grasps · exploration
DWG · learned reach + grasp · exploration
Contact at 5.9 cm · grasp → lift

Exploration finds the skill: real contact — no grabbing across a gap — and the screwdriver leaves the bench. Episode return ≈ 8,800.
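The "no grabbing across a gap" rule can be written as a gate on the grasp latch. The latch closes only when the gripper is commanded shut and the simulator reports an actual touching contact between a hand geom and a tool geom. The `Contact` record below is a hypothetical stand-in, loosely modeled on the per-contact data a physics engine exposes (two geom ids and a signed distance). The geom ids and the 0.5 grip threshold are made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Contact:
    """Hypothetical stand-in for one contact record from the simulator."""
    geom1: int
    geom2: int
    dist: float  # signed distance in m; <= 0 means touching or penetrating


def should_latch(grip_cmd: float, contacts: Iterable[Contact],
                 hand_geoms: Set[int], tool_geoms: Set[int],
                 grip_threshold: float = 0.5) -> bool:
    """Latch only if the gripper is commanded closed AND a hand geom touches the tool."""
    if grip_cmd < grip_threshold:
        return False
    for c in contacts:
        hand_tool = ((c.geom1 in hand_geoms and c.geom2 in tool_geoms)
                     or (c.geom2 in hand_geoms and c.geom1 in tool_geoms))
        if hand_tool and c.dist <= 0.0:
            return True
    return False


HAND, TOOL = {10, 11}, {20}  # made-up geom ids
print(should_latch(1.0, [Contact(11, 20, -0.0005)], HAND, TOOL))  # True: real contact
print(should_latch(1.0, [Contact(11, 20, 0.004)], HAND, TOOL))    # False: 4 mm gap
print(should_latch(1.0, [Contact(10, 30, -0.001)], HAND, TOOL))   # False: hand on bench
print(should_latch(0.2, [Contact(11, 20, -0.0005)], HAND, TOOL))  # False: not closing
```

The distance sign matters because engines such as MuJoCo can report contacts that lie within a collision margin but are not yet touching.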

Step 03 · Deterministic skill · no luck involved
DWG · deterministic pickup · mean policy
Grasp in ~0.5 s · 40% grasp · 30% lift

The mean policy (zero exploration noise) moves its hand straight to the tool, latches on real contact, and lifts: 40% grasp and 30% full lift over 20 random spawns, with 0 falls.
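Deterministic evaluation usually means acting with the mean of the policy's action distribution instead of sampling from it. Libraries such as Stable-Baselines3 expose this as `predict(..., deterministic=True)`. The toy diagonal-Gaussian head below shows the difference. Its linear mean function and log-std values are placeholders, not the trained policy.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


class GaussianPolicy:
    """Toy diagonal-Gaussian policy; a real one computes the mean with a network."""

    def __init__(self, mean_fn: Callable[[Sequence[float]], List[float]],
                 log_std: Sequence[float]):
        self.mean_fn = mean_fn
        self.log_std = list(log_std)

    def act(self, obs: Sequence[float], deterministic: bool = False,
            rng: Optional[random.Random] = None) -> List[float]:
        mu = self.mean_fn(obs)
        if deterministic:
            return list(mu)  # evaluation: the mean action, no exploration noise
        rng = rng or random.Random()
        # Training-time exploration: sample around the mean with per-dimension std.
        return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, self.log_std)]


# Placeholder mean function and log-std (not the trained policy).
policy = GaussianPolicy(lambda obs: [0.5 * x for x in obs], log_std=[-1.0, -1.0])
obs = [1.0, 2.0]
print(policy.act(obs, deterministic=True))    # [0.5, 1.0], identical on every call
print(policy.act(obs, rng=random.Random(0)))  # a noisy sample around the mean
```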

Step 04 · 10M steps on the cloud — the policy learns to cheat
DWG · reward-hacked policy · two rollouts, rear + front view · 10M steps · 25× the data
Grasp 45% (a record) · lift 0% (it refuses)

We trained on the same reward with 25× the data on cloud compute. The policy got better at the score and worse at the job: it grasps the tool (45%, our best yet), then circles its hand above the bench and never lifts. Our penalties made lifting unprofitable, so it farms the grasp and skips the work ("successful" episodes score −4,000). More compute didn't fix the task; it found the flaw faster. This is the failure mode our whole system exists to catch: the score said better, the physics said worse, and only one of them is telling the truth.

Step 05 · The verifier catches it — and the executor refuses to lie
DWG · verifying executor · physics-checked pre/postconditions
$ run_task --command "pick up the screwdriver"   # real skill (v5.5)
PLAN — pick up the screwdriver
  [0] pick(obj=screwdriver)
TASK COMPLETE
  [OK] pick — verified | lift held 0.5s + stable

$ run_task --command "pick up the screwdriver"   # reward-hacked policy
PLAN — pick up the screwdriver
  [0] pick(obj=screwdriver)
TASK HALTED
  [XX] pick — postcondition_failed (x3)
       object_lifted: measured=0.018m vs 0.050m (sustained 0/25)
same task, same verifier · real skill → complete · cheat → halted

We stopped patching the reward and built the layer that was always the point. Every skill now carries machine-checked pre- and postconditions, and a verifying executor runs the plan step by step, counting a step only if the physics agrees it happened. The real policy completes the task. The reward-hacked one is caught grabbing without lifting, retried, and halted with the exact reason. No silent success. This is what it takes to trust a humanoid on a factory floor.
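A verifying executor of this kind can be sketched in a few lines. It runs the skill, judges the postcondition only from measured state, retries a bounded number of times, and halts with the failing condition. The sketch mirrors the log above (0.050 m lift, 25 samples, three attempts), but it is not Sequent's code. Two details are assumptions. First, "sustained 0/25" is read as 25 consecutive samples, which would be 50 Hz over the 0.5 s hold. Second, `measured` reports the peak lift.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class StepResult:
    ok: bool
    attempts: int
    reason: str = ""


def longest_run(samples: Sequence[float], threshold: float) -> int:
    """Longest run of consecutive samples at or above `threshold`."""
    best = run = 0
    for x in samples:
        run = run + 1 if x >= threshold else 0
        best = max(best, run)
    return best


def check_object_lifted(lift_m: Sequence[float], min_lift_m: float = 0.05,
                        required: int = 25) -> str:
    """Return "" if the lift was held long enough, else a human-readable reason."""
    held = longest_run(lift_m, min_lift_m)
    if held >= required:
        return ""
    return (f"object_lifted: measured={max(lift_m, default=0.0):.3f}m "
            f"vs {min_lift_m:.3f}m (sustained {held}/{required})")


def run_step(execute: Callable[[], Sequence[float]], max_attempts: int = 3) -> StepResult:
    """Execute a skill, verify it from measured state, retry, then halt with the reason."""
    reason = ""
    for attempt in range(1, max_attempts + 1):
        lift_trace = execute()  # per-sample object lift (m) measured from the simulator
        reason = check_object_lifted(lift_trace)
        if not reason:
            return StepResult(ok=True, attempts=attempt)
    return StepResult(ok=False, attempts=max_attempts, reason=reason)


# Toy traces: one skill actually lifts; the other grasps and hovers at 1.8 cm.
real = run_step(lambda: [0.0] * 10 + [0.06] * 30)
hacked = run_step(lambda: [0.0] * 20 + [0.018] * 30)
print(real)           # StepResult(ok=True, attempts=1, reason='')
print(hacked.reason)  # object_lifted: measured=0.018m vs 0.050m (sustained 0/25)
```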

Deterministic eval · 20 episodes
Policy                            Policy's own claim   Verifier confirms   Gap (pts)
v5.5 · grasp + lift & hold        60% grasped          55% verified        −5
v5.6 · 10M steps, reward-hacked   30% grasped          0% verified         −30
Verified = latch + 5 cm lift held 0.5 s + upright · measured against mjData · 0 falls
Real rollouts, MuJoCo · deterministic evals, zero exploration noise · the verifier measures every claim against the simulator state, passing the real skill (55%) and rejecting the cheat (0%) · next: wire the SOP-retrieval brain into the executor and add the walk + button skills.
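The claim-versus-verified comparison reduces to two rates over the same episodes and their difference in percentage points. In the sketch below, the `Episode` fields are a hypothetical encoding of the verifier's definition (latch, 5 cm lift held 0.5 s, upright). The toy episodes are built only so the counts reproduce the v5.5 row (12 of 20 claimed, 11 of 20 verified). They are not taken from the actual eval.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Episode:
    claimed_grasp: bool  # the policy's own success signal
    latched: bool        # grasp latch engaged on real contact
    lift_held: bool      # >= 5 cm lift held for 0.5 s, measured from sim state
    upright: bool        # robot did not fall


def verified(e: Episode) -> bool:
    """Hypothetical encoding of: latch + 5 cm lift held 0.5 s + upright."""
    return e.latched and e.lift_held and e.upright


def claim_vs_verified(episodes: Sequence[Episode]) -> Tuple[float, float, float]:
    """(claimed %, verified %, gap in percentage points) over the same episodes."""
    n = len(episodes)
    claimed = 100 * sum(e.claimed_grasp for e in episodes) / n
    confirmed = 100 * sum(verified(e) for e in episodes) / n
    return claimed, confirmed, confirmed - claimed


# Toy episodes built only to reproduce the v5.5 counts (12/20 claimed, 11/20 verified).
eps = ([Episode(True, True, True, True)] * 11
       + [Episode(True, True, False, True)] * 1
       + [Episode(False, False, False, True)] * 8)
print(claim_vs_verified(eps))  # (60.0, 55.0, -5.0)
```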

Fig 03 · Sequent Robotics · G1 in research facility · Scale 1:16
Working proof of concept · sim · 2026

Building robots that check before they move.

For research collaboration, or to follow the work as it develops.

jsikka@utexas.edu

github.com/jatinsikka/g1-loco-manipulation