HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

HEX is a whole-body vision-language-action framework for full-sized humanoid robots. It combines a Qwen-VL backbone, a Unified Proprioceptive Predictor (UPP), and a flow-matching action head to predict continuous future actions. The key idea of HEX is to align heterogeneous humanoid states into shared body-part slots and learn predictive body dynamics from cross-embodiment humanoid data. This enables the policy to transfer across different humanoid platforms and perform long-horizon whole-body manipulation. During deployment, HEX directly predicts arm, hand, and waist actions, while providing high-level commands to a low-level RL-based whole-body controller for generating leg actions. This design enables coordinated and stable humanoid manipulation.
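To make the idea of shared body-part slots more concrete, here is a minimal sketch of how robots with different joint counts could be packed into one fixed layout with a validity mask. The slot names, slot widths, and the `align_to_slots` helper are illustrative assumptions and are not taken from the HEX codebase; the actual alignment used by HEX may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical shared slots and their padded widths (not taken from HEX).
SLOT_WIDTHS: Dict[str, int] = {
    "left_arm": 7,
    "right_arm": 7,
    "left_hand": 6,
    "right_hand": 6,
    "waist": 3,
    "legs": 12,
}


def align_to_slots(state: Dict[str, List[float]]) -> Tuple[List[float], List[int]]:
    """Pack per-body-part joint values into a fixed-length vector plus a validity mask."""
    vector: List[float] = []
    mask: List[int] = []
    for slot, width in SLOT_WIDTHS.items():
        values = list(state.get(slot, []))
        if len(values) > width:
            raise ValueError(f"{slot} has {len(values)} values, slot width is {width}")
        pad = width - len(values)
        # Real values first, then zero padding; the mask marks which entries are real.
        vector.extend(values + [0.0] * pad)
        mask.extend([1] * len(values) + [0] * pad)
    return vector, mask


# Two made-up robots with different joint counts map to the same layout.
robot_a = {"left_arm": [0.1] * 7, "right_arm": [0.2] * 7, "waist": [0.0] * 3}
robot_b = {"left_arm": [0.3] * 5, "right_arm": [0.4] * 5, "left_hand": [0.5] * 6}

vec_a, mask_a = align_to_slots(robot_a)
vec_b, mask_b = align_to_slots(robot_b)
print(len(vec_a), len(vec_b))  # both equal sum(SLOT_WIDTHS.values())
```

Because every embodiment produces a vector of the same length, a single model can consume all of them, and the mask lets it ignore slots that a given robot does not have.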

News

✅ 2026/05/17: The pretraining and fine-tuning code for the VLA has been released.

Installation

First, git clone this repo and cd into it.

# clone project
git clone https://github.com/Cognition2ActionLab/HEX.git
cd HEX

Then create the Python/PyTorch environment.

# Create conda environment
conda create -n hex python=3.10 -y
conda activate hex

# Install system dependencies
sudo apt update
sudo apt install libegl1-mesa-dev libglu1-mesa

# Install requirements
pip install -r requirements.txt

# Install FlashAttention2
pip install flash-attn --no-build-isolation

# Install HEX
pip install -e .

If flash-attn fails to install correctly, you can run

python hex/utils/test_flash_attn.py

to check the versions of PyTorch, CUDA, and the libstdc++ ABI. Then manually download a compatible wheel from the flash-attn releases page. We use version 2.7.3; for newer GPUs (e.g., NVIDIA RTX 5090), install the latest available release (e.g., version 2.8.3) to ensure compatibility. Example:

wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
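Picking the right wheel means matching four things encoded in the filename: the flash-attn version, the CUDA major version, the PyTorch major.minor version, and the C++11 ABI flag. The sketch below assembles a filename from those values. The naming pattern is inferred from the example filename above, and `flash_attn_wheel_name` is a hypothetical helper, so always confirm that the resulting file actually exists on the releases page.

```python
def flash_attn_wheel_name(
    flash_version: str,
    cuda_major: int,
    torch_version: str,
    cxx11_abi: bool,
    python_tag: str = "cp310",
    platform: str = "linux_x86_64",
) -> str:
    """Build a wheel filename following the pattern of the example above."""
    # Keep only major.minor of the PyTorch version, e.g. "2.6.0+cu124" -> "2.6".
    torch_mm = ".".join(torch_version.split("+")[0].split(".")[:2])
    abi = "TRUE" if cxx11_abi else "FALSE"
    return (
        f"flash_attn-{flash_version}+cu{cuda_major}torch{torch_mm}"
        f"cxx11abi{abi}-{python_tag}-{python_tag}-{platform}.whl"
    )


name = flash_attn_wheel_name("2.7.3", 12, "2.6.0+cu124", cxx11_abi=False)
print(name)
```

With PyTorch installed, the inputs can be read from `torch.__version__`, `torch.version.cuda`, and `torch.compiled_with_cxx11_abi()`. Not every combination has a published wheel.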

Quick Start

We release the pretrained HEX checkpoint on Hugging Face.

Description	Params	Link
HEX	2.4B	🤗 HEX-model

Download HEX Checkpoints

To download the HEX checkpoint, first modify the target download path in hex/utils/download_model_hex.py, and then run:

python hex/utils/download_model_hex.py

Download the Base VLM

Before running inference, please also download the Qwen3-VL base model:

python hex/utils/download_model_qwen.py

After downloading Qwen3-VL, update the framework.qwenvl.base_vlm field in the config.yaml file of the downloaded HEX checkpoint to your local Qwen3-VL path.
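If you prefer to script this edit rather than change the file by hand, the following sketch sets a nested key given its dotted path. It assumes that config.yaml is plain YAML readable by PyYAML and that PyYAML is installed in your environment; `set_dotted` and `update_base_vlm` are hypothetical helpers, not part of HEX. Note that rewriting the file with `yaml.safe_dump` drops comments and may change key formatting.

```python
from typing import Any, Dict


def set_dotted(cfg: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set cfg["a"]["b"]["c"] = value for dotted_key "a.b.c", creating levels as needed."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def update_base_vlm(config_path: str, qwen_path: str) -> None:
    """Rewrite framework.qwenvl.base_vlm in a checkpoint's config.yaml."""
    import yaml  # PyYAML; imported here so set_dotted stays usable without it

    with open(config_path, "r", encoding="utf-8") as f:
        cfg = yaml.safe_load(f) or {}
    set_dotted(cfg, "framework.qwenvl.base_vlm", qwen_path)
    with open(config_path, "w", encoding="utf-8") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)


# Demonstration on an in-memory dict with placeholder values.
example = {"framework": {"qwenvl": {"base_vlm": "placeholder"}}}
set_dotted(example, "framework.qwenvl.base_vlm", "/path/to/Qwen3-VL")
print(example["framework"]["qwenvl"]["base_vlm"])
```

To apply it to a real checkpoint, call `update_base_vlm` with the path to the downloaded config.yaml and your local Qwen3-VL directory.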

Run Inference

Once both the HEX checkpoint and the Qwen3-VL model are prepared, follow notebooks/eval_model.ipynb to run model inference.

Data

Data Source

We open-source the eight real-world evaluation task datasets collected for HEX, which can be used directly for fine-tuning. The full training data used in this project consists of the following sources:

Embodiment / Platform	Source	Dataset
Tienkung Series	HEX	🤗 HF Link
Unitree G1	Humanoid Everyday	🤗 HF Link
AgiBot-to-Unitree G1	AgiBot World Colosseo & TrajBooster	🤗 HF Link
Unitree H1	Humanoid Everyday	🤗 HF Link
Leju Kuavo	RoboCOIN	🤗 HF Link

To download all datasets, run:

bash scripts/download_datasets.sh

HEX follows the LeRobot v2.1 data format, so each dataset should contain a corresponding modality.json.
For each Leju Kuavo dataset, please copy examples/real_world/modality_leju/modality.json to <leju_dataset>/meta/modality.json.
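When there are many Leju Kuavo datasets, the copy step can be scripted. The sketch below copies the template into every dataset directory whose name starts with a given prefix. The `leju_` prefix is an assumption based on the example directory names in this README, and `install_modality` is a hypothetical helper; adjust both to your local layout.

```python
import shutil
from pathlib import Path
from typing import List


def install_modality(template: Path, data_root: Path, prefix: str = "leju_") -> List[Path]:
    """Copy the template into <dataset>/meta/modality.json for each matching dataset."""
    written: List[Path] = []
    for dataset in sorted(data_root.glob(f"{prefix}*")):
        if not dataset.is_dir():
            continue
        target = dataset / "meta" / "modality.json"
        # Create meta/ if it is missing, then overwrite any existing modality.json.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(template, target)
        written.append(target)
    return written


# Example call (the data root is a placeholder):
# install_modality(
#     Path("examples/real_world/modality_leju/modality.json"),
#     Path("/path/to/eai_real_world"),
# )
```

The function returns the list of files it wrote, which makes it easy to confirm that every intended dataset was covered.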

The overall data structure is as follows:

eai_real_world/
├── dvt217_carry_boxes_and_avoid_obstacles_260113_lerobot
├── ...
├── evt12_carry_box_and_tidy_table_260318_lerobot
├── ...
├── g1_add_the_seasoning_to_the_pot
├── ...
├── g1_humanoid_everyday
├── h1_humanoid_everyday
├── leju_robot_box_storage_parcel
└── ...

Data Collection

Due to commercial restrictions, we are unable to release the data collection pipeline used for the Tienkung series robots.

For users interested in collecting data on Unitree G1, we recommend referring to the following open-source data collection pipelines:

OpenTrajBooster: uses a VR headset and handheld joysticks for full-body teleoperation.
Psi0: uses a PICO VR headset with controllers, along with a waist tracker and foot trackers, for full-body teleoperation.

Pretraining

You can download our pretrained HEX model and skip this step if you only want to run inference or evaluation.

Before pretraining, please download the Qwen3-VL backbone:

bash scripts/download_models.sh

Then, update the dataset paths in the following files to match your local directory structure:

hex/dataloader/gr00t_lerobot/mixtures.py, Line 9
hex/dataloader/gr00t_lerobot/data_config.py, Line 1299

Next, modify the following fields in scripts/pretrain_hex.sh:

base_vlm: path to your downloaded Qwen3-VL model
data_root_dir: path to your local dataset directory
dataset_name: the dataset mixture name, which should be consistent with the settings in hex/dataloader/gr00t_lerobot/mixtures.py
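Misconfigured paths are a common cause of failed training launches, so a quick pre-flight check can save time. The sketch below verifies that `base_vlm` and `data_root_dir` exist and that each dataset folder contains meta/modality.json. It assumes that datasets sit directly under `data_root_dir`; `preflight` is a hypothetical helper, and it does not check `dataset_name` against mixtures.py.

```python
from pathlib import Path
from typing import List


def preflight(base_vlm: str, data_root_dir: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems: List[str] = []
    if not Path(base_vlm).is_dir():
        problems.append(f"base_vlm not found: {base_vlm}")
    root = Path(data_root_dir)
    if not root.is_dir():
        problems.append(f"data_root_dir not found: {data_root_dir}")
        return problems
    # Assumption: every immediate subdirectory is one dataset.
    for dataset in sorted(p for p in root.iterdir() if p.is_dir()):
        if not (dataset / "meta" / "modality.json").is_file():
            problems.append(f"missing meta/modality.json: {dataset.name}")
    return problems


# Placeholder paths; replace them with the values set in scripts/pretrain_hex.sh.
for issue in preflight("/path/to/Qwen3-VL", "/path/to/eai_real_world"):
    print(issue)
```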

Finally, start pretraining with:

bash scripts/pretrain_hex.sh

Fine-tuning

After obtaining the pretrained HEX model, you can further fine-tune HEX on downstream datasets.

Before fine-tuning, please modify the following fields in scripts/fine_tune_hex.sh:

base_vlm: path to your Qwen3-VL backbone
data_root_dir: path to your local dataset directory
dataset_name: name of the downstream dataset mixture, which should be consistent with the settings in hex/dataloader/gr00t_lerobot/mixtures.py
pretrained_models_path: path to the pretrained HEX checkpoint

Then, start fine-tuning with:

bash scripts/fine_tune_hex.sh

Deployment

Due to commercial restrictions, the low-level RL-based whole-body controller used for the Tienkung series robots is not open-sourced. However, we provide a sample deployment interface in examples/real_world.

If you want to deploy your own model on Unitree G1, you may refer to the following open-source projects:

OpenTrajBooster: uses HOMIE as the low-level RL-based whole-body controller.
Psi0: uses AMO as the low-level RL-based whole-body controller.

When training your own low-level controller, please make sure that the command space output by the high-level VLA policy matches the input space expected by the low-level controller. The dataset construction process should also follow the same interface for consistent training and deployment.
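One way to guard against such a mismatch is to describe both sides of the interface explicitly and compare them before deployment. The sketch below is only an illustration: `CommandSpec`, `check_compatible`, and the command names (`vx`, `vy`, `yaw_rate`, `base_height`) are hypothetical and do not come from HEX or from any specific low-level controller.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CommandSpec:
    """Ordered names of the scalar commands exchanged between the two policies."""

    names: Sequence[str]

    @property
    def dim(self) -> int:
        return len(self.names)


def check_compatible(policy_out: CommandSpec, controller_in: CommandSpec) -> List[str]:
    """List mismatches between what the VLA emits and what the controller expects."""
    errors: List[str] = []
    if policy_out.dim != controller_in.dim:
        errors.append(f"dimension mismatch: {policy_out.dim} vs {controller_in.dim}")
    # Compare names position by position over the shared prefix.
    for i, (a, b) in enumerate(zip(policy_out.names, controller_in.names)):
        if a != b:
            errors.append(f"index {i}: policy emits '{a}', controller expects '{b}'")
    return errors


# Hypothetical command layouts; real controllers define their own.
vla_spec = CommandSpec(("vx", "vy", "yaw_rate", "base_height"))
ctrl_spec = CommandSpec(("vx", "vy", "yaw_rate", "base_height"))
assert check_compatible(vla_spec, ctrl_spec) == []
```

Using the same specification when building the dataset keeps training labels and deployment commands in the same order and units.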

Simulation

Thanks to the cross-embodiment capability of VLA models, HEX can also be evaluated in simulation environments such as LIBERO.

First, download the LIBERO datasets:

python hex/utils/download_dataset_libero.py --base_dir /your/dataset/path

Then, replace the modality.json file for each LIBERO suite with the provided template in examples/LIBERO/modality.json.

Next, modify the following fields in scripts/libero/train_hex_libero.sh:

base_vlm: path to your Qwen3-VL backbone
dataset_name: name of the LIBERO dataset mixture
data_root_dir: path to your local LIBERO dataset directory

Then start training with:

bash scripts/libero/train_hex_libero.sh

For evaluation, modify the following fields in scripts/libero/eval_libero.sh:

ckpt_root: root directory of the trained checkpoint
ckpt_path: relative path to the checkpoint file

Then run:

bash scripts/libero/eval_libero.sh

Citation

@article{bai2026hex,
  title={HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation},
  author={Bai, Shuanghao and Li, Meng and Lv, Xinyuan and Wang, Jiawei and Wang, Xinhua and Liao, Fei and Hou, Chengkai and Gu, Langzhe and Zhou, Wanqi and Wu, Kun and others},
  journal={arXiv preprint arXiv:2604.07993},
  year={2026}
}

Acknowledgements

This project draws inspiration from and builds upon several notable open-source projects, including: StarVLA, Isaac-GR00T, HiMoE-VLA, LeRobot, Humanoid Everyday, RoboCOIN, AgiBot-World, and OpenTrajBooster.
