Hongliang Zeng (Xavier)

AI & Embodied Intelligence Researcher


About

I received my Ph.D. from the School of Computer Science and Engineering at South China University of Technology (SCUT), advised by Prof. Ping Zhang. My research focuses on embodied intelligence and computer vision, spanning robotic manipulation, active perception, reinforcement learning, and 3D point cloud understanding. I have published 7 first-author papers at venues including IJCAI, AAAI, TNNLS, ICASSP, and ICME, with 4 Oral presentations. I currently work as an Embodied Intelligence Algorithm Engineer at Astribot, focusing on VLA, reinforcement learning, and world models for robotic manipulation.

Education

Sep 2020 — Jun 2025
Ph.D., School of Computer Science and Engineering, South China University of Technology (SCUT)
  • Research on embodied intelligence, including robotic manipulation, active perception, and reinforcement learning, published at IJCAI, AAAI, TNNLS, etc.
  • Research on 3D point cloud self-supervised learning and generation, published at ICASSP, ICME, etc.
  • Granted patent: A Method and System for Robotic Manipulation of Articulated Objects.
Robotic Manipulation · Computer Vision
Sep 2014 — Jun 2018

Undergraduate studies in mechanical engineering.

CAD · SolidWorks

Work Experience

Nov 2024 — Present

Embodied Intelligence Algorithm Engineer

Astribot, Shenzhen

Algorithm

  • Unified the value model and VLA into a single training pipeline and implemented RL with real-robot recap rollouts, enabling the Astribot S1 to accomplish deformable-object manipulation (e.g., cloth folding) for the first time; delivered an internal demo.
  • Co-first author and core developer of DuoCore-FS, an asynchronous fast-slow dual-system VLA for whole-body manipulation. Contributed to architecture design, real-robot deployment, and paper writing. Achieved 30 Hz action generation (3× faster than comparable VLAs).
  • Implemented Real-Time Chunking (RTC) during training, achieving smoother action-chunk transitions, significantly reducing post-hoc trajectory smoothing on the control side, and improving action tracking precision.
  • Reproduced and integrated multiple mainstream VLA models (π0, π0.5, VLA-Adapter, RDT-2, WALL-X, etc.). Built two in-house VLAs by adding flow-matching action heads to Qwen3VL and Rynnec, matching π0-level performance at comparable data scale with flexible input resolution.
  • Explored training paradigms on 4,000+ hours of heterogeneous robot data: pretraining followed by fine-tuning (retaining 20% pretrain data) achieved 60% higher success rate and 30% faster convergence vs. training from scratch. Adopted dual-arm relative coordinate frame for ±10 cm height generalization. Fine-tuned π0.5 on the full dataset as the team's shared training checkpoint.
  • Proposed VLA + object detection co-training, using detection as an auxiliary task to improve generalization on unseen objects (success rate +40%); validated in an internal 711 retail scenario.
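The flow-matching action heads mentioned above follow the standard recipe: sample a noise point x0, interpolate toward the ground-truth action chunk x1 along a straight path, and train the head to regress the constant velocity x1 − x0; at inference, the learned velocity field is integrated from noise to an action. A minimal sketch assuming a linear (rectified-flow) interpolation path — the function names are illustrative, not the actual in-house implementation:

```python
def flow_matching_target(x0, x1, t):
    """Linear interpolation path x_t = (1 - t) * x0 + t * x1.

    Returns the interpolated point and the velocity target v = x1 - x0
    that the action head regresses (rectified-flow style).
    """
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


def euler_sample(v_field, x0, steps=10):
    """Integrate a velocity field from noise x0 toward an action chunk."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = v_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Sanity check: with the true (constant) velocity field, Euler
# integration recovers x1 up to floating-point error.
x0 = [0.0, 0.0]
x1 = [1.0, -2.0]
x = euler_sample(lambda x, t: [b - a for a, b in zip(x0, x1)], x0)
# x ≈ [1.0, -2.0]
```

The straight-line path is what makes few-step integration viable at control rates; a diffusion-style curved path would need more integration steps for the same accuracy.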

AI Infra

  • Independently built the team's unified robot-model distributed training framework on DeepSpeed ZeRO with multi-node multi-GPU support, integrating bf16 mixed precision, gradient checkpointing, and FlashAttention 2. Supports LeRobot data format, LoRA fine-tuning, and WebSocket inference serving, significantly accelerating model iteration and enabling rapid delivery of multiple POC projects (Jinma, etc.).
  • Defined a unified dataset format standard for the company's robot data. Optimized the LeRobot data pipeline, including normalization computation (30× speedup) and data loading/conversion, significantly improving training-data preparation efficiency.
  • Curated, cleaned, and aligned 4,000+ hours of heterogeneous robot data across action coordinate systems for large-scale pretraining.
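Per-dimension normalization statistics like those computed in the pipeline above can be accumulated in a single streaming pass over episodic data, avoiding materializing the full dataset in memory. A hypothetical sketch using Welford's online algorithm (the class and its interface are illustrative, not the actual LeRobot code, whose speedup came from its own pipeline optimizations):

```python
class RunningStats:
    """Streaming per-dimension mean/std via Welford's online algorithm."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations

    def update(self, sample):
        """Fold one state/action vector into the running statistics."""
        self.n += 1
        for i, x in enumerate(sample):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def std(self):
        """Population standard deviation per dimension."""
        if self.n < 2:
            return [0.0] * len(self.mean)
        return [(m / self.n) ** 0.5 for m in self.m2]


stats = RunningStats(dim=2)
for sample in [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]:
    stats.update(sample)
# stats.mean == [2.0, 20.0]
```

Welford's update is numerically stable even when the mean is large relative to the variance, which matters for joint-angle and end-effector streams with offset coordinate frames.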
VLA · RL · Flow Matching · Co-training · RTC
Jul 2018 — Jun 2019

Mechanical design and engineering.

SolidWorks · AutoCAD · Mechanical Design

Projects

MARS

Multimodal Active Robotic Sensing for Articulated Characterization. A framework that enables robots to actively perceive and characterize articulated objects through multimodal sensing.

Python · PyTorch · IJCAI 2024

Point-UMAE

Unet-like Masked Autoencoders for Point Cloud Self-supervised Learning. A multi-scale framework with top-down masking strategy for 3D shape classification, part segmentation, and object detection.

Python · PyTorch · ICASSP 2025

Publications