Hongliang Zeng (Xavier)

AI & Embodied Intelligence Researcher


About

I hold a Ph.D. from South China University of Technology (advised by Prof. Ping Zhang), with research spanning embodied intelligence, robotic manipulation, active perception, reinforcement learning, and 3D point cloud understanding. I have published 7 first-author papers at IJCAI, AAAI, TNNLS, ICASSP, and ICME (4 Oral). I now work at Astribot as an Embodied Intelligence Algorithm Engineer, building VLA training pipelines and applying RL and world models to dual-arm manipulation.

Education

Sep 2020 — Jun 2025
Ph.D., South China University of Technology
  • Research on embodied intelligence, including robotic manipulation, active perception, and reinforcement learning, published at IJCAI, AAAI, TNNLS, etc.
  • Research on 3D point cloud self-supervised learning and generation, published at ICASSP, ICME, etc.
  • Granted patent: A Method and System for Robotic Manipulation of Articulated Objects.
Robotic Manipulation · Computer Vision
Sep 2014 — Jun 2018

Undergraduate studies in mechanical engineering.

CAD · SolidWorks

Work Experience

Nov 2024 — Present

Embodied Intelligence Algorithm Engineer

Astribot, Shenzhen

Algorithm

  • Unified the value model and VLA into a single training pipeline and implemented RL with real-robot Recap rollouts, enabling the Astribot S1 to accomplish deformable-object manipulation (e.g., cloth folding) for the first time; delivered an internal demo.
  • Co-first author and core developer of DuoCore-FS, an asynchronous fast-slow dual-system VLA for whole-body manipulation. Contributed to architecture design, real-robot deployment, and paper writing. Achieved 30 Hz action generation (3× faster than comparable VLAs).
  • Implemented Real-Time Chunking (RTC) during training, achieving smoother action-chunk transitions, significantly reducing post-hoc trajectory smoothing on the control side, and improving action tracking precision.
  • Reproduced and integrated multiple mainstream VLA models (π0, π0.5, π0.6*, RDT-2, WALL-X, MEM, VLA-Adapter, etc.). Built two in-house VLAs by adding flow-matching action heads to Qwen3VL and Rynnec, matching π0-level performance at comparable data scale with flexible input resolution.
  • Explored pretraining-then-finetuning paradigms on 4,000+ hours of heterogeneous robot data, improving success rate by 60% and convergence speed by 30%. Adopted dual-arm relative coordinates to generalize across ±10 cm height variation. Fine-tuned π0.5 on the full dataset as the team's shared checkpoint.
  • Proposed VLA + object detection co-training, using detection as an auxiliary task to improve generalization on unseen objects (success rate +40%); validated in an internal 711 retail scenario.
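The flow-matching action heads mentioned above follow a rectified-flow recipe: regress the straight-path velocity between noise and a ground-truth action chunk, conditioned on vision-language features. A minimal sketch of that idea (all module names, shapes, and dimensions here are hypothetical, not the in-house models):

```python
import torch
import torch.nn as nn

class FlowMatchingActionHead(nn.Module):
    """Toy action head: predicts a velocity field that transports noise
    toward an action chunk, conditioned on pooled VLM features.
    Illustrative only; real heads are larger and time-embedding based."""

    def __init__(self, feat_dim=512, action_dim=14, chunk_len=8, hidden=256):
        super().__init__()
        self.chunk_len, self.action_dim = chunk_len, action_dim
        self.net = nn.Sequential(
            nn.Linear(feat_dim + chunk_len * action_dim + 1, hidden),
            nn.GELU(),
            nn.Linear(hidden, chunk_len * action_dim),
        )

    def forward(self, feats, noisy_actions, t):
        # feats: (B, feat_dim); noisy_actions: (B, chunk_len, action_dim); t: (B,)
        x = torch.cat([feats, noisy_actions.flatten(1), t[:, None]], dim=-1)
        return self.net(x).view(-1, self.chunk_len, self.action_dim)

def flow_matching_loss(head, feats, actions):
    """Rectified-flow objective: sample x_t = (1 - t) * noise + t * actions
    and regress the constant velocity (actions - noise) along that path."""
    noise = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], device=actions.device)
    x_t = (1.0 - t)[:, None, None] * noise + t[:, None, None] * actions
    v_pred = head(feats, x_t, t)
    return ((v_pred - (actions - noise)) ** 2).mean()
```

At inference, the learned velocity field is integrated from pure noise over a few steps to produce the next action chunk, which is what makes this family of heads fast enough for high-rate control.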

AI Infra

  • Independently built the team's unified robot-model distributed training framework on DeepSpeed ZeRO with multi-node multi-GPU support, integrating bf16 mixed precision, gradient checkpointing, and FlashAttention 2. Supports LeRobot data format, LoRA fine-tuning, and WebSocket inference serving, significantly accelerating model iteration and enabling rapid delivery of multiple POC projects (Jinma, etc.).
  • Defined a unified dataset format standard for the company's robot data. Optimized the LeRobot data pipeline — normalization computation (30× speedup) and data loading/conversion — significantly improving training data preparation efficiency.
  • Curated, cleaned, and standardized 4,000+ hours of heterogeneous robot data, unifying action coordinate frames across diverse embodiments.
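Computing normalization statistics over thousands of hours of data is a common pipeline bottleneck; large speedups typically come from replacing per-frame loops with vectorized, mergeable per-chunk sufficient statistics. A minimal sketch of that general technique (hypothetical helper, not the actual LeRobot pipeline code):

```python
import numpy as np

def merge_stats(chunks):
    """Global per-dimension mean/std from per-chunk sufficient statistics
    (count, sum, sum of squares), computed in one vectorized pass instead
    of a per-frame loop. Each chunk is an (n_i, D) array; chunks can be
    processed independently and merged, so this also parallelizes."""
    n = sum(c.shape[0] for c in chunks)
    s = sum(c.sum(axis=0) for c in chunks)
    sq = sum((c ** 2).sum(axis=0) for c in chunks)
    mean = s / n
    var = sq / n - mean ** 2          # population variance
    return mean, np.sqrt(np.maximum(var, 0.0))
```

Because the partial statistics are additive, the same merge works across dataset shards or worker processes, which is where most of the wall-clock savings come from.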
VLA · RL · World Model · DeepSpeed · Vibe Coding
Jul 2018 — Jun 2019

Mechanical design and engineering.

SolidWorks · AutoCAD · Mechanical Design

Projects

MARS

Multimodal Active Robotic Sensing for Articulated Characterization. A framework that enables robots to actively perceive and characterize articulated objects through multimodal sensing.

Python · PyTorch · IJCAI 2024

Point-UMAE

Unet-like Masked Autoencoders for Point Cloud Self-supervised Learning. A multi-scale framework with top-down masking strategy for 3D shape classification, part segmentation, and object detection.

Python · PyTorch · ICASSP 2025

Publications