Hongliang Zeng (Xavier)

AI & Embodied Intelligence Researcher


About

I received my Ph.D. from the School of Computer Science and Engineering at South China University of Technology (SCUT), advised by Prof. Ping Zhang. My research focuses on embodied intelligence and computer vision, spanning robotic manipulation, active perception, reinforcement learning, and 3D point cloud understanding. I have published 7 first-author papers at venues including IJCAI, AAAI, TNNLS, ICASSP, and ICME, with 4 Oral presentations. I currently work as an Embodied Intelligence Algorithm Engineer at Astribot, focusing on VLA, reinforcement learning, and world models for robotic manipulation.

Education

Sep 2020 — Jun 2025
Ph.D., School of Computer Science and Engineering, South China University of Technology (SCUT)
  • Research on embodied intelligence, including robotic manipulation, active perception, and reinforcement learning, published at IJCAI, AAAI, TNNLS, etc.
  • Research on 3D point cloud self-supervised learning and generation, published at ICASSP, ICME, etc.
  • Granted patent: A Method and System for Robotic Manipulation of Articulated Objects.
Robotic Manipulation · Computer Vision
Sep 2014 — Jun 2018

Undergraduate studies in mechanical engineering.

CAD · SolidWorks

Work Experience

Nov 2024 — Present

Embodied Intelligence Algorithm Engineer

Astribot, Shenzhen

Algorithm

  • Unified the value model and VLA into a single training pipeline and implemented RL with real-robot recap rollouts, enabling the Astribot S1 to accomplish deformable-object manipulation (e.g., cloth folding) for the first time; delivered an internal demo.
  • Co-first author and core developer of DuoCore-FS, an asynchronous fast-slow dual-system VLA for whole-body manipulation. Contributed to architecture design, real-robot deployment, and paper writing. Achieved 30 Hz action generation (3× faster than comparable VLAs).
  • Implemented Real-Time Chunking (RTC) during training, achieving smoother action-chunk transitions, significantly reducing post-hoc trajectory smoothing on the control side, and improving action tracking precision.
  • Reproduced and integrated multiple mainstream VLA models (π0, π0.5, VLA-Adapter, RDT-2, WALL-X, etc.). Built two in-house VLAs by adding flow-matching action heads to Qwen3VL and Rynnec, matching π0-level performance at comparable data scale with flexible input resolution.
  • Explored training paradigms on 4,000+ hours of heterogeneous robot data: pretraining followed by fine-tuning (retaining 20% pretrain data) achieved 60% higher success rate and 30% faster convergence vs. training from scratch. Adopted dual-arm relative coordinate frame for ±10 cm height generalization. Fine-tuned π0.5 on the full dataset as the team's shared training checkpoint.
  • Proposed VLA + object detection co-training, using detection as an auxiliary task to improve generalization on unseen objects (success rate +40%); validated in an internal 711 retail scenario.
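The flow-matching action heads mentioned above follow the standard recipe: sample a noise point x0, interpolate toward the ground-truth action chunk x1 along a straight path, and train the head to regress the constant velocity x1 − x0; at inference, the learned velocity field is integrated from noise to an action. A minimal sketch assuming a linear (rectified-flow) interpolation path — the function names are illustrative, not the actual in-house implementation:

```python
def flow_matching_target(x0, x1, t):
    """Linear interpolation path x_t = (1 - t) * x0 + t * x1.

    Returns the interpolated point and the velocity target v = x1 - x0
    that the action head regresses (rectified-flow style).
    """
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


def euler_sample(v_field, x0, steps=10):
    """Integrate a velocity field from noise x0 toward an action chunk."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = v_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Sanity check: with the true (constant) velocity field, Euler
# integration recovers x1 up to floating-point error.
x0 = [0.0, 0.0]
x1 = [1.0, -2.0]
x = euler_sample(lambda x, t: [b - a for a, b in zip(x0, x1)], x0)
# x ≈ [1.0, -2.0]
```

The straight-line path is what makes few-step integration viable at control rates; a diffusion-style curved path would need more integration steps for the same accuracy.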

AI Infra

  • Independently built the team's unified robot-model distributed training framework on DeepSpeed ZeRO with multi-node multi-GPU support, integrating bf16 mixed precision, gradient checkpointing, and FlashAttention 2. Supports LeRobot data format, LoRA fine-tuning, and WebSocket inference serving, significantly accelerating model iteration and enabling rapid delivery of multiple POC projects (Jinma, etc.).
  • Defined a unified dataset format standard for the company's robot data. Optimized the LeRobot data pipeline, including normalization computation (30× speedup) and data loading/conversion, significantly improving training-data preparation efficiency.
  • Curated, cleaned, and aligned 4,000+ hours of heterogeneous robot data across action coordinate systems for large-scale pretraining.
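Per-dimension normalization statistics like those computed in the pipeline above can be accumulated in a single streaming pass over episodic data, avoiding materializing the full dataset in memory. A hypothetical sketch using Welford's online algorithm (the class and its interface are illustrative, not the actual LeRobot code, whose speedup came from its own pipeline optimizations):

```python
class RunningStats:
    """Streaming per-dimension mean/std via Welford's online algorithm."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations

    def update(self, sample):
        """Fold one state/action vector into the running statistics."""
        self.n += 1
        for i, x in enumerate(sample):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def std(self):
        """Population standard deviation per dimension."""
        if self.n < 2:
            return [0.0] * len(self.mean)
        return [(m / self.n) ** 0.5 for m in self.m2]


stats = RunningStats(dim=2)
for sample in [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]:
    stats.update(sample)
# stats.mean == [2.0, 20.0]
```

Welford's update is numerically stable even when the mean is large relative to the variance, which matters for joint-angle and end-effector streams with offset coordinate frames.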
VLA · RL · Flow Matching · Co-training · RTC
Jul 2018 — Jun 2019

Mechanical design and engineering.

SolidWorks · AutoCAD · Mechanical Design

Projects

MARS

Multimodal Active Robotic Sensing for Articulated Characterization. A framework that enables robots to actively perceive and characterize articulated objects through multimodal sensing.

Python · PyTorch · IJCAI 2024

Point-UMAE

Unet-like Masked Autoencoders for Point Cloud Self-supervised Learning. A multi-scale framework with top-down masking strategy for 3D shape classification, part segmentation, and object detection.

Python · PyTorch · ICASSP 2025

Publications