Reborn Embodied Vlog

Robots learn embodied manipulation from human handwork.

"Reborn Embodied Vlog" (REV) is a mobile app that lets users record and upload first-person videos of fine manipulation tasks from their daily lives. The process has four steps:

Recording: Users film themselves with a smartphone, GoPro, or other camera while performing detailed tasks such as preparing food, cleaning, assembling objects, or other hand-manipulation actions.
Uploading: The app allows users to upload these videos directly to the platform, where the footage is anonymized and processed.
Video Analysis: The system processes the video data to extract hand movement, gestures, object interactions, and fine manipulation patterns.
Global Contribution: By contributing their data, users help build a large, diverse dataset that AI systems worldwide can draw on to improve the accuracy and efficiency of robot manipulation.

Converting Reborn Embodied Vlog Footage to Embodied Training Data

First-person video can be converted into hand landmarks for training dexterous robotic hands with a structured pipeline:

1. Video Preprocessing

Frame Extraction: Convert video into a sequence of frames to analyze each individually. Select a suitable frame rate (e.g., 30 FPS) to balance detail and computational cost.
Image Enhancement: Adjust brightness and contrast so that hands and objects remain clearly visible under varied lighting conditions.
Segmentation: Use a hand segmentation algorithm to isolate the hand region from the background, reducing noise and focusing the analysis.
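
As an illustration of the frame extraction and image enhancement steps, the following is a minimal sketch in plain Python. It assumes frames are already decoded (for example by a video library) and that a grayscale frame is a list of rows of integer pixel values. The function names `sample_frame_indices` and `stretch_contrast` are hypothetical and are not part of REV; a linear contrast stretch is only one of several possible enhancement methods.

```python
def sample_frame_indices(total_frames, source_fps, target_fps):
    """Return the indices of source frames to keep when resampling to target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        return list(range(total_frames))  # nothing to drop
    step = source_fps / target_fps  # source frames per kept frame
    indices = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def stretch_contrast(gray, low=0, high=255):
    """Linearly rescale a grayscale frame so its values span [low, high]."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in gray]  # uniform frame: nothing to stretch
    scale = (high - low) / (hi - lo)
    return [[round(low + (p - lo) * scale) for p in row] for row in gray]


# Example: keep every second frame of a 60 FPS clip to obtain 30 FPS.
kept = sample_frame_indices(total_frames=120, source_fps=60, target_fps=30)
dim_frame = [[50, 100], [150, 200]]
enhanced = stretch_contrast(dim_frame)
```

Lowering the target frame rate reduces the number of frames passed to the later, more expensive stages, at the cost of temporal detail in fast hand motions.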

2. Hand Landmark Detection

Pose Estimation Models: Use a hand pose estimation model, such as MediaPipe Hands or DeepHand, to detect keypoints on the hand.
- Detect key landmarks such as fingertips, knuckles, wrist, and palm center.
- Extract 2D or 3D coordinates for each keypoint, expressed relative to the image frame or the environment.
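3D Reconstruction (if needed): Use stereo cameras, or infer 3D structure from monocular video using models such as DensePose (which maps image pixels onto a 3D body surface) or MANO (hand Model with Articulated and Non-rigid defOrmations), a parametric hand model that can be fitted to the detected keypoints.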
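
The coordinate handling described above can be sketched as follows. This is a minimal example, not REV's implementation. It assumes a detector that returns 21 landmarks per hand with x and y normalized to [0, 1] and uses the MediaPipe Hands indexing (0 for the wrist, 9 for the middle-finger knuckle). The function names, the choice of the wrist-to-knuckle distance as the scale reference, and the camera intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for illustration.

```python
import math

WRIST = 0        # wrist index in the MediaPipe Hands landmark layout
MIDDLE_MCP = 9   # middle-finger knuckle (MCP joint) in the same layout


def to_pixels(landmarks, width, height):
    """Convert normalized (x, y) landmarks to pixel coordinates."""
    return [(x * width, y * height) for x, y in landmarks]


def normalize_to_wrist(points):
    """Express points relative to the wrist, scaled by the wrist-to-knuckle distance.

    This removes the hand's position and apparent size in the image,
    so that poses from different videos are comparable.
    """
    wx, wy = points[WRIST]
    mx, my = points[MIDDLE_MCP]
    scale = math.hypot(mx - wx, my - wy)
    if scale == 0:
        raise ValueError("wrist and middle knuckle coincide; cannot normalize")
    return [((x - wx) / scale, (y - wy) / scale) for x, y in points]


def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a known depth to camera coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


# Example with placeholder landmarks: every point sits at the wrist except the knuckle.
landmarks = [(0.5, 0.5)] * 21
landmarks[MIDDLE_MCP] = (0.5, 0.25)
pixels = to_pixels(landmarks, width=640, height=480)
relative = normalize_to_wrist(pixels)
wrist_3d = back_project(*pixels[WRIST], depth=0.5, fx=600, fy=600, cx=320, cy=240)
```

The back-projection step applies only when a per-keypoint depth is available, for example from a stereo camera or a monocular depth estimate; without depth, the wrist-relative 2D coordinates are the usable output.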
