Reborn Embodied Vlog

Robots learn embodied manipulation from human handwork.

Reborn Embodied Vlog

"Reborn Embodied Vlog" (REV) is a mobile app that lets users record and upload first-person videos of fine manipulation tasks from their daily lives. The process has four steps:

Recording: Users capture first-person video with a smartphone, GoPro, or other camera while performing detailed tasks such as preparing food, cleaning, assembling objects, or other hand manipulation.
Uploading: Users upload these videos directly to the platform through the app, where the footage is anonymized and processed.
Video Analysis: The system extracts hand movements, gestures, object interactions, and fine manipulation patterns from the video data.
Global Contribution: Each contribution adds to a large, diverse dataset that is accessible to AI systems worldwide, with the aim of improving the accuracy and efficiency of robot manipulation.

Transfer Reborn Embodied Vlog to Embodied Training Data

First-person video can be converted into hand landmarks for training dexterous robotic hands by following a structured pipeline:

1. Video Preprocessing

Frame Extraction: Convert the video into a sequence of frames so that each frame can be analyzed individually. Choose a sampling rate (e.g., 30 FPS) that balances temporal detail against computational cost.
Image Enhancement: Adjust brightness and contrast so that hands and objects remain clearly visible under varied lighting conditions.
Segmentation: Use a hand segmentation algorithm to isolate the hand region from the background, reducing noise and focusing later analysis on the hand.
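
The three preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration, not the platform's implementation: the function names are hypothetical, frames are assumed to be 8-bit grayscale images stored as nested lists, and the binary hand mask is assumed to come from whatever segmentation model is used. A real pipeline would more likely rely on a video and image library.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # assumption: 8-bit grayscale, row-major nested lists


def sample_frame_indices(total_frames: int, source_fps: float,
                         target_fps: float) -> List[int]:
    """Return the source-frame indices to keep when resampling to target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never upsample: keep every frame if the target rate exceeds the source.
    step = max(source_fps / target_fps, 1.0)
    indices = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def stretch_contrast(gray: Image) -> Image:
    """Linearly rescale intensities so the darkest pixel maps to 0 and the
    brightest to 255 (a simple min-max contrast stretch)."""
    flat = [p for row in gray for p in row]
    if not flat or max(flat) == min(flat):
        return [row[:] for row in gray]  # nothing to stretch
    lo, hi = min(flat), max(flat)
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in gray]


def mask_bounding_box(mask: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of the nonzero region of
    a binary hand mask, or None if the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, v in enumerate(row) if v]
    return rows[0], min(cols), rows[-1], max(cols)


if __name__ == "__main__":
    print(sample_frame_indices(10, source_fps=60, target_fps=30))
    print(stretch_contrast([[50, 100], [150, 200]]))
    print(mask_bounding_box([[0, 0, 0], [0, 1, 1], [0, 1, 0]]))
```

Cropping each frame to the mask's bounding box before landmark detection is one way to apply the segmentation result; masking out background pixels is another.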

2. Hand Landmark Detection

Pose Estimation Models: Use a hand pose estimation model, such as MediaPipe Hands or DeepHand, to detect keypoints on the hand.
- Detect key landmarks such as fingertips, knuckles, wrist, and palm center.
- Extract 2D or 3D coordinates for each keypoint, expressed relative to the image frame or the environment.
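
As a rough sketch of the coordinate-extraction step, the snippet below converts normalized 2D landmarks to pixel coordinates and then expresses them relative to the wrist. It assumes the detector returns 21 landmarks per hand with x and y normalized to [0, 1] and ordered as in the MediaPipe Hands convention (index 0 is the wrist, index 9 the base of the middle finger, indices 4, 8, 12, 16, 20 the fingertips). The function names are hypothetical, and no detector is called here; the landmarks are supplied as plain tuples.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Landmark indices, assuming the 21-point MediaPipe Hands ordering.
WRIST = 0
MIDDLE_MCP = 9
FINGERTIPS = {"thumb": 4, "index": 8, "middle": 12, "ring": 16, "pinky": 20}


def to_pixels(landmarks: List[Point], width: int, height: int) -> List[Point]:
    """Convert normalized (x, y) landmarks to pixel coordinates."""
    return [(x * width, y * height) for x, y in landmarks]


def wrist_relative(points: List[Point]) -> List[Point]:
    """Translate points so the wrist is the origin and divide by the
    wrist-to-middle-knuckle distance, which removes the dependence on
    where the hand is in the frame and how large it appears."""
    wx, wy = points[WRIST]
    mx, my = points[MIDDLE_MCP]
    scale = math.hypot(mx - wx, my - wy)
    if scale == 0:
        raise ValueError("degenerate hand: wrist and middle knuckle coincide")
    return [((x - wx) / scale, (y - wy) / scale) for x, y in points]


if __name__ == "__main__":
    # Synthetic example: every landmark at the image center except two.
    hand = [(0.5, 0.5)] * 21
    hand[MIDDLE_MCP] = (0.5, 0.25)
    hand[FINGERTIPS["index"]] = (0.75, 0.5)
    pixels = to_pixels(hand, width=640, height=480)
    print(wrist_relative(pixels)[FINGERTIPS["index"]])
```

Normalizing by a bone length is only one possible choice; a pipeline that needs metric coordinates would keep the raw values and rely on depth instead.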
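3D Reconstruction (if needed): Use stereo cameras to measure depth, or infer 3D structure from monocular video with model-based approaches such as DensePose (dense correspondence between image pixels and a body surface model) or MANO (hand Model with Articulated and Non-rigid defOrmations), a parametric hand model that can be fitted to the detected keypoints.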
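
For the stereo case, depth follows from the standard relation Z = f * B / d for a rectified camera pair, and a keypoint can then be lifted to 3D with the pinhole camera model. The sketch below illustrates both steps. The function names and the intrinsics used in the example (focal length, principal point, baseline) are hypothetical values chosen for illustration, not parameters of any particular device.

```python
from typing import Tuple


def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Depth in meters of a point seen by a rectified stereo pair.

    disparity_px: horizontal pixel offset of the point between the two views
    focal_px:     focal length in pixels
    baseline_m:   distance between the two camera centers in meters
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def back_project(u: float, v: float, depth_m: float,
                 fx: float, fy: float, cx: float, cy: float
                 ) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with known depth to camera-frame (X, Y, Z) in meters
    using the pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m


if __name__ == "__main__":
    # Hypothetical setup: 600 px focal length, 10 cm baseline, 640x480 image.
    z = depth_from_disparity(disparity_px=120, focal_px=600, baseline_m=0.1)
    print(back_project(420, 240, z, fx=600, fy=600, cx=320, cy=240))
```

Applying `back_project` to each detected keypoint yields a 3D hand skeleton in the camera frame; monocular, model-based methods reach a comparable result by fitting a hand model instead of measuring disparity.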

Last updated 9 months ago
