# Reborn Embodied Vlog

"Reborn Embodied Vlog" (REV) is designed as a mobile app that enables users to record and upload first-person perspective videos of fine manipulation tasks from their daily lives. The process is simple:

* **Recording:** Users capture video with a smartphone, GoPro, or other camera while performing detailed tasks such as preparing food, cleaning, assembling objects, or other hand-manipulation actions.
* **Uploading:** The app allows users to upload these videos directly to the platform, where the footage is anonymized and processed.
* **Video Analysis:** The system processes the video data to extract hand movements, gestures, object interactions, and fine manipulation patterns.
* **Global Contribution:** By contributing their data, users help build a large, diverse dataset that is accessible to AI systems worldwide and can be used to improve the accuracy and efficiency of robot manipulation.

<figure><img src="https://1432733124-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FgH94UrDobd4o90tPX2EU%2Fuploads%2FXCo3pOyUwSkJW3tx1xGo%2F3.png?alt=media&#x26;token=95145fdc-7894-4789-bf61-14b166dc61ff" alt=""><figcaption></figcaption></figure>

### Converting Reborn Embodied Vlog Videos into Embodied Training Data

Converting first-person video into hand landmarks for training dexterous robotic hands can follow a structured pipeline:

#### 1. **Video Preprocessing**

* **Frame Extraction:** Convert the video into a sequence of frames so that each one can be analyzed individually. Select a frame rate (e.g., 30 FPS) that balances detail against computational cost.
* **Image Enhancement:** Adjust image quality (e.g., brightness, contrast) so that hands and objects remain clearly visible under varied lighting conditions.
* **Segmentation:** Use a hand segmentation algorithm to isolate the hand region from the background, reducing noise and focusing the analysis.
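
As a rough illustration of the first two preprocessing steps, the sketch below picks which source frames to keep when downsampling to a target frame rate and applies a simple linear contrast stretch to a grayscale frame. It is a minimal, dependency-free sketch rather than the platform's actual implementation: the function names `sample_frame_indices` and `stretch_contrast` are hypothetical, frames are assumed to be nested lists of 8-bit grayscale values, and hand segmentation is not covered.

```python
from typing import List


def sample_frame_indices(total_frames: int, source_fps: float, target_fps: float) -> List[int]:
    """Return indices of the source frames to keep when resampling to target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if total_frames <= 0:
        return []
    if target_fps >= source_fps:
        # Never upsample: keep every frame as-is.
        return list(range(total_frames))
    step = source_fps / target_fps  # source frames per kept frame (> 1)
    indices = []
    i = 0
    while int(i * step) < total_frames:
        indices.append(int(i * step))
        i += 1
    return indices


def stretch_contrast(gray: List[List[int]]) -> List[List[int]]:
    """Linearly rescale 8-bit grayscale values so they span the full 0..255 range."""
    flat = [p for row in gray for p in row]
    if not flat or min(flat) == max(flat):
        # Empty or perfectly flat image: nothing to stretch, return a copy.
        return [row[:] for row in gray]
    lo, hi = min(flat), max(flat)
    scale = 255.0 / (hi - lo)
    return [[int(round((p - lo) * scale)) for p in row] for row in gray]


if __name__ == "__main__":
    # Downsample a 60 FPS clip of 10 frames to 30 FPS.
    print(sample_frame_indices(10, 60, 30))
    # Stretch a low-contrast 2x2 frame.
    print(stretch_contrast([[50, 100], [150, 200]]))
```

In practice the frames would come from a video decoder, and a library routine would replace the per-pixel loop; the index arithmetic and the min/max rescaling are the parts this sketch is meant to show.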

#### 2. **Hand Landmark Detection**

* **Pose Estimation Models:** Use a state-of-the-art hand pose estimation model, such as MediaPipe Hands or DeepHand, to detect keypoints on the hand.
  * Detect key landmarks such as fingertips, knuckles, wrist, and palm center.
  * Use 2D/3D coordinate extraction to map hand keypoints relative to the frame or environment.
* **3D Reconstruction (if needed):** Use stereo cameras, or recover 3D structure from monocular video with models such as DensePose or by fitting the parametric hand model MANO (hand Model with Articulated and Non-rigid defOrmations).
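
Once keypoints are available, they usually need to be put into a camera-independent form before they are useful as training data. The sketch below assumes the 21-point layout used by MediaPipe Hands (index 0 is the wrist; 4, 8, 12, 16, and 20 are the fingertips; 5, 9, 13, and 17 are the finger MCP joints). It expresses landmarks relative to the wrist, scales them by the wrist-to-middle-MCP distance, and estimates a palm center as the mean of the wrist and MCP joints. The helper names, the choice of scale reference, and the palm-center definition are illustrative assumptions, not part of any library. The `stereo_depth` helper applies the standard rectified-stereo relation (depth = focal length × baseline / disparity) for the stereo-camera case.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]

# Indices following the 21-point MediaPipe Hands layout.
WRIST = 0
MIDDLE_MCP = 9
FINGERTIPS = (4, 8, 12, 16, 20)   # thumb, index, middle, ring, pinky tips
PALM_POINTS = (0, 5, 9, 13, 17)   # wrist plus the four finger MCP joints


def palm_center(landmarks: List[Point]) -> Point:
    """Approximate the palm center as the mean of the wrist and MCP joints (an assumption)."""
    pts = [landmarks[i] for i in PALM_POINTS]
    n = len(pts)
    return (sum(p[0] for p in pts) / n,
            sum(p[1] for p in pts) / n,
            sum(p[2] for p in pts) / n)


def normalize_landmarks(landmarks: List[Point]) -> List[Point]:
    """Move the wrist to the origin and scale so wrist-to-middle-MCP distance is 1."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    wx, wy, wz = landmarks[WRIST]
    scale = math.dist(landmarks[WRIST], landmarks[MIDDLE_MCP])
    if scale == 0:
        raise ValueError("degenerate hand: wrist and middle MCP coincide")
    return [((x - wx) / scale, (y - wy) / scale, (z - wz) / scale)
            for x, y, z in landmarks]


def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters from a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Synthetic landmarks laid out along the x-axis, offset from the origin.
    hand = [(2.0 + i, 3.0, 4.0) for i in range(21)]
    normalized = normalize_landmarks(hand)
    print([normalized[i] for i in FINGERTIPS])
    print(palm_center(normalized))
    print(stereo_depth(focal_px=700.0, baseline_m=0.1, disparity_px=35.0))
```

Normalizing by a bone length rather than by image size keeps the representation stable when the hand moves closer to or farther from the camera, which is common in first-person footage.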
