
# VITRA: Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

arXiv | Project Page

VITRA is a novel approach for pretraining robotic manipulation Vision-Language-Action (VLA) models on a large corpus of unscripted, real-life video recordings of human hand activities. Treating the human hand as a dexterous robot end-effector, we show that "in-the-wild" egocentric human videos without any annotations can be transformed into a data format fully aligned with existing robotic VLA training data in terms of task granularity and labels (see the illustrative sketch below).

## ⏩ Updates

**Data, code, and pretrained models will be released no later than 11/30/2025. Please stay tuned.**
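As a rough illustration of the idea above, the sketch below shows what one such aligned training sample could look like. This is a hypothetical schema written for this README only: the class and field names (`VLASample`, `wrist_pose`, `hand_closure`, `segment_to_samples`) are assumptions for illustration and are not VITRA's released data format.

```python
# Hypothetical sketch of a VLA-style training sample extracted from an
# egocentric human-hand video segment. Field names and shapes are assumptions
# for illustration only, not VITRA's released format.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class VLASample:
    """One (observation, instruction, action) triple, mirroring the
    (image, language, end-effector action) layout of robot VLA data."""
    rgb_frame: np.ndarray   # H x W x 3 egocentric image at time t
    instruction: str        # short task description, e.g. "pick up the cup"
    wrist_pose: np.ndarray  # 6-DoF hand/wrist pose, treated as an end-effector pose
    hand_closure: float     # scalar in [0, 1], analogous to a gripper command


def segment_to_samples(frames: List[np.ndarray],
                       wrist_poses: List[np.ndarray],
                       closures: List[float],
                       instruction: str) -> List[VLASample]:
    """Turn one short, single-task video segment into per-timestep samples."""
    return [
        VLASample(rgb_frame=f, instruction=instruction,
                  wrist_pose=p, hand_closure=c)
        for f, p, c in zip(frames, wrist_poses, closures)
    ]


if __name__ == "__main__":
    # Toy usage with placeholder arrays standing in for real video frames
    # and hand-tracking output.
    frames = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(4)]
    poses = [np.zeros(6, dtype=np.float32) for _ in range(4)]
    closures = [0.0, 0.3, 0.8, 1.0]
    samples = segment_to_samples(frames, poses, closures, "pick up the cup")
    print(len(samples), samples[0].instruction)
```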