Creating a LeRobot 2.1 Dataset

Introduction#

I recently had a substantial need to work with LeRobot datasets, but after searching for a long time I could not find any clear documentation or tutorial. So here is the simplest possible tutorial explaining how to create a LeRobot 2.1 dataset.

Installation#

First, we need to install the LeRobot 2.1 library, here using pip:

```bash
pip install "lerobot @ git+https://github.com/huggingface/lerobot.git@2b71789e15c35418b1ccecbceb81f4a598bfd883"
```

If you have not installed it before, you will also need ffmpeg:

```bash
sudo apt update
sudo apt install ffmpeg
```

Creating the Dataset#

Working with a LeRobot dataset consists of two steps, which are easy to understand: creating the dataset, and then writing every piece of data into it.

```python
import shutil
from pathlib import Path
from typing import Literal

from lerobot.common.datasets.lerobot_dataset import HF_LEROBOT_HOME, LeRobotDataset


def create_dataset(
    repo_id: str,
    robot_type: str,
    mode: Literal["video", "image"] = "video",
) -> LeRobotDataset:
    motors = [
        "joint_0",
        "joint_1",
        "joint_2",
        "joint_3",
        "joint_4",
        "joint_5",
        "joint_6",
    ]
    cameras = [
        "camera_0",
        "camera_1",
        "camera_2",
    ]

    features = {
        "state.joints": {
            "dtype": "float32",
            "shape": (len(motors),),
            "names": [
                motors,
            ],
        },
        "action.joints": {
            "dtype": "float32",
            "shape": (len(motors),),
            "names": [
                motors,
            ],
        },
    }
    for cam in cameras:
        features[f"video.{cam}_view"] = {
            "dtype": mode,
            "shape": (3, 480, 640),  # (channels, height, width)
            "names": [
                "channels",
                "height",
                "width",
            ],
        }

    # Remove any stale local copy so the dataset is created from scratch.
    if Path(HF_LEROBOT_HOME / repo_id).exists():
        shutil.rmtree(HF_LEROBOT_HOME / repo_id)

    return LeRobotDataset.create(
        repo_id=repo_id,
        fps=15,
        robot_type=robot_type,
        features=features,
        use_videos=True,
        tolerance_s=0.0001,
        image_writer_processes=10,
        image_writer_threads=5,
        video_backend="ffmpeg",
    )
```

Here the images are stored as video, which shows up in the arguments: mode is set to "video". The code above defines the basic information for the joints and cameras.

A LeRobot dataset is essentially defined through features: it is simply a dict describing every field, and afterwards you just store data that matches those descriptions.
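To make the "data must match the features dict" idea concrete, here is a minimal sketch (the feature name and shape mirror the example above; the check itself is just illustrative, not part of lerobot):

```python
import numpy as np

# One feature spec, shaped like the "state.joints" entry above.
features = {
    "state.joints": {
        "dtype": "float32",
        "shape": (7,),
        "names": [["joint_%d" % i for i in range(7)]],
    },
}

# A frame carries one value per feature, matching shape and dtype.
frame = {"state.joints": np.zeros(7, dtype=np.float32)}

for key, spec in features.items():
    assert frame[key].shape == spec["shape"]
    assert str(frame[key].dtype) == spec["dtype"]
```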

At the same time, you need to set the environment variable export HF_LEROBOT_HOME=/path/to/your/lerobot/home, which determines where LeRobot datasets are stored locally.
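If you prefer to set it from Python, do so before importing lerobot, since HF_LEROBOT_HOME is read at import time (the path below is a placeholder):

```python
import os

# Must run before "import lerobot"; the path is a placeholder.
os.environ["HF_LEROBOT_HOME"] = "/data/lerobot"
```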

Then, to store the data:

```python
import numpy as np

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset


def process_single_dataset(
    dataset: LeRobotDataset,
    state_joints: np.ndarray,   # (num_frames, num_joints)
    action_joints: np.ndarray,  # (num_frames, num_joints)
    video_dict: dict[str, np.ndarray],  # camera name -> (num_frames, C, H, W)
    instruction: str,
) -> LeRobotDataset:
    num_frames = state_joints.shape[0]

    for i in range(num_frames):
        frame = {
            "state.joints": state_joints[i],
            "action.joints": action_joints[i],
        }
        for camera, img_array in video_dict.items():
            frame[f"video.{camera}_view"] = img_array[i]
        dataset.add_frame(frame, task=instruction)

    # Flush the buffered frames to disk as one episode.
    dataset.save_episode()

    return dataset
```

As long as you keep using the same dataset object, LeRobot automatically manages episode ids and the other bookkeeping statistics for you.
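As a sanity check, the per-frame dictionaries built inside process_single_dataset can be sketched with dummy arrays. The episode length, joint count, and camera names here are assumptions carried over from the create_dataset example:

```python
import numpy as np

# Hypothetical episode: 30 frames, 7 joints, 3 cameras at 3x480x640.
num_frames = 30
state_joints = np.zeros((num_frames, 7), dtype=np.float32)
action_joints = np.zeros((num_frames, 7), dtype=np.float32)
video_dict = {
    f"camera_{i}": np.zeros((num_frames, 3, 480, 640), dtype=np.uint8)
    for i in range(3)
}

# The dict built for frame 0, with the same slicing process_single_dataset uses.
frame = {
    "state.joints": state_joints[0],
    "action.joints": action_joints[0],
}
for camera, img_array in video_dict.items():
    frame[f"video.{camera}_view"] = img_array[0]
```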

Using the Latest LeRobotDataset#

Here is the equivalent code for the latest LeRobotDataset as well. Again, install it first:

```bash
pip install lerobot
```

Then create the dataset directly:

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset  # import path in recent lerobot releases

DROID_FEATURES = {
    "observation.state.joint_position": {
        "dtype": "float32",
        "shape": (7,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6"],
        },
    },
    "observation.state.gripper_position": {
        "dtype": "float32",
        "shape": (1,),
        "names": {
            "axes": ["gripper"],
        },
    },
    "observation.state": {
        "dtype": "float32",
        "shape": (8,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6", "gripper"],
        },
    },
    "observation.images.wrist_left": {
        "dtype": "video",
        "shape": (180, 320, 3),
        "names": [
            "height",
            "width",
            "channels",
        ],
    },
    "language_instruction": {
        "dtype": "string",
        "shape": (1,),
        "names": None,
    },
    "action.joint_position": {
        "dtype": "float32",
        "shape": (7,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6"],
        },
    },
    "action.gripper_position": {
        "dtype": "float32",
        "shape": (1,),
        "names": {
            "axes": ["gripper"],
        },
    },
    "action": {
        "dtype": "float32",
        "shape": (8,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6", "gripper"],
        },
    },
    "is_episode_successful": {
        "dtype": "bool",
        "shape": (1,),
        "names": None,
    },
}

dataset = LeRobotDataset.create(
    repo_id="your-repo-id",
    features=DROID_FEATURES,
    fps=15,
    robot_type="Franka",
)
```

Three different kinds of feature appear here. For str or bool features, the shape is (1,) and names may be None. For 1-D arrays such as joint values, names contains an axes list naming each position. For image arrays, names lists height, width, and channels for each dimension, and dtype is set to video so that the frames are encoded as video.
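One easy mistake with axes-style names is letting the axis list drift out of sync with the shape. A small consistency check (a hypothetical helper, not part of lerobot) can catch that, shown here on a trimmed-down version of the features above:

```python
# For features whose names carry an "axes" list, the number of axis
# names should equal the length of the first shape dimension.
def check_axes(features: dict) -> list[str]:
    mismatched = []
    for key, spec in features.items():
        names = spec.get("names")
        if isinstance(names, dict) and "axes" in names:
            if len(names["axes"]) != spec["shape"][0]:
                mismatched.append(key)
    return mismatched


mini_features = {
    "observation.state": {
        "dtype": "float32",
        "shape": (8,),
        "names": {"axes": ["joint_%d" % i for i in range(7)] + ["gripper"]},
    },
    "is_episode_successful": {"dtype": "bool", "shape": (1,), "names": None},
}
```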

Adding each frame then looks like this:

```python
import numpy as np


def generate_lerobot_frames(tf_episode):
    # is_episode_successful() is a helper assumed to be defined elsewhere
    # in the conversion script; it inspects the episode metadata.
    m = tf_episode["episode_metadata"]
    frame_meta = {
        "is_episode_successful": np.array([is_episode_successful(m)]),
    }
    for f in tf_episode["steps"]:
        frame = {
            "language_instruction": f["language_instruction"].numpy().decode(),
            "observation.state.joint_position": f["observation"]["joint_position"].numpy(),
            "observation.state.gripper_position": f["observation"]["gripper_position"].numpy(),
            "action.gripper_position": f["action_dict"]["gripper_position"].numpy(),
            "action.joint_position": f["action_dict"]["joint_position"].numpy(),
            "observation.images.wrist_left": f["observation"]["wrist_image_left"].numpy(),
        }

        # language_instruction is also stored as "task" to follow the LeRobot standard
        frame["task"] = frame["language_instruction"]

        # Add this feature to follow the LeRobot standard of using joint position + gripper
        frame["observation.state"] = np.concatenate(
            [frame["observation.state.joint_position"], frame["observation.state.gripper_position"]]
        )
        frame["action"] = np.concatenate([frame["action.joint_position"], frame["action.gripper_position"]])

        # Metadata shared by all frames in the episode
        frame.update(frame_meta)

        # Cast fp64 to fp32
        for key in frame:
            if isinstance(frame[key], np.ndarray) and frame[key].dtype == np.float64:
                frame[key] = frame[key].astype(np.float32)

        yield frame
```

As you can see, the LeRobot standard mandates three features: language_instruction, observation.state, and action, where both the state and the action are the concatenation of joint values and the gripper.
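That joint + gripper concatenation can be checked in isolation: a 7-dim joint vector and a 1-dim gripper value combine into the 8-dim state declared in DROID_FEATURES (the values here are dummies):

```python
import numpy as np

# Dummy 7-dim joint vector and 1-dim gripper value.
joint_position = np.zeros(7, dtype=np.float32)
gripper_position = np.ones(1, dtype=np.float32)

# Same concatenation as in generate_lerobot_frames; dtype is preserved.
observation_state = np.concatenate([joint_position, gripper_position])
```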

Finally, combine everything:

```python
import logging

for episode in raw_dataset:
    for frame in generate_lerobot_frames(episode):
        lerobot_dataset.add_frame(frame)
    lerobot_dataset.save_episode()
    logging.info("Saved episode")
lerobot_dataset.finalize()
```

Summary#

Overall, the LeRobot dataset format is quite pleasant to work with: when converting data you only need to think in terms of features, and it can automatically encode images as video, which is very convenient.

https://axi404.top/blog/lerobot
Author 阿汐
Published on October 29, 2025