

Creating a LeRobot 2.1 Dataset
LeRobot 2.1 is currently the most popular dataset format, yet examples of how to create a LeRobot dataset are not very intuitive, so this post offers some simple guidance.
Preface#
I recently had extensive need to work with LeRobot datasets, but after searching for a long time I could not find any clear documentation or tutorials. Here is a minimal walkthrough explaining how to create a LeRobot 2.1 dataset.
Installation#
First, install the LeRobot 2.1 library with pip:
```bash
pip install "lerobot @ git+https://github.com/huggingface/lerobot.git@2b71789e15c35418b1ccecbceb81f4a598bfd883"
```

If you have not installed it before, you also need ffmpeg:

```bash
sudo apt update
sudo apt install ffmpeg
```

Creating the Dataset#
Writing a LeRobot dataset involves two stages, which are easy to understand: first create the dataset, then store each piece of data into it.
```python
import shutil
from pathlib import Path
from typing import Literal

from lerobot.common.datasets.lerobot_dataset import HF_LEROBOT_HOME, LeRobotDataset


def create_dataset(
    repo_id: str,
    robot_type: str,
    mode: Literal["video", "image"] = "video",
) -> LeRobotDataset:
    motors = [
        "joint_0",
        "joint_1",
        "joint_2",
        "joint_3",
        "joint_4",
        "joint_5",
        "joint_6",
    ]
    cameras = [
        "camera_0",
        "camera_1",
        "camera_2",
    ]
    features = {
        "state.joints": {
            "dtype": "float32",
            "shape": (len(motors),),
            "names": [
                motors,
            ],
        },
        "action.joints": {
            "dtype": "float32",
            "shape": (len(motors),),
            "names": [
                motors,
            ],
        },
    }
    for cam in cameras:
        features[f"video.{cam}_view"] = {
            "dtype": mode,
            "shape": (3, 480, 640),  # (channels, height, width)
            "names": [
                "channels",
                "height",
                "width",
            ],
        }
    # Remove any stale local copy so the dataset is created from scratch
    if Path(HF_LEROBOT_HOME / repo_id).exists():
        shutil.rmtree(HF_LEROBOT_HOME / repo_id)
    return LeRobotDataset.create(
        repo_id=repo_id,
        fps=15,
        robot_type=robot_type,
        features=features,
        use_videos=True,
        tolerance_s=0.0001,
        image_writer_processes=10,
        image_writer_threads=5,
        video_backend="ffmpeg",
    )
```

Here we save the images as video, which is expressed through the arguments by setting mode to "video". The code above defines the basic joint and camera information.
A LeRobot dataset is defined primarily through features: once defined, features acts as a dict-shaped schema, and you simply store data under those keys.
You also need to set the environment variable export HF_LEROBOT_HOME=/path/to/your/lerobot/home to choose where LeRobot datasets are stored locally.
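As a sketch, the variable can also be set from Python before lerobot is imported; the path below is a placeholder, and the claim that the path is resolved at import time is an assumption based on HF_LEROBOT_HOME being a module-level constant:

```python
import os

# Set HF_LEROBOT_HOME before importing lerobot, since the HF_LEROBOT_HOME
# constant appears to be resolved when the module is imported.
# "/data/lerobot" is a placeholder -- point it at your own storage location.
os.environ["HF_LEROBOT_HOME"] = "/data/lerobot"

print(os.environ["HF_LEROBOT_HOME"])  # -> /data/lerobot
```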
Then store the data:
```python
import numpy as np


def process_single_dataset(
    dataset: LeRobotDataset,
    state_joints: np.ndarray,
    action_joints: np.ndarray,
    video_dict: dict[str, np.ndarray],
    instruction: str,
) -> LeRobotDataset:
    num_frames = state_joints.shape[0]
    for i in range(num_frames):
        frame = {
            "state.joints": state_joints[i],
            "action.joints": action_joints[i],
        }
        for camera, img_array in video_dict.items():
            frame[f"video.{camera}_view"] = img_array[i]
        dataset.add_frame(frame, task=instruction)
    dataset.save_episode()
    return dataset
```

As long as you keep using the same dataset object, LeRobot automatically manages episode indices and other statistics for you.
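To illustrate the array shapes this function expects, here is a minimal sketch with random dummy data; the sizes are assumptions that simply mirror the feature definitions above (7 joints, three cameras, 3×480×640 frames):

```python
import numpy as np

num_frames, num_joints = 30, 7
state_joints = np.random.rand(num_frames, num_joints).astype(np.float32)
action_joints = np.random.rand(num_frames, num_joints).astype(np.float32)
# One (channels, height, width) image per frame, matching the declared shape (3, 480, 640)
video_dict = {
    cam: np.random.randint(0, 256, (num_frames, 3, 480, 640), dtype=np.uint8)
    for cam in ["camera_0", "camera_1", "camera_2"]
}

# Each per-frame slice must match the declared feature shapes
assert state_joints[0].shape == (num_joints,)
assert video_dict["camera_0"][0].shape == (3, 480, 640)
```

With these arrays, calling process_single_dataset(dataset, state_joints, action_joints, video_dict, "pick up the cube") would write one 30-frame episode.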
Using the Latest LeRobotDataset#
Here is the equivalent code for the latest LeRobotDataset. Again, install it first:
```bash
pip install lerobot
```

Then, to create the dataset, simply:
```python
# In recent lerobot releases the import path drops "common"
from lerobot.datasets.lerobot_dataset import LeRobotDataset

DROID_FEATURES = {
    "observation.state.joint_position": {
        "dtype": "float32",
        "shape": (7,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6"],
        },
    },
    "observation.state.gripper_position": {
        "dtype": "float32",
        "shape": (1,),
        "names": {
            "axes": ["gripper"],
        },
    },
    "observation.state": {
        "dtype": "float32",
        "shape": (8,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6", "gripper"],
        },
    },
    "observation.images.wrist_left": {
        "dtype": "video",
        "shape": (180, 320, 3),
        "names": [
            "height",
            "width",
            "channels",
        ],
    },
    "language_instruction": {
        "dtype": "string",
        "shape": (1,),
        "names": None,
    },
    "action.joint_position": {
        "dtype": "float32",
        "shape": (7,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6"],
        },
    },
    "action.gripper_position": {
        "dtype": "float32",
        "shape": (1,),
        "names": {
            "axes": ["gripper"],
        },
    },
    "action": {
        "dtype": "float32",
        "shape": (8,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6", "gripper"],
        },
    },
    "is_episode_successful": {
        "dtype": "bool",
        "shape": (1,),
        "names": None,
    },
}

dataset = LeRobotDataset.create(
    repo_id="your-repo-id",
    features=DROID_FEATURES,
    fps=15,
    robot_type="Franka",
)
```

This shows three different kinds of features. For str or bool features, the shape is (1,) and names may be None. For one-dimensional arrays such as joint positions, names contains an axes list naming each element. For image-like multi-dimensional arrays, names labels the dimensions height, width, and channels, and dtype is set to video so that frames are encoded as video.
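As a quick sanity check of such a schema, you can verify that each feature's names entry is consistent with its shape. The helper below is not part of the LeRobot API, just a sketch over the conventions described above:

```python
def check_feature(feature: dict) -> bool:
    """Check that a feature's `names` entry is consistent with its `shape`."""
    names = feature["names"]
    if names is None:
        # str/bool features: shape (1,) and no names
        return True
    if isinstance(names, dict) and "axes" in names:
        # 1-D features: one axis name per element
        return len(names["axes"]) == feature["shape"][0]
    if isinstance(names, list):
        # image/video features: one label per dimension
        return len(names) == len(feature["shape"])
    return False


features = {
    "observation.state": {
        "dtype": "float32",
        "shape": (8,),
        "names": {"axes": ["joint_0", "joint_1", "joint_2", "joint_3",
                           "joint_4", "joint_5", "joint_6", "gripper"]},
    },
    "observation.images.wrist_left": {
        "dtype": "video",
        "shape": (180, 320, 3),
        "names": ["height", "width", "channels"],
    },
}
assert all(check_feature(f) for f in features.values())
```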
Adding each frame then looks like this:
```python
import numpy as np


def generate_lerobot_frames(tf_episode):
    m = tf_episode["episode_metadata"]
    # is_episode_successful is a user-defined helper that inspects the metadata
    frame_meta = {
        "is_episode_successful": np.array([is_episode_successful(m)]),
    }
    for f in tf_episode["steps"]:
        frame = {
            "language_instruction": f["language_instruction"].numpy().decode(),
            "observation.state.joint_position": f["observation"]["joint_position"].numpy(),
            "observation.state.gripper_position": f["observation"]["gripper_position"].numpy(),
            "action.gripper_position": f["action_dict"]["gripper_position"].numpy(),
            "action.joint_position": f["action_dict"]["joint_position"].numpy(),
            "observation.images.wrist_left": f["observation"]["wrist_image_left"].numpy(),
        }
        # language_instruction is also stored as "task" to follow the LeRobot standard
        frame["task"] = frame["language_instruction"]
        # Follow the LeRobot standard of using joint position + gripper
        frame["observation.state"] = np.concatenate(
            [frame["observation.state.joint_position"], frame["observation.state.gripper_position"]]
        )
        frame["action"] = np.concatenate([frame["action.joint_position"], frame["action.gripper_position"]])
        # Metadata that is the same for all frames in the episode
        frame.update(frame_meta)
        # Cast fp64 to fp32
        for key in frame:
            if isinstance(frame[key], np.ndarray) and frame[key].dtype == np.float64:
                frame[key] = frame[key].astype(np.float32)
        yield frame
```

Here you can see that the LeRobot standard effectively mandates three features: language_instruction (duplicated into task), observation.state, and action, where both state and action are the joint + gripper concatenation.
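The two normalization steps in the loop above, concatenating joints with the gripper and downcasting fp64 to fp32, can be sketched in isolation:

```python
import numpy as np

joint_position = np.zeros(7, dtype=np.float64)
gripper_position = np.ones(1, dtype=np.float64)

# LeRobot-standard state vector: joints followed by gripper, shape (8,)
state = np.concatenate([joint_position, gripper_position])
assert state.shape == (8,)

# np.concatenate preserves float64, so cast down before add_frame
if state.dtype == np.float64:
    state = state.astype(np.float32)
assert state.dtype == np.float32
```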
Finally, put it all together:
```python
import logging

for episode in raw_dataset:
    for frame in generate_lerobot_frames(episode):
        lerobot_dataset.add_frame(frame)
    lerobot_dataset.save_episode()
    logging.info("Save_episode")
lerobot_dataset.finalize()
```

Summary#
Overall, the LeRobot dataset format is quite pleasant to work with: when converting data you only need to think in terms of features, and images are automatically encoded into video, which makes the whole process very convenient.