

Creating a LeRobot 2.1 Dataset
LeRobot 2.1 is currently the most popular dataset format, yet examples of how to create a LeRobot dataset are not very intuitive, so this post offers some simple guidance.
Preface#
I recently had extensive need to work with LeRobot datasets, but after searching for a long time I could not find any clear documentation or tutorials. Here is a minimal walkthrough explaining how to create a LeRobot 2.1 dataset.
Installation#
First, install the LeRobot 2.1 library with pip:
```bash
pip install "lerobot @ git+https://github.com/huggingface/lerobot.git@2b71789e15c35418b1ccecbceb81f4a598bfd883"
```

If you have not installed it before, you also need ffmpeg:

```bash
sudo apt update
sudo apt install ffmpeg
```

Creating the Dataset#
Writing a LeRobot dataset involves two stages, which are easy to understand: first create the dataset, then store each piece of data into it.
```python
import shutil
from pathlib import Path
from typing import Literal

from lerobot.common.datasets.lerobot_dataset import HF_LEROBOT_HOME, LeRobotDataset


def create_dataset(
    repo_id: str,
    robot_type: str,
    mode: Literal["video", "image"] = "video",
) -> LeRobotDataset:
    motors = [
        "joint_0",
        "joint_1",
        "joint_2",
        "joint_3",
        "joint_4",
        "joint_5",
        "joint_6",
    ]
    cameras = [
        "camera_0",
        "camera_1",
        "camera_2",
    ]
    features = {
        "state.joints": {
            "dtype": "float32",
            "shape": (len(motors),),
            "names": [
                motors,
            ],
        },
        "action.joints": {
            "dtype": "float32",
            "shape": (len(motors),),
            "names": [
                motors,
            ],
        },
    }
    for cam in cameras:
        features[f"video.{cam}_view"] = {
            "dtype": mode,
            "shape": (3, 480, 640),  # (channels, height, width)
            "names": [
                "channels",
                "height",
                "width",
            ],
        }
    # Remove any stale local copy so the dataset is created from scratch
    if Path(HF_LEROBOT_HOME / repo_id).exists():
        shutil.rmtree(HF_LEROBOT_HOME / repo_id)
    return LeRobotDataset.create(
        repo_id=repo_id,
        fps=15,
        robot_type=robot_type,
        features=features,
        use_videos=True,
        tolerance_s=0.0001,
        image_writer_processes=10,
        image_writer_threads=5,
        video_backend="ffmpeg",
    )
```

Here we save the images as video, which is expressed through the arguments by setting mode to "video". The code above defines the basic joint and camera information.
A LeRobot dataset is defined primarily through features: once defined, features acts as a dict-shaped schema, and you simply store data under those keys.
You also need to set the environment variable export HF_LEROBOT_HOME=/path/to/your/lerobot/home to choose where LeRobot datasets are stored locally.
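As a sketch, the variable can also be set from Python before lerobot is imported; the path below is a placeholder, and the claim that the path is resolved at import time is an assumption based on HF_LEROBOT_HOME being a module-level constant:

```python
import os

# Set HF_LEROBOT_HOME before importing lerobot, since the HF_LEROBOT_HOME
# constant appears to be resolved when the module is imported.
# "/data/lerobot" is a placeholder -- point it at your own storage location.
os.environ["HF_LEROBOT_HOME"] = "/data/lerobot"

print(os.environ["HF_LEROBOT_HOME"])  # -> /data/lerobot
```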
Then store the data:
```python
import numpy as np


def process_single_dataset(
    dataset: LeRobotDataset,
    state_joints: np.ndarray,
    action_joints: np.ndarray,
    video_dict: dict[str, np.ndarray],
    instruction: str,
) -> LeRobotDataset:
    num_frames = state_joints.shape[0]
    for i in range(num_frames):
        frame = {
            "state.joints": state_joints[i],
            "action.joints": action_joints[i],
        }
        for camera, img_array in video_dict.items():
            frame[f"video.{camera}_view"] = img_array[i]
        dataset.add_frame(frame, task=instruction)
    dataset.save_episode()
    return dataset
```

As long as you keep using the same dataset object, LeRobot automatically manages episode indices and other statistics for you.
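To illustrate the array shapes this function expects, here is a minimal sketch with random dummy data; the sizes are assumptions that simply mirror the feature definitions above (7 joints, three cameras, 3×480×640 frames):

```python
import numpy as np

num_frames, num_joints = 30, 7
state_joints = np.random.rand(num_frames, num_joints).astype(np.float32)
action_joints = np.random.rand(num_frames, num_joints).astype(np.float32)
# One (channels, height, width) image per frame, matching the declared shape (3, 480, 640)
video_dict = {
    cam: np.random.randint(0, 256, (num_frames, 3, 480, 640), dtype=np.uint8)
    for cam in ["camera_0", "camera_1", "camera_2"]
}

# Each per-frame slice must match the declared feature shapes
assert state_joints[0].shape == (num_joints,)
assert video_dict["camera_0"][0].shape == (3, 480, 640)
```

With these arrays, calling process_single_dataset(dataset, state_joints, action_joints, video_dict, "pick up the cube") would write one 30-frame episode.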
Using the Latest LeRobotDataset#
Here is the equivalent code for the latest LeRobotDataset. Again, install it first:
```bash
pip install lerobot
```

Then, to create the dataset, simply:
```python
# In recent lerobot releases the import path drops "common"
from lerobot.datasets.lerobot_dataset import LeRobotDataset

DROID_FEATURES = {
    "observation.state.joint_position": {
        "dtype": "float32",
        "shape": (7,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6"],
        },
    },
    "observation.state.gripper_position": {
        "dtype": "float32",
        "shape": (1,),
        "names": {
            "axes": ["gripper"],
        },
    },
    "observation.state": {
        "dtype": "float32",
        "shape": (8,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6", "gripper"],
        },
    },
    "observation.images.wrist_left": {
        "dtype": "video",
        "shape": (180, 320, 3),
        "names": [
            "height",
            "width",
            "channels",
        ],
    },
    "language_instruction": {
        "dtype": "string",
        "shape": (1,),
        "names": None,
    },
    "action.joint_position": {
        "dtype": "float32",
        "shape": (7,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6"],
        },
    },
    "action.gripper_position": {
        "dtype": "float32",
        "shape": (1,),
        "names": {
            "axes": ["gripper"],
        },
    },
    "action": {
        "dtype": "float32",
        "shape": (8,),
        "names": {
            "axes": ["joint_0", "joint_1", "joint_2", "joint_3", "joint_4", "joint_5", "joint_6", "gripper"],
        },
    },
    "is_episode_successful": {
        "dtype": "bool",
        "shape": (1,),
        "names": None,
    },
}

dataset = LeRobotDataset.create(
    repo_id="your-repo-id",
    features=DROID_FEATURES,
    fps=15,
    robot_type="Franka",
)
```

This shows three different kinds of features. For str or bool features, the shape is (1,) and names may be None. For one-dimensional arrays such as joint positions, names contains an axes list naming each element. For image-like multi-dimensional arrays, names labels the dimensions height, width, and channels, and dtype is set to video so that frames are encoded as video.
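As a quick sanity check of such a schema, you can verify that each feature's names entry is consistent with its shape. The helper below is not part of the LeRobot API, just a sketch over the conventions described above:

```python
def check_feature(feature: dict) -> bool:
    """Check that a feature's `names` entry is consistent with its `shape`."""
    names = feature["names"]
    if names is None:
        # str/bool features: shape (1,) and no names
        return True
    if isinstance(names, dict) and "axes" in names:
        # 1-D features: one axis name per element
        return len(names["axes"]) == feature["shape"][0]
    if isinstance(names, list):
        # image/video features: one label per dimension
        return len(names) == len(feature["shape"])
    return False


features = {
    "observation.state": {
        "dtype": "float32",
        "shape": (8,),
        "names": {"axes": ["joint_0", "joint_1", "joint_2", "joint_3",
                           "joint_4", "joint_5", "joint_6", "gripper"]},
    },
    "observation.images.wrist_left": {
        "dtype": "video",
        "shape": (180, 320, 3),
        "names": ["height", "width", "channels"],
    },
}
assert all(check_feature(f) for f in features.values())
```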
Adding each frame then looks like this:
```python
import numpy as np


def generate_lerobot_frames(tf_episode):
    m = tf_episode["episode_metadata"]
    # is_episode_successful is a user-defined helper that inspects the metadata
    frame_meta = {
        "is_episode_successful": np.array([is_episode_successful(m)]),
    }
    for f in tf_episode["steps"]:
        frame = {
            "language_instruction": f["language_instruction"].numpy().decode(),
            "observation.state.joint_position": f["observation"]["joint_position"].numpy(),
            "observation.state.gripper_position": f["observation"]["gripper_position"].numpy(),
            "action.gripper_position": f["action_dict"]["gripper_position"].numpy(),
            "action.joint_position": f["action_dict"]["joint_position"].numpy(),
            "observation.images.wrist_left": f["observation"]["wrist_image_left"].numpy(),
        }
        # language_instruction is also stored as "task" to follow the LeRobot standard
        frame["task"] = frame["language_instruction"]
        # Follow the LeRobot standard of using joint position + gripper
        frame["observation.state"] = np.concatenate(
            [frame["observation.state.joint_position"], frame["observation.state.gripper_position"]]
        )
        frame["action"] = np.concatenate([frame["action.joint_position"], frame["action.gripper_position"]])
        # Metadata that is the same for all frames in the episode
        frame.update(frame_meta)
        # Cast fp64 to fp32
        for key in frame:
            if isinstance(frame[key], np.ndarray) and frame[key].dtype == np.float64:
                frame[key] = frame[key].astype(np.float32)
        yield frame
```

Here you can see that the LeRobot standard effectively mandates three features: language_instruction (duplicated into task), observation.state, and action, where both state and action are the joint + gripper concatenation.
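The two normalization steps in the loop above, concatenating joints with the gripper and downcasting fp64 to fp32, can be sketched in isolation:

```python
import numpy as np

joint_position = np.zeros(7, dtype=np.float64)
gripper_position = np.ones(1, dtype=np.float64)

# LeRobot-standard state vector: joints followed by gripper, shape (8,)
state = np.concatenate([joint_position, gripper_position])
assert state.shape == (8,)

# np.concatenate preserves float64, so cast down before add_frame
if state.dtype == np.float64:
    state = state.astype(np.float32)
assert state.dtype == np.float32
```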
Finally, put it all together:
```python
import logging

for episode in raw_dataset:
    for frame in generate_lerobot_frames(episode):
        lerobot_dataset.add_frame(frame)
    lerobot_dataset.save_episode()
    logging.info("Save_episode")
lerobot_dataset.finalize()
```

Summary#
Overall, the LeRobot dataset format is quite pleasant to work with: when converting data you only need to think in terms of features, and images are automatically encoded into video, which makes the whole process very convenient.