HO-3D Refactored

HO-3D-R is a refactor of the HO-3D dataset for 6D object pose estimation. The data is converted into the BOP format so that it can be used with the existing BOP toolbox to evaluate the performance of 6D object pose estimation methods. The masks used in the paper are provided separately. The ground truth masks are provided in the original dataset.
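As a rough illustration of what the BOP format gives you, the sketch below reads the ground truth poses of one scene. It assumes the standard BOP `scene_gt.json` layout (image id mapped to a list of annotations with `obj_id`, `cam_R_m2c` and `cam_t_m2c`); the function name `load_scene_gt` and any file path passed to it are hypothetical and not part of this dataset's documentation.

```python
import json


def load_scene_gt(path):
    """Read a BOP-style scene_gt.json into {image_id: [pose, ...]}.

    Assumes the standard BOP keys: obj_id, cam_R_m2c (9 values,
    row-major 3x3 rotation) and cam_t_m2c (3 values, millimetres).
    """
    with open(path, "r") as f:
        raw = json.load(f)

    scene = {}
    for im_id, annotations in raw.items():
        poses = []
        for ann in annotations:
            r = ann["cam_R_m2c"]
            poses.append({
                "obj_id": int(ann["obj_id"]),
                # Reshape the flat list into three rows of the rotation matrix.
                "R": [r[0:3], r[3:6], r[6:9]],
                # Model-to-camera translation.
                "t": list(ann["cam_t_m2c"]),
            })
        # JSON keys are strings; BOP image ids are integers.
        scene[int(im_id)] = poses
    return scene
```

Each returned pose maps model coordinates into the camera frame, which is the convention BOP evaluation scripts expect.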



A subset of the sequences in the HO-3D training set and one sequence from the evaluation set (for an additional object) are included, giving a total of 10 objects for evaluation. The selected training sequences correspond to the camera perspective that faces the human demonstrator. For objects that have multiple demonstrations, the same camera perspective is removed. For subsets consisting of one camera, the first sequence is chosen. The remaining training sequences are treated as the test split. In addition, the sequences in the evaluation subset that match the additional object are included for testing. A total of 10 sequences are used for training and 44 sequences for testing.

Frames are removed from the test split if the annotated pose is imprecise. This is determined by comparing the depth of each object pixel in the mask with the corresponding pixel in the depth image of the object rendered using the annotated pose. A pixel is invalid if the real and rendered depth values differ by more than 5 mm. A frame is invalid if fewer than 97% of the visible pixels are valid. Finally, 800 frames are randomly sampled for each test object.
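The frame filter and the sampling step can be sketched as follows. This is a minimal illustration, not the script used to build the dataset: it assumes both depth images are NumPy arrays in millimetres with the same shape, that the mask marks visible object pixels with non-zero values, and that sampling is uniform without replacement. All function and constant names are hypothetical.

```python
import numpy as np

DEPTH_TOLERANCE_MM = 5.0   # a pixel is invalid above this discrepancy
MIN_VALID_RATIO = 0.97     # a frame is invalid below this ratio
FRAMES_PER_OBJECT = 800    # test frames kept per object


def valid_pixel_ratio(real_depth, rendered_depth, mask, tol=DEPTH_TOLERANCE_MM):
    """Fraction of visible object pixels whose depths agree within tol."""
    visible = np.asarray(mask).astype(bool)
    n_visible = int(visible.sum())
    if n_visible == 0:
        return 0.0  # no visible object pixels: treat the frame as unusable
    diff = np.abs(real_depth[visible] - rendered_depth[visible])
    return float((diff <= tol).sum()) / n_visible


def frame_is_valid(real_depth, rendered_depth, mask,
                   tol=DEPTH_TOLERANCE_MM, min_ratio=MIN_VALID_RATIO):
    """Keep a frame only if enough visible pixels pass the depth check."""
    return valid_pixel_ratio(real_depth, rendered_depth, mask, tol) >= min_ratio


def sample_frames(frame_ids, n=FRAMES_PER_OBJECT, seed=0):
    """Uniformly sample up to n frame ids without replacement."""
    frame_ids = list(frame_ids)
    if len(frame_ids) <= n:
        return sorted(frame_ids)
    rng = np.random.default_rng(seed)
    picked = rng.choice(len(frame_ids), size=n, replace=False)
    return sorted(frame_ids[i] for i in picked)
```

In practice one would first run `frame_is_valid` over every test frame of an object and then pass the surviving frame ids to `sample_frames`.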

The mapping between HO-3D and HO-3D-R sequences is provided here.

Research paper

If you find the dataset useful, please cite the following papers:

  author = {Patten, Timothy and Park, Kiru and Leitner, Markus and Wolfram, Kevin and Vincze, Markus}, 
  title = {Object Learning for 6D Pose Estimation and Grasping from Videos of In-hand Manipulation}, 
  booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (submitted)}, 
  year = {2021}


  author = {Hampali, Shreyas and Rad, Mahdi and Oberweger, Markus and Lepetit, Vincent},
  title = {HOnnotate: A method for 3D Annotation of Hand and Object Poses},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2020},
  pages = {3196--3206}

Contact & credits

For any questions about HO-3D-R, please contact:

  • Tim Patten – email: patten@acin.tuwien.ac.at

Other credits:

  • Shreyas Hampali for sharing his data and code to prepare this refactor