The three datasets presented in this paper were created for representation learning research. PUG: Animals is well suited to out-of-distribution (OOD) research and to probing the representations of vision models. PUG: ImageNet was designed as an additional benchmark for ImageNet-pretrained models, to give a better understanding of how robust vision models are to specific factors of variation. Lastly, PUG: SPAR showcases how synthetic data can be used to evaluate the understanding of vision-language models (VLMs).
These datasets were created by the FAIR team at Meta AI. The work was entirely funded by Meta.
For PUG: Animals, the instances are images of animals in various environments. In contrast, PUG: ImageNet contains 151 object classes (the full list is available in the research paper appendix). PUG: SPAR uses the same objects as PUG: Animals.
PUG: Animals: 215,040 images.
PUG: ImageNet: 88,328 images.
PUG: SPAR: 43,560 images.
PUG: Animals contains all possible combinations of the factors of variation. In contrast, PUG: ImageNet was sampled by changing only one factor at a time and is therefore a random sample of the distribution. Images in PUG: SPAR were sampled using all possible combinations of the factors of variation (with the exception that, for the attributes, the blue or grass animal is always on the left).
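To make the difference between the two sampling schemes concrete, here is a minimal sketch that contrasts a full grid of combinations with configurations that change one factor at a time. The factor values, the base configuration, and the function names are hypothetical and chosen only for illustration; this is not the generation code used to build the datasets.

```python
from itertools import product


def full_grid(factors):
    """Return every combination of factor values (a full factorial design)."""
    names = list(factors)
    combos = product(*(factors[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def one_factor_at_a_time(factors, base):
    """Return configurations that differ from `base` in exactly one factor."""
    configs = []
    for name, values in factors.items():
        for value in values:
            if value != base[name]:
                configs.append({**base, name: value})
    return configs


# Hypothetical factor values, for illustration only.
FACTORS = {
    "world_name": ["Desert", "Arena"],
    "character_name": ["Elephant", "Goat", "Cat"],
    "camera_yaw": [0, 90, 180, 270],
}
# Hypothetical base configuration from which single factors are varied.
BASE = {"world_name": "Desert", "character_name": "Elephant", "camera_yaw": 0}

grid = full_grid(FACTORS)
ofat = one_factor_at_a_time(FACTORS, BASE)
print(len(grid), len(ofat))  # 24 6
```

The full grid grows as the product of the number of values per factor, while the one-factor-at-a-time set grows only as their sum, which is why the second scheme covers a small part of the full distribution.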
For PUG: Animals and PUG: ImageNet, we release images. For PUG: SPAR, we release the script to generate the captions from the factors of variation.
Yes, a CSV file. Each instance has a row in this CSV file listing all the factors of variation used to generate the image. For PUG: Animals, the CSV file contains the following columns:
filename, world_name, character_name, character_scale, camera_yaw, character_texture
while for PUG: ImageNet, it contains:
filename, world_name, character_name, character_label, character_rotation_yaw,
character_rotation_roll, character_rotation_pitch, character_scale, camera_roll,
camera_pitch, camera_yaw, character_texture
For PUG: SPAR, the CSV file contains: filename, world_name, character_name, character2_name, character1_pos, character2_pos, character_texture, character2_texture
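As an illustration of how these per-image rows can be used, here is a minimal sketch that reads a factors file with the PUG: Animals columns listed above and groups filenames by one factor. It assumes the file has a header row with those column names; the sample rows and the helper names are hypothetical and do not come from the released data.

```python
import csv
import io
from collections import defaultdict

# Column names for PUG: Animals, as listed above.
ANIMALS_COLUMNS = [
    "filename", "world_name", "character_name",
    "character_scale", "camera_yaw", "character_texture",
]


def read_factor_rows(handle):
    """Read a factors CSV (assumed to have a header row) into a list of dicts."""
    return [dict(row) for row in csv.DictReader(handle)]


def group_by_factor(rows, factor):
    """Map each value of `factor` to the filenames generated with it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row["filename"])
    return dict(groups)


# Hypothetical sample content; real values come from the released CSV file.
SAMPLE_CSV = """filename,world_name,character_name,character_scale,camera_yaw,character_texture
0.png,Desert,Elephant,1.0,0,Default
1.png,Desert,Elephant,1.0,90,Grass
2.png,Arena,Goat,0.7,0,Default
"""

rows = read_factor_rows(io.StringIO(SAMPLE_CSV))
by_world = group_by_factor(rows, "world_name")
print(by_world)
```

With a file on disk, the same function can be called as `read_factor_rows(open(path, newline=""))`, where `path` points to the downloaded CSV file.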
There is no predefined split for PUG: Animals because this dataset is intended for OOD research. We leave it to researchers to choose their own held-out sets or training/validation/testing splits to train their models. In contrast, PUG: ImageNet and PUG: SPAR should only be used as additional test sets.
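One way to build such a held-out split is to reserve some values of a factor for testing only. The sketch below assumes the rows have already been loaded as dictionaries keyed by the column names above; the function name, the sample rows, and the choice of held-out factor are hypothetical.

```python
def split_by_held_out(rows, factor, held_out_values):
    """Split rows into (train, test) so that held-out factor values appear only in test."""
    held_out = set(held_out_values)
    train = [row for row in rows if row[factor] not in held_out]
    test = [row for row in rows if row[factor] in held_out]
    return train, test


# Hypothetical rows, for illustration only.
ROWS = [
    {"filename": "0.png", "world_name": "Desert", "character_name": "Elephant"},
    {"filename": "1.png", "world_name": "Arena", "character_name": "Elephant"},
    {"filename": "2.png", "world_name": "Desert", "character_name": "Goat"},
    {"filename": "3.png", "world_name": "Arena", "character_name": "Goat"},
]

# Hold out one environment: the model never sees "Arena" during training.
train_rows, ood_rows = split_by_held_out(ROWS, "world_name", ["Arena"])
print([r["filename"] for r in train_rows], [r["filename"] for r in ood_rows])
```

Holding out values of `world_name` tests generalization to unseen environments; the same function can hold out values of any other factor column.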
PUG: Animals and PUG: SPAR are very clean and each animal is easily identifiable. In contrast, PUG: ImageNet leverages assets from Sketchfab, and the asset quality varies significantly.
The dataset is self-contained. However, the assets used to build it belong to external sources, which are listed on GitHub.
The data (3D assets) were purchased through the Unreal Engine Marketplace and Sketchfab. The assets were then imported into Unreal Engine to generate realistic 3D scenes and the corresponding images. The 3D assets were manually selected to ensure high quality. For a complete list of assets used, see here.
Assets were manually collected.
For PUG: Animals and PUG: SPAR all combinations are included. For PUG: ImageNet, a random sample of possible combinations is provided.
Only the authors of this work were involved.
The data were collected between June 2022 and June 2023.
No, we carefully excluded any assets that had a NoAI tag at the time of data collection. The purpose of these datasets is to probe the robustness of vision models, and they should not be used for any generative AI purposes, as stated in the datasets' respective licenses.
Please take a look at our tutorial which explains how to use Unreal Engine to create interactive environments.
Please take a look at the research paper which contains the complete datasheet in the appendix.