For what purpose was the dataset created?

The three datasets presented in this paper were created for representation learning research. PUG: Animals is intended for out-of-distribution (OOD) research and for probing the representations learned by vision models. PUG: ImageNet was designed as an additional benchmark for ImageNet-pretrained models, offering a finer-grained view of their robustness to specific factors of variation. Lastly, PUG: SPAR showcases how synthetic data can be used to evaluate the understanding of vision-language models (VLMs).

Who created the dataset and on behalf of which entity? Who funded the creation of the dataset?

These datasets were created by the FAIR team at Meta AI. Their creation was entirely funded by Meta.

What do the instances that comprise the dataset represent?

For PUG: Animals, the instances are images of animals in various environments. In contrast, PUG: ImageNet contains 151 object classes (the full list is available in the appendix of the research paper). PUG: SPAR uses the same objects as PUG: Animals.

How many instances are there in total?

PUG: Animals: 215,040 images.

PUG: ImageNet: 88,328 images.

PUG: SPAR: 43,560 images.

Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?

PUG: Animals contains all possible combinations of the factors of variation. In contrast, PUG: ImageNet was sampled by changing only one factor at a time and is therefore a random sample of the full distribution. Images in PUG: SPAR were sampled using all possible combinations of the factors of variation (with the exception that, for the attributes, the blue or grass animal is always on the left).
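
The difference between the two sampling schemes can be sketched as follows. This is only an illustration, not the generation code used for the datasets: the factor values, the `base` configuration, and the helper names `full_grid` and `one_factor_at_a_time` are all assumptions made for the example.

```python
from itertools import product

# Hypothetical factors and values, for illustration only.
factors = {
    "world_name": ["Desert", "Arctic"],
    "character_name": ["Elephant", "Camel"],
    "camera_yaw": [0, 90, 180],
}

# Hypothetical default configuration from which single factors are varied.
base = {"world_name": "Desert", "character_name": "Elephant", "camera_yaw": 0}


def full_grid(factors):
    """Enumerate every combination of factor values (exhaustive sampling)."""
    names = list(factors)
    return [dict(zip(names, values)) for values in product(*factors.values())]


def one_factor_at_a_time(factors, base):
    """Start from `base` and change a single factor per sample."""
    samples = [dict(base)]
    for name, values in factors.items():
        for value in values:
            if value != base[name]:
                samples.append({**base, name: value})
    return samples


grid = full_grid(factors)
ofat = one_factor_at_a_time(factors, base)
print(len(grid), len(ofat))
```

The exhaustive grid grows multiplicatively with the number of values per factor, while the one-factor-at-a-time scheme grows additively, which is why the latter covers only part of the full distribution.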

What data does each instance consist of?

For PUG: Animals and PUG: ImageNet, we release images. For PUG: SPAR, we release the script that generates the captions from the factors of variation.
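
To give a rough idea of what generating a caption from the factors of variation can look like, here is a minimal sketch. It is not the released script: the caption template, the helper name `make_caption`, and the example values are assumptions, and only the factor names are taken from the PUG: SPAR csv columns described in this datasheet.

```python
def make_caption(row):
    """Build a caption from one row of factors (hypothetical template)."""

    def describe(texture, name):
        # Skip the texture when it is empty so no stray space is left.
        return " ".join(part for part in (texture, name) if part)

    first = describe(row["character_texture"], row["character_name"])
    second = describe(row["character2_texture"], row["character2_name"])
    return (
        f"A photo of a {first} on the {row['character1_pos']} "
        f"and a {second} on the {row['character2_pos']} "
        f"in a {row['world_name']} environment."
    )


# Hypothetical row: the values below are made up for the example.
example_row = {
    "world_name": "desert",
    "character_name": "camel",
    "character2_name": "zebra",
    "character1_pos": "left",
    "character2_pos": "right",
    "character_texture": "blue",
    "character2_texture": "",
}

caption = make_caption(example_row)
print(caption)
```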

Is there a label or target associated with each instance?

Yes, in a csv file. Each instance has a row in this csv file listing all the factors of variation used to generate the image. For PUG: Animals, the csv file contains the following columns: filename, world_name, character_name, character_scale, camera_yaw, character_texture. For PUG: ImageNet, it contains: filename, world_name, character_name, character_label, character_rotation_yaw, character_rotation_roll, character_rotation_pitch, character_scale, camera_roll, camera_pitch, camera_yaw, character_texture, scene_light.
For PUG: SPAR, the csv contains: filename, world_name, character_name, character2_name, character1_pos, character2_pos, character_texture, character2_texture.
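
As a minimal sketch of how these label files could be read, the snippet below parses a csv with the Python standard library and checks that the PUG: Animals columns listed above are present. The helper name `load_labels` and the sample row are assumptions for illustration; the real files may store values in a different form.

```python
import csv
import io

# Column names as listed in this datasheet for PUG: Animals.
ANIMALS_COLUMNS = [
    "filename",
    "world_name",
    "character_name",
    "character_scale",
    "camera_yaw",
    "character_texture",
]


def load_labels(handle, expected_columns):
    """Read a label csv and verify that the expected columns are present."""
    reader = csv.DictReader(handle)
    missing = set(expected_columns) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


# Hypothetical in-memory sample standing in for the real csv file.
sample = io.StringIO(
    "filename,world_name,character_name,character_scale,camera_yaw,character_texture\n"
    "0.png,Desert,Elephant,1.0,0,Default\n"
)

rows = load_labels(sample, ANIMALS_COLUMNS)
print(rows[0]["world_name"], rows[0]["character_name"])
```

With a real file, the `io.StringIO` object would be replaced by `open(path, newline="")`, where `path` points to the downloaded csv.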

There is no predefined split for PUG: Animals because this dataset is intended for OOD research. We leave it to researchers to choose their own held-out or training/validation/testing splits to train their models. In contrast, PUG: ImageNet and PUG: SPAR should only be used as additional test sets.
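
One possible way to build a held-out split for PUG: Animals is to reserve some values of a factor for testing, so that the test set contains factor values never seen during training. The sketch below assumes the rows have already been loaded as dictionaries keyed by the csv column names; the helper name `split_by_factor` and the example values are hypothetical.

```python
def split_by_factor(rows, factor, held_out_values):
    """Put rows whose `factor` value is held out in the test set."""
    held_out = set(held_out_values)
    train = [row for row in rows if row[factor] not in held_out]
    test = [row for row in rows if row[factor] in held_out]
    return train, test


# Hypothetical label rows, for illustration only.
label_rows = [
    {"filename": "0.png", "world_name": "Desert", "character_name": "Elephant"},
    {"filename": "1.png", "world_name": "Arctic", "character_name": "Elephant"},
    {"filename": "2.png", "world_name": "Desert", "character_name": "Camel"},
    {"filename": "3.png", "world_name": "Arctic", "character_name": "Camel"},
]

# Hold out one environment entirely to test generalization to unseen backgrounds.
train_rows, test_rows = split_by_factor(label_rows, "world_name", ["Arctic"])
print(len(train_rows), len(test_rows))
```

The same function can hold out characters, textures, or camera angles by changing the `factor` argument.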

Are there any errors, sources of noise, or redundancies in the dataset?

PUG: Animals and PUG: SPAR are very clean, and each animal is easily identifiable. In contrast, PUG: ImageNet leverages assets from Sketchfab, and the asset quality varies significantly.

The dataset is self-contained; however, the assets used to build it belong to external sources, which are listed on GitHub.

Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety?


How was the data associated with each instance acquired?

The data (3D assets) were purchased through the Unreal Engine Marketplace and Sketchfab. The assets were then imported into Unreal Engine to generate realistic 3D scenes and the corresponding images. The 3D assets were manually selected to ensure high quality. For a complete list of assets used, see here.

What mechanisms or procedures were used to collect the data?

Assets were manually collected.

If the dataset is a sample from a larger set, what was the sampling strategy?

For PUG: Animals and PUG: SPAR, all combinations are included. For PUG: ImageNet, a random sample of the possible combinations is provided.

Who was involved in the data collection process and how were they compensated?

Only the authors of this work were involved.

Over what timeframe was the data collected?

The data were collected between June 2022 and June 2023.

Were these datasets made using 3D assets that were marked with a NoAI tag?

No, we carefully excluded any asset that had a NoAI tag at the time of data collection. The purpose of these datasets is to probe the robustness of vision models, and they should not be used for any generative AI purposes, as stated in the datasets' respective licenses.

How can I make a PUG dataset?

Please take a look at our tutorial which explains how to use Unreal Engine to create interactive environments.

Where can I find more information about the datasets?

Please take a look at the research paper which contains the complete datasheet in the appendix.