A deeper look into the PUG datasets
The PUG: Animals dataset contains 215,040 pre-rendered images built from 70 animal assets, 64 environments, 3 sizes, and 4 textures, under 4 camera orientations. It was designed to make the factors of variation explicitly available. Inspired by research on out-of-distribution generalization, PUG: Animals allows one to precisely control distribution shifts between training and testing, which can provide better insight into how a deep neural network generalizes to held-out factors of variation.
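The image count quoted for PUG: Animals equals the product of its factor counts (70 × 64 × 3 × 4 × 4 = 215,040), and a controlled distribution shift amounts to holding out some values of one factor. The snippet below is a minimal sketch of that idea, not the dataset's actual loading code: the dictionary keys, the helper names, and the choice to hold out 16 environments are all hypothetical and used only for illustration.

```python
from itertools import product

# Factor counts quoted in the text for PUG: Animals.
# The key names are hypothetical labels, not official metadata fields.
ANIMALS_FACTORS = {
    "asset": 70,
    "environment": 64,
    "size": 3,
    "texture": 4,
    "camera": 4,
}


def total_images(factors):
    """Number of images if every combination of factor values is rendered."""
    total = 1
    for count in factors.values():
        total *= count
    return total


def split_by_held_out(factors, factor, held_out_values):
    """Yield (config, is_test) pairs over all factor combinations.

    A configuration is assigned to the test split when its value for
    `factor` belongs to `held_out_values`.
    """
    names = list(factors)
    idx = names.index(factor)
    for config in product(*(range(factors[name]) for name in names)):
        yield config, config[idx] in held_out_values


# Hypothetical split: hold out 16 of the 64 environments for testing.
held_out_envs = set(range(16))
n_test = sum(
    is_test
    for _, is_test in split_by_held_out(ANIMALS_FACTORS, "environment", held_out_envs)
)
n_train = total_images(ANIMALS_FACTORS) - n_test
print(total_images(ANIMALS_FACTORS), n_train, n_test)  # 215040 161280 53760
```

Because the split is defined on a single factor, any gap between training and test accuracy can be attributed to that factor alone, here the unseen environments.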
The PUG: ImageNet dataset contains 88,328 pre-rendered images built from 724 assets representing 151 ImageNet classes, with 64 environments, 7 sizes, 9 textures, 18 camera orientations, 18 character orientations, and 7 light intensities. In contrast to PUG: Animals, PUG: ImageNet was created by varying only a single factor at a time, which explains why it has fewer images than PUG: Animals despite having more factors. The main purpose of this dataset is to provide a new benchmark that parallels ImageNet but allows fine-grained evaluation of the robustness of image classifiers along several factors of variation.
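To see why varying one factor at a time keeps the dataset small, it helps to compare the number of configurations per asset under a full factorial design and under single-factor sweeps. The sketch below rests on a simplifying assumption that is not stated in the text: each factor is swept through all of its values while the others stay at a default, and default configurations are not deduplicated. The key and function names are hypothetical.

```python
from math import prod

# Per-asset factor counts quoted in the text for PUG: ImageNet.
# The key names are hypothetical labels, not official metadata fields.
IMAGENET_FACTORS = {
    "environment": 64,
    "size": 7,
    "texture": 9,
    "camera_orientation": 18,
    "character_orientation": 18,
    "light_intensity": 7,
}
NUM_ASSETS = 724


def full_factorial(counts):
    """Configurations per asset when all factors vary jointly."""
    return prod(counts.values())


def one_factor_at_a_time(counts):
    """Configurations per asset when each factor is swept on its own.

    Assumption: every sweep covers all values of its factor, and the
    shared default configuration is counted once per sweep.
    """
    return sum(counts.values())


print(full_factorial(IMAGENET_FACTORS))        # 9144576
print(one_factor_at_a_time(IMAGENET_FACTORS))  # 123
print(NUM_ASSETS * one_factor_at_a_time(IMAGENET_FACTORS))  # 89052
```

Under this assumption the sweep design needs 123 images per asset instead of more than nine million. The quoted total of 88,328 works out to 122 images per asset, so the real rendering protocol evidently differs slightly from this simplification, but the sum-versus-product contrast is what keeps the dataset small.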
The PUG: SPAR dataset, aimed at vision-language models (VLMs), contains 43,560 test samples in the form of image-caption pairs that evaluate VLMs' scene and object recognition, as well as their understanding of inter-object and object-attribute relationships. We use scenes containing up to two objects in 4 unique spatial relationships and 4 different texture variations.