Public Datasets Repository
A read-only collection of sample datasets is provided free of charge for use within Gradient.
For Notebooks, they are available in the directory /datasets, e.g., /datasets/mnist.
For Workflows, they are in the Gradient namespace, e.g., in YAML, ref: gradient/mnist.
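As a quick check from inside a Notebook, the contents of a dataset directory can be listed with the Python standard library. This is a minimal sketch, not part of any Gradient API: the helper name list_dataset and its root and limit parameters are hypothetical, and the only thing taken from this page is the /datasets mount point. The root argument exists so the helper can also be pointed at another directory when run outside Gradient.

```python
from pathlib import Path

# Assumption: /datasets is the read-only directory described on this page.
DATASETS_ROOT = Path("/datasets")


def list_dataset(name, root=DATASETS_ROOT, limit=10):
    """Return up to `limit` sorted entry names from the dataset directory `name`.

    `list_dataset` is a hypothetical helper written for illustration only.
    """
    dataset_dir = Path(root) / name
    if not dataset_dir.is_dir():
        raise FileNotFoundError(f"No dataset directory at {dataset_dir}")
    entries = sorted(p.name for p in dataset_dir.iterdir())
    return entries[:limit]


if __name__ == "__main__":
    try:
        # Inside a Gradient Notebook this should show the top-level MNIST files.
        print(list_dataset("mnist"))
    except FileNotFoundError as err:
        # Outside Gradient the directory is normally absent.
        print(err)
```

Because the collection is read-only, anything derived from it (extracted archives, preprocessed copies) has to be written somewhere else.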
Name & Path
Description
Fast.ai
/datasets/fastai/
ref: gradient/fastai
Paperspace's Fast.ai template is built for getting up and running with Fast.ai's popular MOOC, Practical Deep Learning for Coders.
Source: (previously )
LSUN
/datasets/lsun/
ref: gradient/lsun
Contains around one million labeled images for each of 10 scene categories and 20 object categories.
Source: (was http://lsun.cs.princeton.edu/2017; link no longer active)
MNIST
/datasets/mnist/
ref: gradient/mnist
The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples.
Source:
COCO
/datasets/coco
ref: gradient/coco
COCO is a large-scale object detection, segmentation, and captioning dataset.
Source:
Selfie
/datasets/selfie
ref: gradient/selfie
The Selfie dataset contains 46,836 selfie images annotated with 36 different attributes divided into several categories.
Source: (was http://crcv.ucf.edu/data/Selfie)
StyleGAN
/datasets/stylegan
StyleGAN is a style-based generator architecture for generative adversarial networks. This dataset lets the generator produce photographs of people.
Source:
OpenSLR
/datasets/openslr
ref: gradient/openslr
Open Speech and Language Resources. This is dataset number 12, the LibriSpeech ASR corpus.
Source:
Self Driving Demo
/datasets/self-driving-demo-data
A dataset by comma.ai that includes over 33 hours of commute driving on California's I-280 freeway.
Source:
Sentiment140
/datasets/sentiment140
ref: gradient/sentiment140
Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter.
Source:
Tiny-imagenet-200
/datasets/tiny-imagenet-200
ref: gradient/tiny-imagenet-200
A subset of the ImageNet dataset created by the Stanford CS231n course. It spans 200 image classes, with 500 training, 50 validation, and 50 test examples per class.
Source:
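For scripts that need to switch between the Notebook path and the Workflow ref of the same dataset, the listing above can be captured as a small lookup table. The sketch below is illustrative only: PUBLIC_DATASETS, notebook_path, and workflow_ref are hypothetical names, the values are copied from the entries above, and None marks the entries (StyleGAN and Self Driving Demo) for which no ref is listed here. Whether a ref exists for those two is not stated on this page, so the helper raises an error rather than guessing.

```python
# Notebook paths and Workflow refs copied from the listing above.
# None means the listing gives no Workflow ref for that dataset.
PUBLIC_DATASETS = {
    "fastai": ("/datasets/fastai/", "gradient/fastai"),
    "lsun": ("/datasets/lsun/", "gradient/lsun"),
    "mnist": ("/datasets/mnist/", "gradient/mnist"),
    "coco": ("/datasets/coco", "gradient/coco"),
    "selfie": ("/datasets/selfie", "gradient/selfie"),
    "stylegan": ("/datasets/stylegan", None),
    "openslr": ("/datasets/openslr", "gradient/openslr"),
    "self-driving-demo-data": ("/datasets/self-driving-demo-data", None),
    "sentiment140": ("/datasets/sentiment140", "gradient/sentiment140"),
    "tiny-imagenet-200": (
        "/datasets/tiny-imagenet-200",
        "gradient/tiny-imagenet-200",
    ),
}


def notebook_path(name):
    """Return the Notebook directory listed for `name` (hypothetical helper)."""
    return PUBLIC_DATASETS[name][0]


def workflow_ref(name):
    """Return the Workflow ref listed for `name`, or raise if none is listed."""
    ref = PUBLIC_DATASETS[name][1]
    if ref is None:
        raise LookupError(f"No Workflow ref is listed for dataset {name!r}")
    return ref
```

Calling workflow_ref("mnist") returns the string gradient/mnist, which is the value used after ref: in a Workflow's YAML.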