Public Datasets Repository


Jobs and notebooks have access to a read-only directory mounted at /datasets. It contains the public datasets listed below, and more will be added over time.
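A minimal Python sketch for checking which datasets are mounted and confirming that the mount is read-only from inside a job or notebook. It uses only the standard library. The helper names `list_datasets` and `is_read_only` are hypothetical, and the subdirectories you actually see may differ from the list on this page.

```python
import os
from pathlib import Path


def list_datasets(root: str = "/datasets") -> list[str]:
    """Return the sorted names of the dataset directories under `root`."""
    base = Path(root)
    if not base.is_dir():
        return []  # not running on a machine with the mount
    return sorted(p.name for p in base.iterdir() if p.is_dir())


def is_read_only(root: str = "/datasets") -> bool:
    """True if `root` exists and the current process cannot write to it."""
    # os.access reports False for W_OK on a read-only file system.
    return os.path.isdir(root) and not os.access(root, os.W_OK)


if __name__ == "__main__":
    for name in list_datasets():
        print(name)
    print("read-only:", is_read_only())
```

Reading files works as usual; any attempt to write under /datasets should fail, so write outputs to a different, writable directory.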

List of Public Datasets

Name & Path

Description

Fast.ai

/datasets/fastai/

Paperspace's Fast.ai template is built to get you up and running with the popular Fast.ai online course (MOOC) Practical Deep Learning for Coders.

Source: http://files.fast.ai/data/

CelebA

/datasets/celebA/

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations.

Source: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
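In the official CelebA release, the 40 attribute annotations ship as a plain-text file, `list_attr_celeba.txt`: an image count, a line of attribute names, then one row per image with values of 1 or -1. Assuming the mounted copy keeps that file (its exact location under /datasets/celebA/ is an assumption, so inspect the directory first), a standard-library parser could look like the sketch below. `parse_celeba_attrs` is a hypothetical helper.

```python
from pathlib import Path


def parse_celeba_attrs(text: str) -> tuple[list[str], dict[str, dict[str, bool]]]:
    """Parse CelebA-style attribute text into (attribute names, per-image flags).

    Expected layout (official list_attr_celeba.txt):
      line 1: number of images
      line 2: attribute names separated by whitespace
      line 3+: "<image file> v1 v2 ... vN" with each v in {1, -1}
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count = int(lines[0])
    names = lines[1].split()
    table: dict[str, dict[str, bool]] = {}
    for row in lines[2:]:
        fields = row.split()
        image, values = fields[0], fields[1:]
        if len(values) != len(names):
            raise ValueError(f"{image}: expected {len(names)} values, got {len(values)}")
        # 1 means the attribute is present, -1 means absent.
        table[image] = {n: v == "1" for n, v in zip(names, values)}
    if len(table) != count:
        raise ValueError(f"header says {count} images, parsed {len(table)}")
    return names, table


if __name__ == "__main__":
    # Hypothetical location; adjust after inspecting /datasets/celebA/.
    path = Path("/datasets/celebA/list_attr_celeba.txt")
    if path.exists():
        names, table = parse_celeba_attrs(path.read_text())
        print(len(names), "attributes,", len(table), "images")
```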

LSUN

/datasets/lsun/

Contains around one million labeled images for each of 10 scene categories and 20 object categories.

Source: http://lsun.cs.princeton.edu/2017/

http://www.yf.io/p/lsun

MNIST

/datasets/mnist/

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples.

Source: http://yann.lecun.com/exdb/mnist/
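The original MNIST files use the IDX format: a 4-byte magic number (two zero bytes, a type code where 0x08 means unsigned byte, and the number of dimensions), followed by each dimension as a big-endian 32-bit integer, then the raw data. The sketch below parses that format with the standard library. It assumes the mount keeps the original file names such as `train-images-idx3-ubyte`, possibly gzipped; that layout and the helper name `parse_idx` are assumptions.

```python
import gzip
import struct
from pathlib import Path


def parse_idx(data: bytes) -> tuple[tuple[int, ...], bytes]:
    """Parse an unsigned-byte IDX buffer, returning (shape, raw data bytes)."""
    if data[:2] == b"\x1f\x8b":  # gzip magic: the original downloads are .gz
        data = gzip.decompress(data)
    zero, dtype, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("only unsigned-byte IDX files are supported")
    shape = struct.unpack(f">{ndim}I", data[4 : 4 + 4 * ndim])
    body = data[4 + 4 * ndim :]
    expected = 1
    for dim in shape:
        expected *= dim
    if len(body) != expected:
        raise ValueError(f"expected {expected} bytes of data, found {len(body)}")
    return shape, body


if __name__ == "__main__":
    # File names follow the original MNIST release; the mounted layout is an assumption.
    root = Path("/datasets/mnist")
    for name in ("train-images-idx3-ubyte", "train-labels-idx1-ubyte"):
        for candidate in (root / name, root / f"{name}.gz"):
            if candidate.exists():
                shape, _ = parse_idx(candidate.read_bytes())
                print(candidate.name, shape)
```

For the training images you would expect a shape of (60000, 28, 28), and for the training labels (60000,).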

COCO

/datasets/coco

Selfie

/datasets/selfie

The Selfie dataset contains 46,836 selfie images annotated with 36 different attributes, grouped into several categories.

Source: http://crcv.ucf.edu/data/Selfie/

StyleGAN

/datasets/stylegan

StyleGAN is a style-based generator architecture for generative adversarial networks (GANs). This dataset lets the generator produce photographs of people.

Source: https://github.com/NVlabs/stylegan

OpenSLR

/datasets/openslr

Open Speech and Language Resources.

Source: https://www.openslr.org/resources.php

Self Driving Demo

/datasets/self-driving-demo-data

A dataset by comma.ai with more than 33 hours of commute driving on California's Highway 280.

Source: https://github.com/commaai/comma2k19

Sentiment140

/datasets/sentiment140

Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter.

Source: http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip
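The Sentiment140 archive is distributed as CSV files with six quoted columns: a polarity code (0 = negative, 2 = neutral, 4 = positive), tweet id, date, query, user, and tweet text. The sketch below reads rows in that layout and attaches a readable label. The file name `training.1600000.processed.noemoticon.csv` comes from the original zip, but its presence under /datasets/sentiment140 is an assumption, as is the helper name `read_sentiment140`.

```python
import csv
from collections import Counter
from pathlib import Path

# Polarity codes used in the Sentiment140 CSV files.
POLARITY = {"0": "negative", "2": "neutral", "4": "positive"}
FIELDS = ("polarity", "id", "date", "query", "user", "text")


def read_sentiment140(stream):
    """Yield one dict per Sentiment140-style CSV row, with a readable 'label'."""
    for row in csv.reader(stream):
        if len(row) != len(FIELDS):
            continue  # skip malformed lines rather than failing the whole file
        record = dict(zip(FIELDS, row))
        record["label"] = POLARITY.get(record["polarity"], "unknown")
        yield record


if __name__ == "__main__":
    # Assumed location of the extracted training file.
    path = Path("/datasets/sentiment140/training.1600000.processed.noemoticon.csv")
    if path.exists():
        # The original files are not UTF-8; latin-1 decodes every byte.
        with path.open(encoding="latin-1", newline="") as f:
            print(Counter(r["label"] for r in read_sentiment140(f)))
```

Streaming rows through a generator keeps memory use flat, which matters for a file with well over a million tweets.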

Tiny-imagenet-200

/datasets/tiny-imagenet-200

A subset of the ImageNet dataset created for Stanford's CS231n course. It spans 200 image classes with 500 training examples per class, plus 50 validation and 50 test examples per class.

Source: http://cs231n.stanford.edu/tiny-imagenet-200.zip
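The per-class counts above imply 200 × 500 = 100,000 training images and 200 × 50 = 10,000 images each for validation and test. The sketch below computes those totals and counts training images per class so you can check the mounted copy. It assumes the zip's original `train/<wnid>/images/*.JPEG` layout directly under /datasets/tiny-imagenet-200; that path and the helper names are assumptions.

```python
from pathlib import Path

CLASSES = 200
TRAIN_PER_CLASS = 500
VAL_PER_CLASS = 50
TEST_PER_CLASS = 50


def expected_totals() -> dict[str, int]:
    """Split sizes implied by the per-class counts."""
    return {
        "train": CLASSES * TRAIN_PER_CLASS,
        "val": CLASSES * VAL_PER_CLASS,
        "test": CLASSES * TEST_PER_CLASS,
    }


def count_train_images(train_dir: str) -> dict[str, int]:
    """Count JPEG files per class, assuming a train/<wnid>/images/ layout."""
    counts = {}
    for class_dir in sorted(Path(train_dir).iterdir()):
        images = class_dir / "images"
        if images.is_dir():
            counts[class_dir.name] = sum(
                1 for p in images.iterdir() if p.suffix.upper() == ".JPEG"
            )
    return counts


if __name__ == "__main__":
    print(expected_totals())
    train = Path("/datasets/tiny-imagenet-200/train")  # assumed location
    if train.is_dir():
        counts = count_train_images(str(train))
        short = {k: v for k, v in counts.items() if v != TRAIN_PER_CLASS}
        print(len(counts), "classes;", "all complete" if not short else f"incomplete: {short}")
```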
