Public Datasets Repository


Last updated 3 years ago

A read-only collection of sample datasets is provided free of charge for use within Gradient.

  • For Notebooks, they are available in the directory /datasets, e.g., /datasets/mnist.

  • For Workflows, they are in the Gradient namespace, e.g., in YAML, ref: gradient/mnist.
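
As a minimal sketch of the Notebook case above, the following Python lists a few files under a public dataset directory. The /datasets root and the mnist name come from the text; the helper name list_dataset_files, its root and limit parameters, and the assumption that a dataset is an ordinary directory tree are illustrative and not part of Gradient itself.

```python
from pathlib import Path

# Notebook mount point named in the text above.
DATASETS_ROOT = Path("/datasets")


def list_dataset_files(name, root=DATASETS_ROOT, limit=10):
    """Return up to `limit` file paths (relative to the dataset dir), sorted.

    Hypothetical helper: `root` is overridable so the function can be
    tried outside a Gradient Notebook.
    """
    dataset_dir = Path(root) / name
    if not dataset_dir.is_dir():
        raise FileNotFoundError(f"No dataset directory at {dataset_dir}")
    files = sorted(
        p.relative_to(dataset_dir).as_posix()
        for p in dataset_dir.rglob("*")
        if p.is_file()
    )
    return files[:limit]


if __name__ == "__main__":
    try:
        for path in list_dataset_files("mnist"):
            print(path)
    except FileNotFoundError as err:
        # Expected when running outside a Gradient Notebook.
        print(err)
```

Because the collection is read-only, anything derived from these files would need to be written to a location outside /datasets.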

List of Public Datasets

Each entry gives the dataset name, its Notebook path, its Workflow ref where one is listed, a description, and the source.

Fast.ai
  • Path: /datasets/fastai/
  • Ref: gradient/fastai
  • Description: Paperspace's Fast.ai template is built for getting up and running with the popular Fast.ai course, Practical Deep Learning for Coders.
  • Source: https://registry.opendata.aws/ (previously http://files.fast.ai/data/)

LSUN
  • Path: /datasets/lsun/
  • Ref: gradient/lsun
  • Description: Contains around one million labeled images for each of 10 scene categories and 20 object categories.
  • Source: http://www.yf.io/p/lsun (was http://lsun.cs.princeton.edu/2017; link no longer active)

MNIST
  • Path: /datasets/mnist/
  • Ref: gradient/mnist
  • Description: The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples.
  • Source: http://yann.lecun.com/exdb/mnist/

COCO
  • Path: /datasets/coco
  • Ref: gradient/coco
  • Description: COCO is a large-scale object detection, segmentation, and captioning dataset.
  • Source: http://cocodataset.org/

Selfie
  • Path: /datasets/selfie
  • Ref: gradient/selfie
  • Description: The Selfie dataset contains 46,836 selfie images annotated with 36 different attributes divided into several categories.
  • Source: https://www.crcv.ucf.edu/data/Selfie/ (was http://crcv.ucf.edu/data/Selfie)

StyleGAN
  • Path: /datasets/stylegan
  • Description: StyleGAN is a Style-Based Generator Architecture for Generative Adversarial Networks. This dataset allows the generator to produce photographs of people.
  • Source: https://github.com/NVlabs/stylegan

OpenSLR
  • Path: /datasets/openslr
  • Ref: gradient/openslr
  • Description: Open Speech and Language Resources. This is dataset number 12, the LibriSpeech ASR corpus.
  • Source: https://www.openslr.org/resources.php

Self Driving Demo
  • Path: /datasets/self-driving-demo-data
  • Description: A dataset by comma.ai that includes over 33 hours of commuting on California's I-280 freeway.
  • Source: https://github.com/commaai/comma2k19

Sentiment140
  • Path: /datasets/sentiment140
  • Ref: gradient/sentiment140
  • Description: Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter.
  • Source: http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip

Tiny-imagenet-200
  • Path: /datasets/tiny-imagenet-200
  • Ref: gradient/tiny-imagenet-200
  • Description: A subset of the ImageNet dataset created by the Stanford CS231n course. It spans 200 image classes with 500 training examples per class, plus 50 validation and 50 test examples per class.
  • Source: http://cs231n.stanford.edu/tiny-imagenet-200.zip