Data Repositories

Last but not least: Data!

GitHub is the right home for your code — but it is not built for data.

❌ GitHub blocks any single file larger than 100 MB

❌ Large files bloat your repository history

❌ Raw data should not be version-controlled alongside code

❌ GitHub repos can be deleted or renamed at any time, whereas a DOI issued by a data repository is designed to keep resolving to the same record

Note: Raw data belongs in a data repository, a permanent, citable, indexed home designed for storing and sharing research outputs.
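The file size limit mentioned above is easy to check before committing. The following is a minimal sketch, not an official GitHub tool: the function name `find_large_files` and the choice to count 100 MB as 100 * 1024 * 1024 bytes are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: 100 MB is counted as 100 * 1024 * 1024 bytes.
LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit=LIMIT_BYTES):
    """Return (path, size) pairs for files under root larger than limit.

    Anything inside a .git directory is skipped. Results are sorted
    with the largest file first.
    """
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Example usage from the top of a working copy:
#   for path, size in find_large_files("."):
#       print(f"{size / 1024**2:8.1f} MB  {path}")
```

Any file this reports is a candidate for a data repository rather than the code repository.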

Discipline-specific repositories

| Repository | Best for | Discipline |
|---|---|---|
| PRIDE | Proteomics (mass spec) | Proteomics |
| GEO | Gene expression data | Genomics |
| SRA | Raw sequencing data | Genomics |
| BioImage Archive | Biological imaging (microscopy, EM) | Imaging |
| IDR | Image data + experimental metadata | Imaging |
| TCIA | Clinical & pre-clinical imaging | Clinical imaging |
| OpenNeuro | Neuroimaging (MRI, EEG, MEG) | Neuroimaging |
| PhysioNet | Physiological & clinical time-series | Clinical / sports science |
| PANGAEA | Environmental & Earth science spatial data | Spatial / environmental |
| GBIF | Biodiversity & occurrence data | Spatial / ecology |

Tip: Can’t find a match? re3data.org indexes 2,000+ discipline-specific repositories. Search by subject, data type, or country.

General options

| Repository | Best for | Discipline |
|---|---|---|
| Zenodo | Any research output | General |
| Figshare | Figures, datasets, code | General |
| OSF | Full project + preprints | General |

Tip: When in doubt, Zenodo or Figshare accept almost anything, but many journals mandate a discipline-specific repository. Always check the journal's data policy first.
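The selection rule in this section (a discipline-specific repository first, a general one as the fallback) can be sketched as a small lookup. The repository names come from the tables above. The keyword keys and the function name `suggest_repositories` are illustrative simplifications, not an official classification, and a journal's data policy still takes precedence.

```python
# Keyword -> repository, simplified from the discipline-specific table.
# The keys are illustrative shorthand chosen for this sketch.
DISCIPLINE_REPOSITORIES = {
    "proteomics": "PRIDE",
    "gene expression": "GEO",
    "raw sequencing": "SRA",
    "microscopy": "BioImage Archive",
    "clinical imaging": "TCIA",
    "neuroimaging": "OpenNeuro",
    "physiological time-series": "PhysioNet",
    "earth science": "PANGAEA",
    "biodiversity": "GBIF",
}

# General-purpose options to fall back on when nothing matches.
GENERAL_FALLBACK = ["Zenodo", "Figshare"]


def suggest_repositories(data_type):
    """Return the matching discipline repository, or the general fallbacks."""
    key = data_type.strip().lower()
    match = DISCIPLINE_REPOSITORIES.get(key)
    if match is not None:
        return [match]
    return list(GENERAL_FALLBACK)


print(suggest_repositories("Neuroimaging"))      # ['OpenNeuro']
print(suggest_repositories("survey responses"))  # ['Zenodo', 'Figshare']
```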

In Practice

Exercises

4.12 For the spacewalk manuscript, identify the raw data that should be shared and the most appropriate repository for it.

4.13 Create a draft record for this data in the repository (no need to upload files or make it public).

4.14 Think about the data types you generate or work with in your own research. Identify the standard or recommended repository for each data type.
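Draft records are usually created through a repository's web form, but some repositories also expose an API. As one possible illustration, the sketch below builds a request for a draft deposition on Zenodo's sandbox (a separate test instance). It assumes you hold a sandbox access token. The token, title, description, and creator shown are placeholders, and the helper name `build_draft_request` is hypothetical. The request is only built, not sent, and it is not meant to imply that Zenodo is the right answer for any particular dataset.

```python
import json
import urllib.request

# Zenodo's sandbox is a test instance; records created there are not real.
SANDBOX_URL = "https://sandbox.zenodo.org/api/deposit/depositions"


def build_draft_request(token, title, description, creator):
    """Build (but do not send) a POST request for an unpublished draft."""
    payload = {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "creators": [{"name": creator}],
        }
    }
    return urllib.request.Request(
        SANDBOX_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_draft_request(
    token="YOUR-SANDBOX-TOKEN",       # placeholder, not a real token
    title="Example dataset (draft)",  # placeholder metadata
    description="Placeholder description for a draft record.",
    creator="Doe, Jane",
)
print(request.get_method(), request.full_url)

# To actually create the draft, send the request with a valid token:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response)["id"])
```

A draft created this way stays unpublished until it is explicitly published, which matches the exercise's "no need to upload files or make it public".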