Data Repositories
Last but not least: Data!
GitHub is the right home for your code — but it is not built for data.
❌ GitHub blocks any single file larger than 100 MB
❌ Large files bloat your repository history
❌ Raw data should not be version-controlled alongside code
❌ GitHub repos can be deleted or renamed at any time, whereas a DOI issued by a data repository is designed to keep resolving
Note: Raw data belongs in a data repository: a permanent, citable, indexed home designed for storing and sharing research outputs.
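Before moving data out of a code repository, it helps to know which files are already too big for GitHub. The sketch below walks a project folder and lists files above a size threshold. The names `find_large_files` and `LIMIT_BYTES` are illustrative and not part of any tool mentioned here, and the threshold assumes GitHub's limit is measured as 100 × 1024 × 1024 bytes; adjust it if you want a safety margin.

```python
from pathlib import Path

# Assumed threshold: 100 MiB, matching GitHub's per-file limit.
LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit_bytes=LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files under root above limit_bytes.

    Results are sorted from largest to smallest.
    """
    root = Path(root)
    large = []
    for path in root.rglob("*"):
        # Skip Git's internal storage; only the working files matter here.
        if ".git" in path.relative_to(root).parts:
            continue
        # Ignore directories and symlinks, keep regular files only.
        if path.is_symlink() or not path.is_file():
            continue
        size = path.stat().st_size
        if size > limit_bytes:
            large.append((path, size))
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files(".")` from the project root returns the candidates to move to a data repository; an empty list means nothing in the working tree exceeds the threshold. Note that this only inspects the current files, not large files already committed to the repository history.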
Discipline-specific repositories
| Repository | Best for | Discipline |
|---|---|---|
| PRIDE | Proteomics (mass spec) | Proteomics |
| GEO | Gene expression data | Genomics |
| SRA | Raw sequencing data | Genomics |
| BioImage Archive | Biological imaging (microscopy, EM) | Imaging |
| IDR | Image data + experimental metadata | Imaging |
| TCIA | Clinical & pre-clinical imaging | Clinical imaging |
| OpenNeuro | Neuroimaging (MRI, EEG, MEG) | Neuroimaging |
| PhysioNet | Physiological & clinical time-series | Clinical / sports science |
| PANGAEA | Georeferenced Earth & environmental science data | Spatial / environmental |
| GBIF | Biodiversity & occurrence data | Spatial / ecology |
Tip: Can’t find a match? re3data.org indexes 2,000+ discipline-specific repositories; search by subject, data type, or country.
General options
| Repository | Best for | Discipline |
|---|---|---|
| Zenodo | Any research output | General |
| Figshare | Figures, datasets, code | General |
| OSF | Full project + preprints | General |
Tip: When in doubt, Zenodo or Figshare accept almost anything, but many journals mandate a discipline-specific repository. Always check the journal's data policy first.
In Practice
Exercises
4.12 For the spacewalk manuscript, identify the raw data that should be shared and the most appropriate repository for it.
4.13 Create a draft record for this data in the repository (no need to upload files or make it public).
4.14 Think about the data types you generate or work with in your own research. Identify the standard or recommended repository for each data type.