New Projects
Revision as of 14:42, 9 April 2026

The goal of this guide is to help you set up every new project so that it is well-organized from day one, fully reproducible, and ready to be made public at publication. We follow the philosophy of the [Good Research Code Handbook](https://goodresearch.dev/) by Patrick Mineault, but extend it to cover the full scope of a research project: experimental data, analysis code, lab notes, figures, posters, and papers.

The core principle is simple: **one project = one paper = one repository**. Everything related to a project lives together, is version-controlled, and is structured so that anyone (including future you) can understand and reproduce the work.


== Prerequisites: Learn the Tools ==

Before you begin, make sure you have a working knowledge of the following. If you don't, work through the linked tutorials first.

**Git and GitHub** are the foundation of everything below. Git tracks changes to your files over time; GitHub hosts your repository online and enables collaboration.

* [Software Carpentry: Version Control with Git](https://swcarpentry.github.io/git-novice/) (start here if you're new)
* [GitHub Skills](https://skills.github.com/) (interactive, browser-based tutorials)
* [Pro Git Book](https://git-scm.com/book/en/v2) (comprehensive reference)

**The command line** is needed for most of the setup steps below. You don't need to be an expert, but you should be comfortable navigating directories and running commands.

* [Software Carpentry: The Unix Shell](http://swcarpentry.github.io/shell-novice/)

**Python packaging and environments** are essential for making your code portable and reproducible.

* [The Good Research Code Handbook](https://goodresearch.dev/) (read the whole thing, it's short and very good)
* [Conda documentation](https://docs.conda.io/en/latest/)


== Step 1: Name Your Project and Create the Repository ==

Pick a short, descriptive name. This name will be used for the folder, the GitHub repository, and the installable Python package, so keep it lowercase with no spaces (use hyphens or underscores if needed). For example: `pupil-dynamics`, `pursuit-acceleration`, `meg-connectivity`.
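If you want to check a candidate name automatically, a small sketch (the function name and exact rule set are illustrative, not a standard):

```python
import re

def is_valid_project_name(name):
    """True if the name is lowercase with no spaces: letters, digits,
    hyphens, or underscores only, starting with a letter or digit."""
    return re.fullmatch(r"[a-z0-9][a-z0-9_-]*", name) is not None
```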

Go to [GitHub](https://github.com) and create a new repository:

* Initialize it with a README and a `.gitignore` (select the Python template).
* Choose a licence. For open science, we recommend **MIT** for code-heavy projects or **CC-BY 4.0** for data/content-heavy projects.
* Clone the repository to your local machine:

```bash
cd ~/projects
git clone https://github.com/YOUR-USERNAME/project-name.git
cd project-name
```


== Step 2: Set Up the Directory Structure ==

Create the following folder structure inside your repository:

```
project-name/
├── code/                  # reusable Python modules (your installable package)
│   └── __init__.py
├── data/
│   ├── raw/               # raw, untouched data (never modify these files)
│   └── processed/         # cleaned or transformed data
├── docs/
│   ├── labnotebook/       # electronic lab notes (Markdown files, dated)
│   └── protocols/         # experimental protocols and SOPs
├── results/
│   ├── figures/           # publication-quality figures
│   └── intermediate/      # checkpoints, intermediate outputs
├── scripts/               # analysis scripts and notebooks
├── outputs/
│   ├── papers/            # manuscript drafts (LaTeX or Markdown source)
│   └── posters/           # poster source files
├── tests/                 # unit tests for your code
├── environment.yml        # conda environment specification
├── setup.py               # makes your code pip-installable
└── README.md              # project overview and instructions
```

You can create this in one command:

```bash
mkdir -p code data/{raw,processed} docs/{labnotebook,protocols} \
        results/{figures,intermediate} scripts outputs/{papers,posters} tests
touch code/__init__.py
```

A few things to note about this structure. The `code/` directory holds reusable Python modules that you import (equivalent to `src/` in the Good Research Code Handbook). The `scripts/` directory holds analysis scripts and Jupyter notebooks that call functions from `code/`. The `data/raw/` directory is sacred: raw data go in, but nothing ever comes back out modified. All transformations produce new files in `data/processed/`.
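The raw-in, processed-out convention can be captured in a small helper; a sketch (the helper name is hypothetical, not part of any library):

```python
from pathlib import Path

def processed_path(raw_path):
    """Map a file under data/raw/ to its counterpart under data/processed/.

    Transformations read from the raw path and write to the returned path,
    so the raw files are never overwritten.
    """
    parts = list(Path(raw_path).parts)
    parts[parts.index("raw")] = "processed"
    return Path(*parts)
```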


== Step 3: Set Up a Virtual Environment ==

Every project gets its own conda environment. This ensures that your dependencies are documented and that the project can be reproduced on any machine.

```bash
conda create --name project-name python=3.11
conda activate project-name
conda install numpy scipy matplotlib pandas seaborn jupyter
```

Export the environment specification and commit it:

```bash
conda env export > environment.yml
git add environment.yml
git commit -m "Add conda environment specification"
```

Keep `environment.yml` up to date as you add packages. Anyone can recreate your environment with:

```bash
conda env create --file environment.yml
```

For more details on managing conda environments (including mixing pip and conda packages), see the [Good Research Code Handbook: Setup](https://goodresearch.dev/setup).
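Note that `conda env export` pins every transitive dependency, which can make the file brittle across operating systems. A hand-trimmed `environment.yml` listing only the packages you asked for is often easier to maintain; a minimal sketch matching the commands above:

```yaml
name: project-name
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - scipy
  - matplotlib
  - pandas
  - seaborn
  - jupyter
```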


== Step 4: Make Your Code Pip-Installable ==

This step avoids the mess of `sys.path` hacks and makes your modules importable from anywhere in the project. Create a minimal `setup.py` in the project root:

```python
from setuptools import find_packages, setup

setup(
    name='project-name',
    packages=find_packages(),
)
```

Then install your package in editable mode:

```bash
pip install -e .
```

Now you can `import code.my_module` from any script or notebook in the project without worrying about paths. If you change the code, the changes are picked up automatically. See the [Good Research Code Handbook: Setup](https://goodresearch.dev/setup#install-a-project-package) for a more detailed walkthrough.
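To make the division of labour concrete, here is a sketch of a module that would live in `code/` (the file and function names are hypothetical):

```python
# code/stats.py -- a reusable module in the installable package
def zscore(values):
    """Return a z-scored copy of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]
```

Once the package is installed in editable mode, any notebook or script in the project can simply run `from code.stats import zscore`, with no path manipulation.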


== Step 5: Configure `.gitignore` and Data Tracking ==

Not everything belongs in Git. Add the following to your `.gitignore`:

```
# Data (tracked separately, see below)
data/
results/intermediate/

# Python
*.egg-info/
__pycache__/
*.pyc
.ipynb_checkpoints/

# OS files
.DS_Store
Thumbs.db

# Environment
.env
```

**For large or sensitive data**, Git is not the right tool. You have two main options depending on your situation:

* [DataLad](https://www.datalad.org/) (recommended for neuroscience): built on Git and git-annex, it version-controls arbitrarily large files and integrates with BIDS and OpenNeuro. See the [DataLad Handbook](https://handbook.datalad.org/) for a thorough tutorial.
* [DVC (Data Version Control)](https://dvc.org/): a lightweight alternative that works well with remote storage backends (S3, Google Drive, etc.).

If your data are small enough to share directly (< 50 MB of text-based files, e.g., behavioural CSVs), you can keep them in the Git repository and remove `data/` from `.gitignore`.
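The 50 MB rule of thumb is easy to check programmatically; a sketch (the function name and threshold are yours to choose):

```python
from pathlib import Path

def total_size_mb(root):
    """Total size of all files under root, in megabytes."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file()) / 1e6
```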


== Step 6: Organize Your Data ==

Follow community standards for data organization wherever possible. For neuroimaging data (MRI, EEG, MEG), use the [Brain Imaging Data Structure (BIDS)](https://bids.neuroimaging.io/). For behavioural and psychophysics data, adopt a consistent naming convention:

```
data/raw/sub-01/ses-01/sub-01_ses-01_task-pursuit_beh.csv
data/raw/sub-01/ses-01/sub-01_ses-01_task-pursuit_eyetrack.edf
```

The key principles are: use subject and session identifiers consistently, include the task name, separate metadata from data, and never modify raw files. Write a `data/README.md` that documents the naming convention, variable definitions, and any relevant acquisition parameters.
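A consistent convention also makes filenames machine-readable. A sketch of parsing the identifiers back out of a behavioural filename (this function is illustrative, not part of the BIDS tooling; for full BIDS datasets use the official validator or a library):

```python
import re

def parse_beh_filename(name):
    """Extract subject, session, and task from a BIDS-style _beh.csv filename."""
    m = re.match(
        r"sub-(?P<sub>[^_]+)_ses-(?P<ses>[^_]+)_task-(?P<task>[^_]+)_beh\.csv$",
        name,
    )
    return m.groupdict() if m else None
```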


== Step 7: Keep an Electronic Lab Notebook ==

Use the `docs/labnotebook/` directory for dated Markdown entries. A simple naming convention works well:

```
docs/labnotebook/2026-03-30_pilot-data-collection.md
docs/labnotebook/2026-04-02_initial-analysis.md
```

Each entry should briefly note what you did, what you observed, any decisions you made, and links to relevant scripts or results. These notes are version-controlled along with everything else and provide a timestamped record of the project's evolution.

You can also use tools like [Obsidian](https://obsidian.md/) or [Logseq](https://logseq.com/) for richer note-taking and link them to your repository.
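Generating a correctly dated entry path can be scripted; a minimal sketch (the helper name is hypothetical):

```python
from datetime import date

def notebook_entry_path(slug):
    """Path for a new lab notebook entry, dated today, e.g.
    docs/labnotebook/2026-04-09_initial-analysis.md"""
    return f"docs/labnotebook/{date.today():%Y-%m-%d}_{slug}.md"
```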


== Step 8: Write a Good README ==

Your `README.md` is the front door to the project. Write it early and update it as the project evolves. It should include:

* **Project title and one-paragraph summary** of the scientific question.
* **How to set up the environment** (`conda env create --file environment.yml`).
* **How to reproduce the results** (which scripts to run, in what order).
* **Directory structure** (copy and paste the tree from Step 2 and annotate it).
* **Data availability** (where the data live if not in the repository).
* **Authors and contact information**.
* **Licence**.
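A skeleton `README.md` covering these points might look like the following (every name and placeholder is illustrative):

```markdown
# project-name

One-paragraph summary of the scientific question.

## Setup

    conda env create --file environment.yml
    conda activate project-name
    pip install -e .

## Reproducing the results

Run the scripts in `scripts/` in numbered order.

## Data

See `data/README.md` for conventions; raw data availability is described there.

## Authors

Your Name (contact@example.org)

## Licence

MIT
```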


== Step 9: Commit Early, Commit Often ==

A good rule of thumb is to commit every meaningful unit of work: a new analysis function, a cleaned dataset, a draft of a figure. Each commit should have a short, informative message. Aim for several commits per day when you are actively working.

```bash
git add scripts/01_preprocess.py
git commit -m "Add preprocessing pipeline for eye-tracking data"
git push
```

If you are not comfortable with the Git command line, the [Git panel in VS Code](https://code.visualstudio.com/docs/sourcecontrol/overview) is an excellent GUI alternative.


== Step 10: Prepare for Publication from Day One ==

The reason we set all of this up at the start is so that sharing is effortless when the paper is ready. At publication time, you should be able to:

1. **Make the GitHub repository public** (or archive it on [Zenodo](https://zenodo.org/) for a citable DOI).
2. **Deposit the data** on a public repository such as [OpenNeuro](https://openneuro.org/) (for BIDS neuroimaging data), [OSF](https://osf.io/), [Figshare](https://figshare.com/), or [Dryad](https://datadryad.org/).
3. **Link everything in the paper**: point readers to the code repository, the data repository, and specify the exact environment needed to reproduce the results.

If you have followed this guide, all three steps should take minutes rather than days.


== Quick-Reference Checklist ==

When starting a new project, work through the following:

* [ ] Create a GitHub repository with README, licence, and `.gitignore`
* [ ] Clone it locally and set up the directory structure
* [ ] Create and export a conda environment
* [ ] Create `setup.py` and `pip install -e .`
* [ ] Configure `.gitignore` (and DataLad/DVC if needed for large data)
* [ ] Write a `data/README.md` documenting your data conventions
* [ ] Start your lab notebook with a first entry
* [ ] Write a draft `README.md` with setup and reproduction instructions
* [ ] Make your first commit and push


== Further Reading ==

* Mineault, P. J. (2021). [The Good Research Code Handbook](https://goodresearch.dev/). The essential guide to writing clean, maintainable research code.
* Wilson, G. et al. (2017). [Good enough practices in scientific computing](https://doi.org/10.1371/journal.pcbi.1005510). *PLOS Computational Biology*.
* Gorgolewski, K. J. et al. (2016). [The Brain Imaging Data Structure](https://doi.org/10.1038/sdata201644). *Scientific Data*.
* Halchenko, Y. O. et al. (2021). [DataLad: distributed system for joint management of code, data, and their relationship](https://doi.org/10.21105/joss.03262). *Journal of Open Source Software*.
* The [DataLad Handbook](https://handbook.datalad.org/) for a comprehensive tutorial on data version control.
* The [Software Carpentry](https://software-carpentry.org/lessons/) lessons for foundational skills in the Unix shell, Git, and Python.