- Among best practices in science, aiming for reproducibility is primarily important because it ensures methodological rigor and the consistency of findings (1).
- I am convinced that reproducibility is best achieved through automation of certain processes and adherence to (opinionated) standards (2).
- What follows is a personal project workflow implementing various external tools in an R environment to obtain and maintain reproducibility.
This guide is aimed at intermediate R users and was developed with RStudio in mind. Much wider coverage and further resources on reproducible R workflows can be found at Prof. Harrell’s website.
1 Install and load ProjectTemplate
1.1 Install ProjectTemplate
ProjectTemplate is an R package allowing for automated organization of R projects. It works by creating an opinionated project structure, pre-loading all packages and data sets used, and pre-processing data where needed.
You can install the package by running:
install.packages('ProjectTemplate')
1.2 Load ProjectTemplate
Load the package and create a new minimal project structure:
require('ProjectTemplate')
create.project(project.name = 'projectname',
               template = 'minimal')

create.project() is the function that takes care of creating the project structure. It needs a project.name (otherwise the project is created in a “new-project” folder). Optionally, the user can specify a project template (currently, the options 'full' and 'minimal' are available; custom templates are also supported).
1.3 Choose a useful project name
Be mindful of recommendations for naming things.
Generally, you want your project (and any files and folders you’re working with) to be named consistently, so that any files (loaded or generated) are immediately recognizable. File names (especially for plots, tables, and exported data sets) should often include a date. Names of R scripts (and Quarto reports) should be self-explanatory.
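As a small illustration, dated and self-explanatory output names can be built programmatically; the analysis label below is purely illustrative, not something ProjectTemplate prescribes:
# Build dated, self-explanatory names for exported artifacts
analysis_label <- "primary-outcome-by-treatment"
plot_file  <- paste0(Sys.Date(), "_", analysis_label, ".png")
table_file <- paste0(Sys.Date(), "_", analysis_label, ".csv")
plot_file
# e.g. "2024-04-15_primary-outcome-by-treatment.png"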
2 Open the project in RStudio
- Open RStudio
- Go to File > Open Project…
3 Using your structured project
3.1 Load your project
You can load the whole project (data, R scripts) from within the project directory with:
load.project()
- Data sets in data/ will be loaded as data.frames.
- ProjectTemplate can load a bunch of different file formats, including compressed and uncompressed CSV, Excel, and RDS.
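As a sketch of what to expect: each file in data/ typically becomes an object named after the file. The patient_outcomes data set used in later examples is hypothetical:
library('ProjectTemplate')
load.project()
# data/patient_outcomes.csv is now available as the data.frame patient_outcomes
str(patient_outcomes)
summary(patient_outcomes$primary_outcome)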
3.2 Getting to know the project folders
A minimal project structure includes the following folders. Each folder comes with an explanatory README file, but they are summarized here in a sensible order:

data/ contains the project’s data sets.
- Raw data sets should be backed up in multiple locations outside the project.
- Raw data should never be edited or overwritten: you can never be sure all edits are respectful of the original data.

munge/ includes scripts for pre-processing the data (e.g. adding custom columns, re-assigning variable classes, etc.).
- Pre-processed data should be generated in the cache/ folder.

cache/ includes data sets generated through pre-processing steps.
- When a data set is available in both cache/ and data/, load.project() will only load the cached version.

src/ contains R scripts for data analysis.
- Each script should start with the same two lines:
library('ProjectTemplate')
load.project()
- Any code that’s shared between different R scripts should be placed in the munge/ directory instead.
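For orientation, the resulting minimal layout looks roughly like this (exact files may vary slightly between ProjectTemplate versions):
projectname/
├── cache/    # cached, pre-processed data sets
├── config/   # global.dcf with project settings
├── data/     # raw data sets (treated as read-only)
├── munge/    # pre-processing scripts
└── src/      # analysis scripts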
4 Version control and collaboration
4.1 Initialize Git repository
After creating your project structure, initialize version control immediately:
# In the terminal or Git Bash
git init
git add .
git commit -m "Initial project structure"
Version control is non-negotiable for reproducible research. It tracks changes, enables collaboration, and provides a safety net for your work.
4.2 Create a .gitignore file
ProjectTemplate creates a basic .gitignore, but you should expand it:
# Data files too large for Git
data/*.csv
data/*.xlsx
cache/*

# R temporary files
.Rhistory
.RData
.Rproj.user/

# OS files
.DS_Store
Thumbs.db
5 Data management best practices
5.1 Raw data preservation
Your data/ folder should follow these principles:
- Read-only access: Never modify files in data/
- Documentation: Include a data/README.md describing each dataset
- Backup strategy: Maintain copies in at least two other locations
# Example data documentation structure
# data/README.md
# Dataset: patient_outcomes.csv
# Source: Clinical trial NCT12345678
# Date acquired: 2024-04-15
# Variables: 25 columns, see codebook.xlsx
# N observations: 1,250
5.2 Data preprocessing workflow
Create numbered scripts in munge/ to ensure correct execution order:
# munge/01-clean-data.R
# Remove duplicates and handle missing values
df_clean <- df_raw %>%
  distinct() %>%
  filter(!is.na(primary_outcome))

# munge/02-transform-variables.R
# Create derived variables
df_clean <- df_clean %>%
  mutate(
    age_group = cut(age,
                    breaks = c(0, 40, 60, Inf),
                    labels = c("Young", "Middle", "Older"))
  )
6 Analysis and reporting
6.1 Organizing analysis scripts
Structure your src/ folder by analysis type:
# src/01-descriptive-statistics.R
# src/02-primary-analysis.R
# src/03-sensitivity-analysis.R
# src/04-figures-tables.R
Number your scripts to indicate execution order. This helps collaborators understand your analytical flow.
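As a minimal sketch of one such script (the object names assume the munge/ examples above and are illustrative):
# src/01-descriptive-statistics.R -- a minimal sketch
library('ProjectTemplate')
load.project()

# df_clean is created by the munge/ scripts shown above
summary(df_clean$primary_outcome)
table(df_clean$age_group, useNA = "ifany")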
6.2 Integrating Quarto documents
For reproducible reports, create a reports/ folder:
# Create reports directory
dir.create("reports")
# reports/main-analysis.qmd
# reports/supplementary-material.qmd
Your Quarto documents should source the analysis scripts:
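A minimal sketch of such a report chunk, assuming it is executed with the project root as the working directory (e.g. via the Quarto project settings):
# Inside an R chunk of reports/main-analysis.qmd
library('ProjectTemplate')
load.project()
source("src/01-descriptive-statistics.R")
source("src/02-primary-analysis.R")
source("src/04-figures-tables.R")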
7 Advanced reproducibility features
7.1 Package management with renv
Ensure consistent package versions across environments:
# Initialize renv for the project
install.packages("renv")
renv::init()

# Snapshot current package versions
renv::snapshot()
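Collaborators (or a future you on a new machine) can then recreate the same library from the committed renv.lock file:
# Restore the package versions recorded in renv.lock
renv::restore()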
7.2 Automated testing
Create a tests/ folder for unit tests:
# tests/test-data-integrity.R
library(testthat)
library(ProjectTemplate)

test_that("No missing values in key variables", {
  load.project()
  expect_true(all(!is.na(df_clean$primary_outcome)))
  expect_true(all(!is.na(df_clean$treatment_group)))
})
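One simple way to run these checks is to source the test file from the project root, so that load.project() can find the project files:
# From the project root; source() leaves the working directory unchanged
source("tests/test-data-integrity.R")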
7.3 Configuration management
Modify config/global.dcf for project-specific settings:
# config/global.dcf
data_loading: TRUE
data_loading_header: TRUE
data_ignore: ""
cache_loading: TRUE
recursive_loading: FALSE
munging: TRUE
logging: FALSE
logging_level: INFO
load_libraries: TRUE
libraries: tidyverse, ggplot2, survival
as_factors: FALSE
tables_type: tibble
attach_internal_libraries: FALSE
cache_loaded_data: TRUE
sticky_variables: NONE
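Edited settings take effect the next time the project is loaded; get.project() (a ProjectTemplate helper) can then be used to inspect information about the loaded project:
# Re-load the project from the project root so the edited configuration is picked up
library('ProjectTemplate')
load.project()
str(get.project())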
8 Best practices for collaboration
8.1 Documentation standards
Every project should include:
- README.md: Project overview and setup instructions
- CONTRIBUTING.md: Guidelines for collaborators
- CHANGELOG.md: Track major changes
- requirements.txt: System dependencies
8.2 Code style consistency
Adopt a style guide (e.g., tidyverse style) and enforce it:
# Use styler package for automatic formatting
install.packages("styler")
styler::style_dir("src/")
styler::style_dir("munge/")
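For day-to-day use you can also restyle just the script you touched before committing (the file name here is illustrative):
styler::style_file("src/02-primary-analysis.R")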
9 Troubleshooting common issues
9.1 Memory management
For large datasets:
# In config/global.dcf: load data.table for efficient memory usage
libraries: data.table, tidyverse
# In munge scripts, clean up intermediate objects
rm(temp_df)
gc()
9.2 Path management
Always use relative paths:
# Good
read_csv("data/patient_outcomes.csv")
# Bad
read_csv("/Users/username/projects/projectname/data/patient_outcomes.csv")
Absolute paths break reproducibility across different systems. ProjectTemplate’s structure ensures relative paths work consistently.
10 Conclusion
This workflow combines ProjectTemplate’s organizational structure with modern reproducibility tools. The key principles are:
- Separation of concerns: Data, preprocessing, analysis, and reporting in distinct folders
- Version control: Track all changes with Git
- Environment management: Use renv for package versioning
- Documentation: Comprehensive documentation at every level
- Automation: Let tools handle repetitive tasks
By following these practices, your R projects will be more maintainable, shareable, and—most importantly—reproducible.
10.1 Further resources
- The Turing Way: Community handbook for reproducible research
- R for Data Science: Modern R workflows
- Happy Git with R: Version control for R users