
Organize your research and teaching material with Git

Waldir Leôncio Netto

Research Software Engineer
Oslo Centre for Biostatistics and Epidemiology (OCBE)

https://github.com/ocbe-uio
https://github.com/wleoncio

Course plan (subject to change)

  • Day 1 (02 Nov, 10:00-11:30): Git basics+ (status, add, commit, log, diff, reset, branch)
    • For all users, no matter how or where they work
  • Day 2 (09 Nov, 10:00-11:30): Git + GitHub on RStudio (Stata/others get limited support)
    • Changing Git history (amend, squash, fixup)
  • Day 3 (16 Nov, 10:00-11:30): Solely maintaining a GitHub repository
    • fetch, pull, push, issues, merge, rebase
  • Day 4 (23 Nov, 10:00-11:30): Collaborating on GitHub
    • pull requests, merge conflicts, blame, fork

Class rules (day 1)

Course material

🌐 Available here or on QR
📜 Licensed under Creative Commons Attribution-ShareAlike

Workflow

✋ Questions/comments at any time (in-person or Slido #4152775)
💻 Software requirement: Git https://git-scm.com/download/

My personal history with Git

As a statistician

  • Started using Git (for R packages) in 2013
  • Used it for every single project at work and home (software and otherwise)
  • Just the basics until summer 2016 (start of PhD thesis)
  • 2017: started using "Git advanced" and GitHub

As a software engineer

  • Fall 2018: Started working as Research Software Engineer at UiO
  • 2019: Senior engineer at OCBE: maintaining dozens of repositories (ocbe-uio)
  • 2023: GitHub as my main work hub (wleoncio)

What Git is and isn't

Git is

  • A version-control system
  • Created in 2005 by Linus Torvalds 🇫🇮 (of Linux fame)
  • Maintained by Junio Hamano and others
  • Super good at handling pure-text files
    • R, Python, MATLAB, Stata files (data analysis scripts, packages, etc.)
    • Whatever you use to write documents (especially LaTeX and Markdown)
      • notes
      • papers
      • presentations

Git is not

  • Just for software developers
  • A good place to dump
    • Large files (e.g. data), unless you use Git LFS
    • Files generated from other files (e.g. .pdf from .tex)
  • GitHub
    • Distributed workflows around a single source of truth (SSOT) are a Git feature
    • but a secondary one: Git's main job is version control
    • There are other places to keep your SSOT

The elevator pitch

  • Curated work history
  • Parallel versions of the same files
    • Test new ideas
    • Prepare for publication
  • It's software-agnostic and highly configurable
    • Offline/online
    • Cross-platform
    • Textual/graphical
    • Beginner/advanced

Installing Git (last chance, for real this time)

  • General instructions: https://git-scm.com/download/
  • Linux
    • It usually comes pre-installed (verify by running git --version)
    • Otherwise, use your distro's package manager (apt, dnf, pacman...)
  • macOS
    • Open a terminal and type git --version
    • Plan B: install Xcode Command Line Tools
  • Windows

Do not fear your keyboard!

  • Git is primarily text-based software
  • Just like R, Stata...

😃 Perfect if you're looking for speed, automation and reproducibility
😡 Not great if you like to use your mouse a lot

...but even if you identify with the latter category, you may love using Git!

Some graphical interfaces (GUI) to Git

Git lingo, simplified

  1. Repo(sitory): A collection of all versions of your files
    • Locally, a hidden .git folder in your working directory (WD)
    • May also be hosted online (e.g. GitHub, GitLab, Bitbucket)
  2. Commit: One particular version/snapshot of your WD
    • A repo is a collection of commits
    • Basic workflow behavior:
      • Save (Ctrl+S) as much as you want between commits
      • Saves are ignored, only commits matter (unlike Dropbox)
  3. Hash: a 40-character unique alphanumeric identifier of a commit
    • a4508a3d857eg282aced33ba75d18ede34fde99f
  4. Pointer: a word reference to a commit (e.g. HEAD, main, origin/main)
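The four terms above can be seen in a scratch repository. This is a minimal sketch, assuming Git is installed. The folder, file name and identity are placeholders made up for illustration.

```shell
set -e
# Hypothetical scratch repo in a temporary folder
repo=$(mktemp -d)
cd "$repo"
git init -q                               # creates the hidden .git folder (the repo)
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called "main"
git config user.name "Example Researcher" # placeholder identity, local to this repo
git config user.email "researcher@example.org"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"   # one commit = one snapshot

hash=$(git rev-parse HEAD)   # full hash of the commit that HEAD points to
echo "$hash"
echo "${#hash} characters"
git rev-parse main           # the pointer "main" resolves to the same hash
```

Here HEAD and main are two pointers to the same commit. After the next commit, both will move forward together.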

Where does Git store your files?

  1. Working directory (i.e., not Git): the files as you see and work with them in your file explorer
  2. Staging area: a list of files that are ready to commit
  3. Repository: all previously committed versions of your files
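To watch a file travel through the three places listed above, one option is to run `git status --short` after each step. This is a minimal sketch in a throwaway repo. The file name and identity are illustrative assumptions.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch folder
cd "$repo"
git init -q
git config user.name "Example Researcher"      # placeholder identity
git config user.email "researcher@example.org"

echo "x <- 1" > analysis.R
untracked=$(git status --short)   # file exists only in the working directory
echo "$untracked"                 # ?? analysis.R

git add analysis.R
staged=$(git status --short)      # file is now in the staging area
echo "$staged"                    # A  analysis.R

git commit -q -m "Add analysis script"
clean=$(git status --short)       # nothing to report: the version is in the repository
echo "$clean"
```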

Basic Git Commands

(yes, we're finally starting for good 🎉)

Initializing a repository

git init # within working directory

Configuring your user info

git config --local user.name "$name"  # or --global for all your repositories
git config --local user.email "$email" # GitHub user? use your GH e-mail

Tip: Peek at your config file (git config --list)
Top tip: Add some aliases
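As a sketch of the alias tip: the alias names `st` and `lg` below are arbitrary choices for illustration, not something the course prescribes. They are set with `--local` in a scratch repo so that your real configuration is left alone.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch folder
cd "$repo"
git init -q

# "git st" becomes shorthand for "git status --short"
git config --local alias.st "status --short"
# "git lg" becomes shorthand for a compact view of the whole history
git config --local alias.lg "log --oneline --graph --all"

git config --get alias.lg    # print what the alias expands to
git st                       # use the alias like any other Git command
```

Once you are happy with an alias, swapping `--local` for `--global` makes it available in all your repositories.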

Doing some actual work

  1. First do some science

  2. Then commit (i.e. register on Git) a version of your science
git status
git add $file  # Pro tip: --patch to hand-pick changes
git commit -m "$message"
git log        # Pro tip: try these options: --oneline --graph --all
  3. Go make more science

  4. See what you have done
git status  # Pro tip: wrap with "watch" to auto-update
git diff $file
git commit --all --message "$message"  # Pro tip: fix an oopsie with --amend
git show HEAD:$file

...and those are the Git basics

Everyday registration of your work

  1. Work on files
  2. Add them to the staging area (git add)
  3. Commit them to the repo (git commit)

Eventual consultation of a previous version

  1. Check out the Git tree (git log)
  2. Compare versions of your files (git diff, git show)
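Put together, the two routines above might look like the following session. It is only a sketch in a temporary folder. The file name, commit messages and identity are made up for the example.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch folder
cd "$repo"
git init -q
git config user.name "Example Researcher"      # placeholder identity
git config user.email "researcher@example.org"

# Everyday registration: work, add, commit
echo "Introduction" > paper.md
git add paper.md
git commit -q -m "Start paper"

# More work, then look at what changed before committing again
echo "Methods" >> paper.md
git diff paper.md                               # shows the added line
git commit -q --all --message "Add methods heading"

# Eventual consultation of previous versions
git log --oneline                               # one line per commit
current=$(git show HEAD:paper.md)               # file as of the latest commit
previous=$(git show HEAD~1:paper.md)            # file as of the commit before
echo "$previous"
```

`HEAD~1` means "one commit before HEAD". A hash taken from `git log` works in the same position.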

Intermediate Git flash-course

Undoing changes

git reset $file        # unstages $file
git reset $hash        # undo commits after $hash, but keep the changes
git reset --hard $hash # go nuclear (i.e., lose changes after commit)
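The three flavours of `git reset` above behave quite differently, so here is a sketch that runs them in order on a throwaway repo. File name and identity are assumptions for illustration. Try this on a scratch folder before using `--hard` on real work.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch folder
cd "$repo"
git init -q
git config user.name "Example Researcher"      # placeholder identity
git config user.email "researcher@example.org"

echo "one" > data.txt
git add data.txt
git commit -q -m "First"
first=$(git rev-parse HEAD)          # remember the hash of the first commit

echo "two" >> data.txt
git commit -q -a -m "Second"

echo "three" >> data.txt
git add data.txt
git reset -q -- data.txt             # unstage: the edit stays in the working directory
staged_after=$(git diff --cached --name-only)

git reset -q "$first"                # drop the "Second" commit, keep the file as it is
lines_kept=$(grep -c '' data.txt)    # all three lines are still on disk

git reset -q --hard "$first"         # go nuclear: the file goes back to the first commit
cat data.txt
```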

Peeking at a file's change history

git log -p $file
git show $hash:$file # Pro tips: pipe (|) to "less"; redirect (>) to file

Tagging

git tag $tag_name
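A tag is a human-readable name for one commit, which is handy for milestones. In this sketch the tag name `submitted-v1`, the file and the identity are hypothetical.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch folder
cd "$repo"
git init -q
git config user.name "Example Researcher"      # placeholder identity
git config user.email "researcher@example.org"

echo "Draft sent to co-authors" > paper.md
git add paper.md
git commit -q -m "Finish first draft"
git tag submitted-v1                 # name the current commit

echo "Revised after feedback" >> paper.md
git commit -q -a -m "Revise draft"

git tag --list                       # list all tags in the repo
tagged=$(git show submitted-v1:paper.md)   # the file as it was when tagged
echo "$tagged"
```

The tag can be used wherever a hash is expected, as in the `git show` call.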

Git's greatest strength

(in my opinion)

Creating parallel versions of your files

Git is like a tree

  • Tree: repo
  • Leaves: files
  • Branches: different paths of development

In other words: you can have different versions of the same file on different branches!

Git is not like a tree

  • Expect your branches to merge back with the trunk(s)
    1. 🟠 🟢 Files start the same
    2. 🟠 🔵 ⚪ Differ after some commits
    3. 🟠 🟢 merge back with trunk(s)
  • In this example:
    • Long-lived (trunk) branches: 🟠 🟢
    • Short-lived (feature) branches: ⚪ 🔵 🟡

Create a branch

git branch $branch_name

Check out (i.e., switch to) a branch

git checkout $branch_name

Merging back with main branch

git checkout main
git merge $branch_name
git branch -d $branch_name # recommended cleanup
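The create, switch and merge commands above can be chained into one short session. This is a minimal sketch. The branch name `new-idea`, the file and the identity are invented for the example, and the trunk is forced to be called `main` so the commands match the slides.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch folder
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main          # make sure the trunk is called "main"
git config user.name "Example Researcher"      # placeholder identity
git config user.email "researcher@example.org"

echo "Title" > paper.md
git add paper.md
git commit -q -m "Start paper"

git branch new-idea                  # create a short-lived branch
git checkout -q new-idea             # switch to it
echo "A risky paragraph" >> paper.md
git commit -q -a -m "Try a new idea"

git checkout -q main                 # back on the trunk...
main_before=$(cat paper.md)          # ...where the file is still the old version

git merge -q new-idea                # bring the new idea into main
git branch -q -d new-idea            # recommended cleanup
cat paper.md
```

Because `main` did not change while the branch was alive, this merge simply moves the `main` pointer forward. Merges of branches that both changed are a topic for later sessions.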

Ignoring files

  • Datasets
  • References/Literature
  • Files generated from other files
  • Files that will never change

Create a file called .gitignore and chuck all that in it.

Example:
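The patterns below are one possible `.gitignore` for a LaTeX-based project, written as a sketch. The folder names (`data/`, `literature/`) and file extensions are assumptions that you should adapt to your own project.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch folder
cd "$repo"
git init -q

# Write a .gitignore; lines starting with "#" are comments
cat > .gitignore <<'EOF'
# Datasets
data/
*.csv
# References/literature
literature/
# Files generated from other files
*.pdf
*.aux
*.log
EOF

mkdir data
touch data/survey.csv paper.tex paper.pdf

git status --short           # lists .gitignore and paper.tex only
git check-ignore paper.pdf   # prints the path if Git ignores it
```

`git check-ignore` is a quick way to confirm that a pattern catches the file you had in mind.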

When Git doesn't let you switch branches

a.k.a. "soft commit"

git stash  # 1. Sweep a temporary version under the rug
git stash pop  # 2. Retrieve it when you switch back
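A full round trip with the stash could look like the sketch below. The branch name `side`, the file and the identity are hypothetical. In this toy repo Git would allow the switch anyway, so the point is only to show where the unfinished edit goes.

```shell
set -e
repo=$(mktemp -d)            # hypothetical scratch folder
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main          # make sure the trunk is called "main"
git config user.name "Example Researcher"      # placeholder identity
git config user.email "researcher@example.org"

echo "v1" > script.R
git add script.R
git commit -q -m "First version"
git branch side

echo "half-finished edit" >> script.R   # uncommitted work in progress
git stash -q                            # sweep it under the rug
during_stash=$(cat script.R)            # the file is back to the committed version

git checkout -q side                    # free to switch branches now
git checkout -q main                    # ...and to come back later
git stash pop -q                        # retrieve the unfinished edit
cat script.R
```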

Preview of day 3: working with remote repos

git clone # Create a local copy of the remote repo
git fetch # Sync the remote pointer
git pull  # Sync the local pointer with the remote
git push  # Sync the remote pointer with the local
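You can rehearse these four commands without any online account by letting a local "bare" repository play the role of the remote. The sketch below does that. The folder names (`hub.git`, `laptop`, `office`) and identity are made up, and `--ff-only` is added to `git pull` as a cautious assumption so that it only ever moves the pointer forward.

```shell
set -e
base=$(mktemp -d)                                   # hypothetical scratch folder
git init -q --bare "$base/hub.git"                  # stand-in for a remote such as GitHub
git --git-dir="$base/hub.git" symbolic-ref HEAD refs/heads/main

# First computer: create work and push it to the remote
git init -q "$base/laptop"
cd "$base/laptop"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Researcher"           # placeholder identity
git config user.email "researcher@example.org"
git remote add origin "$base/hub.git"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First version"
git push -q origin main                             # sync the remote with the local

# Second computer: create a local copy of the remote repo
git clone -q "$base/hub.git" "$base/office"

# More work is pushed from the first computer
echo "v2" >> notes.txt
git commit -q -a -m "Second version"
git push -q origin main

# Second computer catches up
cd "$base/office"
git fetch -q                                        # update origin/main only
behind=$(git rev-list --count main..origin/main)    # commits main is missing
echo "main is $behind commit(s) behind origin/main"
git pull -q --ff-only                               # move local main forward too
cat notes.txt
```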

Supplementary material


Homework

  1. Create an account on GitHub: https://github.com/signup
  2. Generate an SSH key: https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
  3. Add the public key to your GitHub account: https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account
  4. Test your SSH connection: https://docs.github.com/en/authentication/connecting-to-github-with-ssh/testing-your-ssh-connection

Troubleshooting

If you know the creator, you know he has a bit of a peculiar sense of humor. But now, seriously...