5. Collaboration, Git and GitHub
Collaboration
Science is based on collaboration, and in our ever more interconnected world this collaboration happens around the globe and across several time zones. Managing these collaborations can be time-consuming, especially if it involves writing a manuscript or software together. The most straightforward approach is to send your collaborator a copy of your manuscript and ask for feedback. A few hours or days later she will look at it and add comments or correct paragraphs. The same holds true for software. After she gets back to you, you have to incorporate those changes into your local copy. But in the time it took your collaborator to respond, you have probably already changed the manuscript yourself. Such a workflow is inefficient, and you will spend a lot of time reconciling different edits, especially as the number of collaborators increases. One way out is to use an online platform like Google Docs for real-time collaboration, but that is impractical for source code.
The open-source community has developed several tools for handling these problems. One in particular, Git, came out of the development of the Linux kernel and lets hundreds of developers work on the same source code simultaneously.
Git model
Git is a decentralized version control system based on the notion of repositories. A repository is a directory containing a .git directory that holds the meta data. Each collaborator has a local copy of the repository, and the state of the repository can be synchronized via remotes.
Files and changes to these files are added as commits to the meta data, and each commit is identified by a unique id (a SHA hash). Conceptually, each commit records a snapshot of the whole project. Files that did not change are reused rather than copied, and Git can store similar versions as differences (deltas) when it packs the repository, so the history stays compact.
Git also solves the problem of keeping track of prior versions and of when changes were introduced. As an example, this set of lecture notes is kept as a git repository on GitHub.
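To make the idea of an id derived from content concrete, here is a minimal sketch of how Git computes the id of a single file's contents (a "blob") in a default SHA-1 repository: it hashes a short header followed by the bytes of the file. Commit ids are computed in the same way over a commit object that lists the snapshot, the parent commits, the author and the message. The function name git_blob_id is made up for this illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id Git assigns to a file with this content."""
    # Git prepends the object type and the content length, then a NUL byte.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two versions of the same file get two different ids ...
v1 = git_blob_id(b"print('hello')\n")
v2 = git_blob_id(b"print('hello, world')\n")
print(v1)
print(v2)

# ... while identical content always maps to the same id,
# which is why unchanged files do not need to be stored twice.
print(git_blob_id(b"print('hello')\n") == v1)
```

Because the id depends only on the content, any change to a file, however small, produces a new id.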
Resources
- Git Book
- Git Documentation
- GitHub Help
- Git the simple guide
- Sourcetree
- GitHub GUI
Basic commands
git init
Initializes a directory as a git repository by creating the .git folder.
git add
Adds a file or changes to a file to the staging area.
git commit
Creates a commit from the changes in the staging area.
git status
Gives a status report for the current repository.
git push
Pushes new commits to the remote server.
git pull
Pulls new commits from the remote server.
git clone
Clones an already existing repository from a remote.
Advanced commands
To enable parallel work, git uses the notion of branches. A branch is a sequence of commits that follow each other. It can be merged with other branches at a later point, and any conflicts are resolved at the point of the merge. Each repository has a default branch, traditionally called master (newer versions of Git and GitHub often call it main).
git branch
Creates a new branch from the current branch.
git checkout
Switches to another branch.
git merge
Merges a branch into the current branch.
git rebase
Rebases the current branch onto another branch.
git fetch
Updates the meta data and fetches the current remote status without actually updating the local branches.
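The branch and merge commands above can be pictured with a small toy model. The sketch below is not how Git is implemented; it only assumes that commits point to their parents and that a branch is a name pointing at its newest commit. The class ToyRepo, its methods and the id scheme are all invented for illustration, and conflicts are ignored entirely.

```python
import hashlib


class ToyRepo:
    """Toy model: commits form a graph, branches are names pointing at commits."""

    def __init__(self):
        self.parents = {}                 # commit id -> tuple of parent ids
        self.branches = {"master": None}  # branch name -> id of its newest commit
        self.current = "master"

    def commit(self, message, extra_parents=()):
        head = self.branches[self.current]
        parents = tuple(p for p in (head, *extra_parents) if p is not None)
        # Made-up id scheme for this sketch: hash of message and parents.
        cid = hashlib.sha1(repr((message, parents)).encode()).hexdigest()[:7]
        self.parents[cid] = parents
        self.branches[self.current] = cid  # the branch moves to the new commit
        return cid

    def branch(self, name):
        # A new branch starts at the commit the current branch points to.
        self.branches[name] = self.branches[self.current]

    def checkout(self, name):
        if name not in self.branches:
            raise KeyError(name)
        self.current = name

    def ancestors(self, cid):
        """Return cid together with every commit reachable through parents."""
        seen, stack = set(), [cid]
        while stack:
            c = stack.pop()
            if c is not None and c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return seen

    def merge(self, other):
        ours, theirs = self.branches[self.current], self.branches[other]
        if theirs in self.ancestors(ours):
            return "already up to date"
        if ours in self.ancestors(theirs):
            # No diverging work: just move the branch pointer forward.
            self.branches[self.current] = theirs
            return "fast-forward"
        # Both branches have new commits: record a commit with two parents.
        self.commit(f"Merge {other} into {self.current}", extra_parents=(theirs,))
        return "merge commit"


repo = ToyRepo()
repo.commit("initial commit")
repo.branch("feature")
repo.checkout("feature")
repo.commit("add feature")
repo.checkout("master")
first_merge = repo.merge("feature")   # master has no new work of its own

repo.checkout("feature")
repo.commit("extend feature")
repo.checkout("master")
repo.commit("fix on master")
second_merge = repo.merge("feature")  # both branches moved on

print(first_merge, "/", second_merge)
```

The first merge only moves the master pointer, while the second has to create a commit with two parents because both branches gained commits after they diverged.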
GitHub
In order to collaborate effectively you will need an online server that provides a remote to store the repository. There are several such services; two widely used ones are Bitbucket and GitHub. If you create software as part of your research for OIST you are encouraged to publish it under an open-source license on GitHub as part of the GitHub organization github.com/OIST.
Other features of GitHub are issue management and a simplified workflow with forks and pull requests. Forking a repository means creating a copy of that repository in your own account. You can then clone your copy and push to it. After making changes you can open a pull request asking the original owner to adopt them. You can also use pull requests to discuss merges of branches.
SSH keys
To authenticate with GitHub it is useful to have a public-private key pair. The public key can be handed out to everybody; the private key must be kept secure.
If you sign a message with your private key (loosely speaking, encrypt it), anybody holding your public key can check that the message came from you. This is a way of authentication. Conversely, somebody can use your public key to encrypt a message that only you can decrypt. This allows for secure communication after exchanging public keys.
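The two directions described above can be tried out with a toy version of RSA, one well-known public-key scheme. This is only a sketch with tiny textbook primes: the variable names and the helper apply_key are made up, and real SSH keys use far larger numbers plus padding and hashing from vetted libraries. Never use code like this for actual security.

```python
# Toy RSA key pair built from two tiny primes (illustration only).
p, q = 61, 53
n = p * q                   # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                      # public exponent: (e, n) is the public key
d = pow(e, -1, phi)         # private exponent: (d, n) is the private key (Python 3.8+)


def apply_key(message, exponent, modulus=n):
    """Apply one half of the key pair to a number smaller than the modulus."""
    return pow(message, exponent, modulus)


message = 65

# Secure communication: anyone encrypts with the public key,
# only the holder of the private key can decrypt.
ciphertext = apply_key(message, e)
recovered = apply_key(ciphertext, d)

# Authentication: the holder signs with the private key,
# anyone can verify with the public key.
signature = apply_key(message, d)
verified = apply_key(signature, e) == message

print(recovered == message, verified)
```

Both checks succeed because applying the public and the private exponent one after the other, in either order, returns the original number.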
Git configuration
git config --global user.name "Your Name"
git config --global user.email "Your Email address"
git config --global github.user "your github user"
git config --global push.default simple
Play around
Open up Try Git and afterwards Learn Git Branching to play around and get a feel for things.
Git for your lab notebook
Proper documentation of your work is a major challenge. One solution to this is an online notebook. Here and here are great examples made using git.
Homework
The homework is due on October 22, 2015 at 13:00 (around noon).
Create a fork of the repository MikheyevLab/Exercise5 and create a series of commits to solve the exercises below. For each part of the exercise there should be at least one commit, preferably more than one.
Each commit should contain the code that solves the exercise, and its commit message should explain your thought process and the resources you used. If you want, you can collaborate on the homework (at most 2 people), but each of you should make individual commits so that everybody's contributions are clear.
To hand in, send Valentin a message containing the URL of the git repository and the names of the team members.
- Create a GitHub account and configure git.
- Fork the repository MikheyevLab/Exercise5.
- Implement a data-structure that realizes an undirected graph.
- Read up on directed and undirected graphs.
- Read up on adjacency lists and matrices and decide on which to use.
- Implement a function that creates a fully connected graph.
- Implement a function that creates a randomly connected graph.
- Implement the small-world initialization algorithm.
- Implement an algorithm that determines if your graph is cycle free.
- Extend your graph to store values along the edges (weights) and randomly assign them.
- Implement an algorithm that tries to greedily find the cheapest route from A->B (interpreting weights as costs).
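As a starting point for the reading on adjacency lists and matrices in the list above, the following sketch stores the same small undirected graph in both forms. The four-node graph and all variable names are invented for illustration; this is not a solution to the exercises, and which representation suits your implementation is left for you to decide.

```python
# A made-up undirected graph with nodes 0..3.
num_nodes = 4
edges = [(0, 1), (0, 2), (2, 3)]

# Adjacency list: for each node, the set of its neighbours.
# Memory grows with the number of edges.
adj_list = {node: set() for node in range(num_nodes)}
for a, b in edges:
    adj_list[a].add(b)
    adj_list[b].add(a)   # undirected: store the edge in both directions

# Adjacency matrix: entry [a][b] is 1 if a and b are connected.
# Memory grows with the square of the number of nodes.
adj_matrix = [[0] * num_nodes for _ in range(num_nodes)]
for a, b in edges:
    adj_matrix[a][b] = 1
    adj_matrix[b][a] = 1  # undirected: the matrix is symmetric

print(adj_list)
for row in adj_matrix:
    print(row)
```

Listing the neighbours of a node is direct with the adjacency list, while checking whether two particular nodes are connected is a single lookup in the matrix.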