Mind Passthrough

Nicolas Ochem's mumblings on virtualization and network policy

Branch-aware Git Submodules

Large software projects often build their release from many repositories. It is tempting to set up a super-repository that references all your shipping code through git submodules. Since each commit of your superproject contains an unambiguous reference to a particular commit of every submodule, you can tag your superproject to define a release or nightly build.

All it takes is a single command to retrieve all the source code for a particular release.

# checking out all source code for 3.11 release
git clone --recursive superproject -b 3.11

This is very useful for archival purposes or code escrow. In this post, we explore how to set it up.

Setting up submodules

There used to be no straightforward way to update your submodules. You had to extract the SHA-1 of the desired commit of every submodule and point your superproject at each one of them; git submodule update would then check out the corresponding submodule code. Add branching logic to the mix, and you ended up with fairly complex scripts.

Since git 1.8.2, it has become much easier. When defining a submodule, you can now specify which branch it is supposed to track.

Say you are building your superproject from components A and B. Your development builds come from branch dev of component A and branch master of component B. The following commands will set up the superproject:

git init
git submodule add --branch dev <git repo of component A> component_A
git submodule add --branch master <git repo of component B> component_B
git commit -m "Add components A and B as submodules"
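To try these commands without a server, the sketch below builds throwaway local repositories that stand in for components A and B. The directory names (component_A_repo, component_B_repo) and the make_component helper are hypothetical. The protocol.file.allow=always setting is only there because recent git versions refuse, by default, to clone submodules from local paths.

```shell
#!/bin/sh
# Sandbox sketch: local stand-ins for the component repositories (hypothetical names).
set -e
# Commit identity for the throwaway repositories.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

ROOT=$(mktemp -d)
cd "$ROOT"

# make_component DIR BRANCH (hypothetical helper): one-commit repository on BRANCH.
make_component() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD "refs/heads/$2"
    echo "$1" > "$1/README"
    git -C "$1" add README
    git -C "$1" commit -q -m "Initial commit of $1"
}
make_component component_A_repo dev
make_component component_B_repo master

git init -q superproject
cd superproject
git symbolic-ref HEAD refs/heads/master
# Local-path submodules need file transport explicitly allowed on recent git.
git -c protocol.file.allow=always submodule add --branch dev "$ROOT/component_A_repo" component_A
git -c protocol.file.allow=always submodule add --branch master "$ROOT/component_B_repo" component_B
git commit -q -m "Track component A (dev) and component B (master)"
cat .gitmodules
```

The resulting .gitmodules records a branch key for each submodule, which is what git submodule update --remote reads later on.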

Git stores the relationship with the submodules in the .gitmodules file.

# content of .gitmodules file on master branch
[submodule "component_A"]
    path = component_A
    url = <git repo of component A>
    branch = dev
[submodule "component_B"]
    path = component_B
    url = <git repo of component B>
    branch = master
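Because .gitmodules uses the same syntax as git's other configuration files, a build script can query it with git config -f instead of parsing it by hand. A minimal sketch, assuming a scratch copy of the file above (the temporary file is only a stand-in for a real checkout):

```shell
#!/bin/sh
set -e
# Write the example .gitmodules to a scratch file (stand-in for a real superproject).
GITMODULES=$(mktemp)
cat > "$GITMODULES" <<'EOF'
[submodule "component_A"]
    path = component_A
    url = <git repo of component A>
    branch = dev
[submodule "component_B"]
    path = component_B
    url = <git repo of component B>
    branch = master
EOF

# Print "<config key> <branch>" for every submodule that declares a tracked branch.
git config -f "$GITMODULES" --get-regexp '^submodule\..*\.branch$'
```

In a real superproject you would simply pass -f .gitmodules from the top-level directory.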

Then all it takes to fetch the latest commits from the components is:

git submodule update --remote

You can issue git submodule status to verify that the commit SHA1 hashes match the latest commits of your component repositories:

nochem@bonk:/tmp/submodules/superproject$ git submodule status
 9665f1cd09faa63c6e3211712a805c49bf99c7c5 component_A (heads/dev)
 323db44f229e794850fec8afb5e8964d813d9a30 component_B (heads/master)

Your automated build system can then tag a nightly build of all your components every night:

git checkout master
git submodule update --remote
# do the build.
git commit -a -m "Nightly build 345"
git tag master-nightly-345
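Below is a self-contained sketch of such a nightly job, run against throwaway local repositories. The nightly function, its build-number argument and the repository layout are assumptions, and the build step is left as a comment. Unlike the bare commands above, it only commits when a submodule actually moved, since git commit fails when there is nothing to commit.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Local-path submodules need file transport explicitly allowed on recent git.
sm() { git -c protocol.file.allow=always submodule "$@"; }

ROOT=$(mktemp -d)
# Hypothetical component A repository, developed on branch dev.
git init -q "$ROOT/component_A_repo"
git -C "$ROOT/component_A_repo" symbolic-ref HEAD refs/heads/dev
git -C "$ROOT/component_A_repo" commit -q --allow-empty -m "first"

# Superproject whose master branch tracks component A's dev branch.
git init -q "$ROOT/superproject"
cd "$ROOT/superproject"
git symbolic-ref HEAD refs/heads/master
sm add --branch dev "$ROOT/component_A_repo" component_A
git commit -q -m "Add component A"

# nightly BUILD_NUMBER (hypothetical helper): refresh submodules, build, commit, tag.
nightly() {
    git checkout -q master
    sm update --remote
    # ... run the build here ...
    # Commit only if at least one submodule pointer changed.
    if ! git diff --quiet; then
        git commit -q -a -m "Nightly build $1"
    fi
    git tag "master-nightly-$1"
}

# New work lands on component A's dev branch, then the nightly job runs.
git -C "$ROOT/component_A_repo" commit -q --allow-empty -m "second"
nightly 345
```

After the run, the tag master-nightly-345 pins the commit that was at the tip of component A's dev branch when the job ran.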

Of course, this does not exempt you from tagging your component repositories individually.
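One way to apply the same tag to every component is git submodule foreach, which runs a shell command inside each checked-out submodule. It tags whatever commit each submodule has checked out, so it belongs right after the submodules were updated and the superproject committed. In this sketch the tag name and sandbox layout are assumptions, and the tags stay local; in a real setup you would also push them to each component's remote (for instance with another git submodule foreach running git push origin and the tag name).

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

ROOT=$(mktemp -d)
# Hypothetical component B repository with a single commit.
git init -q "$ROOT/component_B_repo"
git -C "$ROOT/component_B_repo" commit -q --allow-empty -m "first"

git init -q "$ROOT/superproject"
cd "$ROOT/superproject"
git -c protocol.file.allow=always submodule add "$ROOT/component_B_repo" component_B
git commit -q -m "Add component B"

TAG=master-nightly-345   # hypothetical nightly tag name
# Tag the commit checked out in each submodule, i.e. the one the superproject pins.
git submodule foreach "git tag $TAG"
# Tag the superproject itself with the same name.
git tag "$TAG"
```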

Adding a branch

If your release branch is named “3.11” on both components, you may create a branch 3.11 on your superproject, remove the submodules (with the git submodule deinit command, available since git 1.8.3, followed by git rm), then add them again with the correct --branch option.

Or you can simply edit your .gitmodules file and commit it to the repository.

# content of manually edited .gitmodules file on 3.11 release branch
[submodule "component_A"]
    path = component_A
    url = <git repo of component A>
    branch = 3.11
[submodule "component_B"]
    path = component_B
    url = <git repo of component B>
    branch = 3.11
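The same edit can be scripted instead of done by hand. git config -f .gitmodules rewrites the branch key in place; git 2.22 and later also provide git submodule set-branch --branch 3.11 component_A, which updates the same key. The sketch below runs against a scratch copy of the master-branch file, so the temporary file is an assumption.

```shell
#!/bin/sh
set -e
# Scratch copy of the master-branch .gitmodules (stand-in for a real superproject).
GITMODULES=$(mktemp)
cat > "$GITMODULES" <<'EOF'
[submodule "component_A"]
    path = component_A
    url = <git repo of component A>
    branch = dev
[submodule "component_B"]
    path = component_B
    url = <git repo of component B>
    branch = master
EOF

# Point every submodule at the 3.11 release branch. In a real superproject, use
# -f .gitmodules from the top-level directory, then commit the file.
for name in component_A component_B; do
    git config -f "$GITMODULES" "submodule.$name.branch" 3.11
done
cat "$GITMODULES"
```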

Then, when on branch 3.11 of your superproject, git submodule update --remote will fetch the latest release branch content from all submodules.

nochem@bonk:/tmp/submodules/superproject$ git submodule update --remote
Submodule path 'component_A': checked out '56d62717ac6f5fb4e67daa331c5ef566588cec4e'
Submodule path 'component_B': checked out '312df465c70cb81ffaf6dd6f2d505df44c8db45f'
nochem@bonk:/tmp/submodules/superproject$ git submodule status
 56d62717ac6f5fb4e67daa331c5ef566588cec4e component_A (heads/3.11)
 312df465c70cb81ffaf6dd6f2d505df44c8db45f component_B (heads/3.11)

You can now commit and tag again:

git commit -a -m "release 3.11 nightly build 5"
git tag "3.11-nightly-5"
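Putting the release-branch steps together, this sketch creates a 3.11 branch in a throwaway component, branches the superproject, retargets the submodule with git config -f .gitmodules, refreshes it, then commits and tags. The repository names are hypothetical stand-ins, and only one component is used to keep it short.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Local-path submodules need file transport explicitly allowed on recent git.
sm() { git -c protocol.file.allow=always submodule "$@"; }

ROOT=$(mktemp -d)
# Hypothetical component A: branch 3.11 is cut, then master moves on.
git init -q "$ROOT/component_A_repo"
git -C "$ROOT/component_A_repo" symbolic-ref HEAD refs/heads/master
git -C "$ROOT/component_A_repo" commit -q --allow-empty -m "dev work"
git -C "$ROOT/component_A_repo" branch 3.11
git -C "$ROOT/component_A_repo" commit -q --allow-empty -m "more dev work on master"

# Superproject master tracks component A's master branch.
git init -q "$ROOT/superproject"
cd "$ROOT/superproject"
git symbolic-ref HEAD refs/heads/master
sm add --branch master "$ROOT/component_A_repo" component_A
git commit -q -m "Track component A master"

# Release branch of the superproject tracks the 3.11 branch of the component.
git checkout -q -b 3.11
git config -f .gitmodules submodule.component_A.branch 3.11
sm update --remote
git commit -q -a -m "release 3.11 nightly build 5"
git tag "3.11-nightly-5"
```

Each superproject branch now pins a different component commit: master pins the component's master tip, while the release tag pins the tip of its 3.11 branch.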

You have now set up a multi-branch superproject in which each branch tracks the corresponding branches of all your components. Isn’t it nice?

That said, a word of warning is necessary. Git submodules have received a lot of improvements recently, but they are no panacea. If your component repositories depend heavily on one another, and developers are likely to commit to several repositories at once, you may be better off with one big repository. This model works well when your components are worked on by different teams and you are looking for an easy way to check out or tag all the code.

All the code used in this post is also available as a gist.