Sunday, October 6, 2013

A Quick Look at Git Submodules

I'm a big Git freak, as anyone who knows me knows.  Still, parts of Git scare me.  I only just got comfortable with rebasing (really worth learning, by the way) and submodules had been lurking on my "must learn someday" stack for a while.

Tonight, I finally got around to playing with them, because I hit a point where not using them would get ugly.  I'm working on a demo for a talk, and it involves a project that should (A) be part of a website and (B) be reused by many websites.  So I want this project to be a subfolder of my website, but I also don't want to duplicate repositories.  That looks like a logical use of Git submodules, which I knew allowed a remote repository to appear as part of another repository.  Here's some Hello World testing (caveat lector!) which should give a feel for how these work.

So, first I created a directory c:\git-testing, and created two repos inside it, "repo1" and "repo2". I added a couple of files to each and made an initial commit for both. Now I decided to make repo2 a submodule of repo1.

>cd repo1

>git submodule add ../repo2

This creates a clone of repo2 inside of repo1. In Windows Explorer, you will see a directory "repo2" as a subdirectory of "repo1", with all its contents. But git does a little magic to hide these contents from the top level repo. If you do a "git status" command, you will see this:

PS C:\git-testing\repo1> git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#       new file:   .gitmodules
#       new file:   repo2

Even though "repo2" is a directory, it shows up as a file! The contents of .gitmodules are fairly straightforward:
[submodule "repo2"]
        path = repo2
        url = ../repo2
It just lists the existence of the submodule, where it is found, and what it points back to. Notice there is no mention of branch or commit. This file just forms the linkage; it does not manage the state of the submodule.
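If you'd like to poke at this yourself, here is a rough sketch that rebuilds the same setup in a scratch directory and looks at both halves of the link. The temp directory, the "demo" identity and the readme.txt files are my own stand-ins, not part of the original experiment. The protocol.file.allow setting is there because recent git versions refuse to clone a submodule from a local path without it.

```shell
set -e
work=$(mktemp -d)   # scratch directory standing in for c:\git-testing
cd "$work"

# Throwaway identity so the commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two small repos, each with one file and an initial commit.
for r in repo1 repo2; do
  git init -q "$r"
  echo "hello from $r" > "$r/readme.txt"
  git -C "$r" add readme.txt
  git -C "$r" commit -q -m "initial commit in $r"
done

cd repo1
# Allow the local file transport for this one submodule operation.
git -c protocol.file.allow=always submodule --quiet add ../repo2

# The linkage: .gitmodules stores a path and a url, no branch or commit.
url=$(git config -f .gitmodules submodule.repo2.url)
echo "url  = $url"

# The state: repo2 is staged as a single entry with mode 160000,
# which is how git marks "a commit in another repository".
mode=$(git ls-files --stage repo2 | cut -d' ' -f1)
echo "mode = $mode"
```

That 160000 mode is the same number that shows up in the index line of the git diff output further down.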

You can add these two "files" and commit them, so that both repo1 and repo1/repo2 are up to date with no uncommitted work.

Now let's add some changes to the submodule. When you modify the submodule, the changes show up in the parent repository like this:

# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#   (commit or discard the untracked or modified content in submodules)
#       modified:   repo2 (modified content)
no changes added to commit (use "git add" and/or "git commit -a")

After you navigate into repo2 and add and commit changes, repo1 will look like this:
PS C:\git-testing\repo1> git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#       modified:   repo2 (new commits)
no changes added to commit (use "git add" and/or "git commit -a")

And how is the "file" repo2 modified? Let's use git diff to see what's changed:
PS C:\git-testing\repo1> git diff
diff --git a/repo2 b/repo2
index 83df4d9..7e1d665 160000
--- a/repo2
+++ b/repo2
@@ -1 +1 @@
-Subproject commit 83df4d9765ebf9a292cc000833fd730c284e3ff7
+Subproject commit 7e1d665ccbcb7fc1ed28d25427c3829278e2bc56

Interesting! As far as "repo1" is concerned, repo2 is simply a file whose contents have changed from listing one commit to listing another.

And there you have it. repo1 knows it has a submodule, and knows what commit it is at. Any change of commit in the submodule shows up as an uncommitted change in the containing repo. It's nice that the pointer doesn't refer to a branch, just a commit, so you can play with branches in the submodule with the freedom you normally have in git. All the parent repo cares about is what commit the child is at, not whether this commit is called "master", or whether it is up to date with the source.
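Here is a small sketch of that "commit, not branch" behavior, again in a scratch directory with made-up file contents and a made-up branch name ("experiment"). It commits inside the submodule on a new branch, then compares the commit the parent has recorded with the commit the submodule actually has checked out.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for r in repo1 repo2; do
  git init -q "$r"
  echo "hello from $r" > "$r/readme.txt"
  git -C "$r" add readme.txt
  git -C "$r" commit -q -m "initial commit in $r"
done

cd repo1
# Recent git versions need the file protocol allowed for local submodules.
git -c protocol.file.allow=always submodule --quiet add ../repo2
git commit -q -m "add repo2 as a submodule"
recorded_before=$(git rev-parse HEAD:repo2)

# Work inside the submodule on a branch the parent has never heard of.
git -C repo2 checkout -q -b experiment
echo "a change" >> repo2/readme.txt
git -C repo2 commit -q -am "change inside the submodule"
checked_out=$(git -C repo2 rev-parse HEAD)

# A leading "+" means the checked-out commit differs from the recorded one.
status_line=$(git submodule status)
echo "$status_line"

# Moving the pointer is an ordinary add and commit in the parent.
git add repo2
git commit -q -m "point repo2 at the new commit"
recorded_after=$(git rev-parse HEAD:repo2)
echo "recorded: $recorded_after"
echo "checked out: $checked_out"
```

The parent never stores the branch name anywhere; only the commit id moves.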

And you can push your changes in the submodule up to the remote. I started with /repo1 and /repo2, and made /repo1/repo2. For /repo1/repo2, /repo2 is the "origin" repo, which I can push my changes back to. One wrinkle, however: if the original source repo is not bare, you can't push to the branch it currently has checked out, as you will get this error:

>git push origin
remote: error: refusing to update checked out branch: refs/heads/master
remote: error: By default, updating the current branch in a non-bare repository
remote: error: is denied, because it will make the index and work tree inconsistent
remote: error: with what you pushed, and will require 'git reset --hard' to match
remote: error: the work tree to HEAD.

What's up with that? Basically, git is saying that if you were to push to the currently checked-out branch of the origin repository, you would update the commit without updating the file system, which would mean you would have to do a "reset" to get them in sync, which would be a bit weird. If you push to a new branch, git is happy:

>git push origin master:some-new-branch

Thanks to a Stack Overflow response for that.
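To see both outcomes side by side, here is a sketch in a scratch directory (the file contents are invented for the test). It looks up the submodule's current branch name instead of hardcoding "master", since newer git installs may use a different default. It also assumes the origin repo has the default receive.denyCurrentBranch behavior.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for r in repo1 repo2; do
  git init -q "$r"
  echo "hello from $r" > "$r/readme.txt"
  git -C "$r" add readme.txt
  git -C "$r" commit -q -m "initial commit in $r"
done

cd repo1
# Recent git versions need the file protocol allowed for local submodules.
git -c protocol.file.allow=always submodule --quiet add ../repo2

# Make a commit inside the submodule clone.
cd repo2
echo "local change" >> readme.txt
git commit -q -am "change made in the submodule"
branch=$(git rev-parse --abbrev-ref HEAD)

# Pushing to the branch that the non-bare origin has checked out is refused.
if git push -q origin "$branch" 2>/dev/null; then
  pushed_to_checked_out=yes
else
  pushed_to_checked_out=no
fi
echo "push to checked-out branch accepted: $pushed_to_checked_out"

# Pushing the same commit to a brand new branch goes through.
git push -q origin "$branch:some-new-branch"
git -C "$work/repo2" branch --list some-new-branch
```

From there you can merge some-new-branch inside the original repo2 whenever you like. The other common way around the error is to make the source repository bare in the first place.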

One neat thing you can do with submodules is rebase your own changes on top of what's happening remotely. Suppose you wanted to apply your own changes on top of jQuery. Create a submodule, make and commit your changes. Then when you want to pull in work done by the jQuery team, you can do a pull with rebase to keep your own stuff on top of theirs. Standard issue git magic, but now you are doing this in a sub-directory of another project. Sweet!
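A toy version of that workflow, with repo2 playing the part of the upstream library. The file names and commit messages are invented for illustration. The two commits touch different files so the rebase has no conflicts to resolve.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for r in repo1 repo2; do
  git init -q "$r"
  echo "hello from $r" > "$r/readme.txt"
  git -C "$r" add readme.txt
  git -C "$r" commit -q -m "initial commit in $r"
done

cd repo1
# Recent git versions need the file protocol allowed for local submodules.
git -c protocol.file.allow=always submodule --quiet add ../repo2

# Upstream moves on after the submodule was cloned.
echo "upstream work" > "$work/repo2/upstream.txt"
git -C "$work/repo2" add upstream.txt
git -C "$work/repo2" commit -q -m "upstream change"

# Meanwhile, a local tweak is committed inside the submodule.
cd "$work/repo1/repo2"
echo "my tweak" > local.txt
git add local.txt
git commit -q -m "my local tweak"

# Fetch upstream and replay the local commit on top of it.
branch=$(git rev-parse --abbrev-ref HEAD)
git pull -q --rebase origin "$branch"

# Newest first: the local tweak now sits on top of the upstream change.
git log --format=%s -n 2
```

After the rebase, the parent repo will report repo2 as having new commits, so the pointer still needs its own add and commit there.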

Wanna know more? This got me started: Also see the Git-Extensions manual on submodules. I haven't started working with the Git-Extensions view on this, but I suspect it will be awesome, because they represent submodules with a little Beatles-esque yellow submarine.

We all live in a Yellow Submodule...

Just perfect!

Update: How this looks in Git Extensions

So, it's the next day, and I've started using this in Git Extensions.   I've got my Instagram connector project, which I'm working on, and I want it to be its own repo (because it's destined someday for GitHub), but I also want it to be a subproject in a demo site.  So what does this look like?

Git Extensions has a Submodule menu which allows you to link to a project (using a file path or URL) and select a branch.

So setting up the submodule is pretty straightforward. But how do you make commits in two separate repositories?  It's pretty simple, actually.  If there are uncommitted changes in the submodule, these are indicated with a submarine with a red exclamation in the commit window.  Clicking on this opens a commit window to the submodule.  

Once you are done committing to the submodule, you have to commit the directory pointer in the parent project to refer to the updated commit.  This commit window shows the special icon for the submodule, and the Diff window shows the change in referenced commits, with the commit message for each and the file change list.  The workflow is reasonably intuitive once you play around with it.

This looks like it will work out nicely for my current needs, but I did receive a caution from Sitecore MVP Kam Figy that this was not the best way to manage project dependencies, as it is somewhat breakable if you don't know what you are doing.  He recommended NuGet for dependencies with published projects.  But for the scenario in which you are managing a relationship between two of your own projects, submodules have a place.

1 comment:

  1. The largest issue I've run into is that it seems like sometimes this will happen:

    1. You update the revision of a submodule and push it
    2. Someone else pulls, and until they run 'git submodule update' their clone will gleefully offer to let them commit the 'modified' submodule. What modified submodule? The previous revision before your update. If they commit it, it'll rewind the submodule revision(!)

    Some folks I work with usually just commit all marked changes, so I've had to fix submodule revision updates a lot, to the point where I'd send emails specifically requesting that folks run submodule update when I updated a revision on a submodule.

    Submodules are awesome and powerful, but oh so easy to screw up. One place where SVN (with externals) beats Git badly in terms of maintainability. Probably the ONLY place SVN has a chance vs Git ;)
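The scenario in the comment above can be reproduced in a scratch directory. This is only a sketch: "other" is a made-up name for the second person's clone, and the file contents are invented. It shows the stale submodule appearing as a modification after a pull, and "git submodule update" clearing it.

```shell
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for r in repo1 repo2; do
  git init -q "$r"
  echo "hello from $r" > "$r/readme.txt"
  git -C "$r" add readme.txt
  git -C "$r" commit -q -m "initial commit in $r"
done

# repo1 gets repo2 as a submodule (local file protocol allowed explicitly).
(cd repo1 &&
 git -c protocol.file.allow=always submodule --quiet add ../repo2 &&
 git commit -q -m "add repo2 as a submodule")

# Someone else clones repo1, submodule included.
git -c protocol.file.allow=always clone -q --recurse-submodules repo1 other

# You update the submodule revision in repo1 and commit the new pointer.
echo "more" >> repo2/readme.txt
git -C repo2 commit -q -am "second commit in repo2"
git -C repo1/repo2 pull -q --ff-only
git -C repo1 add repo2
git -C repo1 commit -q -m "bump repo2 to its second commit"
new=$(git -C repo1 rev-parse HEAD:repo2)

# The other person pulls. Their submodule checkout is now behind the
# pointer, and git reports that as a change they could commit.
cd "$work/other"
git -c protocol.file.allow=always pull -q --ff-only
before=$(git status --porcelain)
echo "after pull: [$before]"

# Running submodule update checks out the recorded commit and clears it.
git -c protocol.file.allow=always submodule --quiet update
after=$(git status --porcelain)
echo "after submodule update: [$after]"
```

Committing while the status still shows repo2 as modified is what rewinds the pointer, so the update step has to come before any "commit everything" habit kicks in.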