Version Control conundrum

June 4th, 2011, 12:27 am PDT by Greg

As most of you know, the School’s new course management system is my baby. It keeps track of many things, but what I care about right now is (1) who is in a course, and (2) what groups have been formed for assignments/projects/whatever.

Given those things, I have had this idea: It would make perfectly good sense for each of those things (every student in a course; every group in a course) to have a version control repository automatically created for them. The instructor and TAs would also have access, but wouldn’t have to set anything up. Students could use the repositories even in courses where the instructor doesn’t know what technology is.

I have used Subversion repositories for the project groups in CMPT 470 for years. The benefits from my point of view:

  1. Groups can collaborate in the way that version control systems allow.
  2. Students can work on code (even individually) in multiple locations, with versions kept.
  3. All of their code is safely backed up on a server that we kind of trust.
  4. I can review which members of the group contributed which code.
  5. It’s a nice and easy way to submit code: just give me the SVN URL.

When contemplating technologies to implement my scheme, I went first to GIT (or possibly some other distributed version control system, since they’re all the rage). GIT also has a pile of nice management tools like gitolite that make creating thousands of repositories surprisingly easy.
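As a rough illustration of how the "one repository per student and per group" idea could be driven from course data, here is a minimal sketch that generates gitolite-style access rules. The function name, the `course/students/...` and `course/groups/...` repository layout, and the choice of read-only access for staff are all assumptions made for the example, not something the course management system actually does.

```python
def gitolite_conf(course, students, groups, staff):
    """Build gitolite.conf text: one repo per student and one per group.

    course   -- course slug used as a path prefix (hypothetical layout)
    students -- iterable of student user ids
    groups   -- dict mapping group name -> iterable of member user ids
    staff    -- iterable of instructor/TA user ids (read-only here, by assumption)
    """
    staff_users = " ".join(sorted(staff))
    blocks = []

    # Individual repositories: the student can write, staff can read.
    for userid in sorted(students):
        blocks.append(
            f"repo {course}/students/{userid}\n"
            f"    RW+ = {userid}\n"
            f"    R   = {staff_users}\n"
        )

    # Group repositories: every member can write, staff can read.
    for name in sorted(groups):
        members = " ".join(sorted(groups[name]))
        blocks.append(
            f"repo {course}/groups/{name}\n"
            f"    RW+ = {members}\n"
            f"    R   = {staff_users}\n"
        )

    return "\n".join(blocks)


if __name__ == "__main__":
    print(gitolite_conf(
        "cmpt470",
        students=["alice", "bob"],
        groups={"team1": ["alice", "bob"]},
        staff=["greg"],
    ))
```

Regenerating this text whenever course membership changes and pushing it to the gitolite admin repository would be one way to keep the repositories in sync without the instructor setting anything up.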

But while experimenting, I realized that GIT inherently trusted the user-provided information about who they are. If I claim to be “Barack Obama <president@whitehouse.gov>” in my commits, then GIT lets me push those commits just fine, no matter who I have authenticated as at the central server. So, I pretty much lose benefit (4) in the worst cases (which are the cases I’m usually concerned with), which is pretty much a deal-breaker by itself.

The “distributed” nature of any DVCS gets me this problem one way or another—anybody could push the whole group’s work since they could be working for weeks without touching the central server. And having made that realization, I have to admit that (3) also disappears: they don’t have to push to the server very often, so a crash on their end could lose a lot of work.

Finally, knowing students the way I do, (5) is gone too. I’d give a lot to not have this conversation five times a semester: “I got a zero.” “You didn’t submit any code.” “Yes, I committed it.” “You committed it, but did you push it to the server?” “Yes, I pushed it.” “You typed the command ‘git push’?” “No, I use ‘git commit’. That puts the code on the server.” “No it doesn’t. You didn’t put any code on the server where I can get it.” “Yes I did… I committed it.”

Also, it’s my understanding that it’s not possible to give a URL to a subtree of a GIT repository: the only URL is to the project itself. That makes submitting with GIT much harder.

So, I’m left with this: distributed version control is at least as good for developers, but it’s very bad for instructors.

According to Wikipedia’s comparison of revision control software, the only open source, “actively-developed”, “client-server” VCS is Subversion. So it looks like I’m back to the totally-uncool and old-fashioned SVN?

Does anybody want to refute any of that?

4 Responses to “Version Control conundrum”

  1. Curtis Lassam Says:

    A few points I’d like to make.

    Just because Git is trendycool, doesn’t mean that it’s totally appropriate for all projects.
    It tends to fail hard whenever you have huge binary files, or need in-depth centralized control.
    It doesn’t hurt for students to develop some proficiency with SVN, which is (aside from shitty ol’ Perforce) the only version control I’ve ever seen used by a company with more than 10 people.
    SVN’s branching is a little bit (okay, a lot) clumsy, but aside from that it is a pretty complete version control solution.
    Git’s Windows support is absolutely abysmal. While I’d like to personally deride Windows developers for having the sheer, unmitigated gall to try to develop on a heathen Windows box, I assume that a large percentage of students are Microsoft-bound.

    I’m not saying that Git isn’t the right tool – just that, for your situation, svn is probably okay, and it certainly doesn’t hurt.

  2. Jakub Narębski Says:

    While I can agree that in your specific case Subversion might be a better choice than distributed version control such as Git, I’d like to clarify a few issues:

    Finally, knowing students the way I do, (5) is gone too. I’d give a lot to not have this conversation five times a semester: “I got a zero.” “You didn’t submit any code.” “Yes, I committed it.” “You committed it, but did you push it to the server?” “Yes, I pushed it.” “You typed the command ‘git push’?” “No, I use ‘git commit’. That puts the code on the server.” “No it doesn’t. You didn’t put any code on the server where I can get it.” “Yes I did… I committed it.”

    On the other hand, this is an advantage: a student can work on the task privately, splitting it into small, self-contained steps (commits), and publish (submit) by pushing to the central server only when ready, perhaps after cleaning up the history with an interactive rebase or by amending commits.

    Also, it’s my understanding that it’s not possible to give a URL to a subtree of a GIT repository: the only URL is to the project itself. That makes submitting with GIT much harder.

    I don’t understand this issue. Note that with Git you should create separate repositories for separate projects, not bundle them all together in one single mass^W project as happens with Subversion.

    But while experimenting, I realized that GIT inherently trusted the user-provided information about who they are. If I claim to be “Barack Obama <president@whitehouse.gov>” in my commits, then GIT lets me push those commits just fine, no matter who I have authenticated as at the central server. So, I pretty much lose benefit (4) in the worst cases (which are the cases I’m usually concerned with), which is pretty much a deal-breaker by itself.

    First, to be able to push you have to authenticate to the central server.

    Second, there are additional tools, like Gitolite, which can be used to manage git repositories (see e.g. the “Gitolite” chapter in the “Pro Git” book) and which can be configured to (1) log access and (2) refuse pushes containing commits whose author/committer doesn’t match the credentials used to push to the submission server.

    Third, you can require published work to be tagged using GPG-signed tags. Again, Gitolite or other hooks can be used to ensure that all tags are signed correctly, and that there are signed tags.
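The check described in the second point above (refusing commits whose author or committer does not match the authenticated user) can be sketched as follows. This is only an illustration of the comparison a server-side hook would have to make, not Gitolite's own mechanism; the function name, the tuple layout, and the idea of a per-user list of allowed email addresses are assumptions made for the example, and how the hook obtains the incoming commits is not shown.

```python
from email.utils import parseaddr


def mismatched_commits(commits, allowed_emails):
    """Return the ids of commits whose identities don't belong to the pusher.

    commits        -- iterable of (commit_id, author_ident, committer_ident),
                      where each ident is a "Name <email>" string
    allowed_emails -- email addresses registered for the authenticated user
                      (hypothetical: the server would need such a mapping)
    """
    allowed = {email.lower() for email in allowed_emails}
    rejected = []
    for commit_id, author, committer in commits:
        for ident in (author, committer):
            # parseaddr splits "Name <email>" into (name, email).
            _, address = parseaddr(ident)
            if address.lower() not in allowed:
                rejected.append(commit_id)
                break  # one bad identity is enough to reject the commit
    return rejected


if __name__ == "__main__":
    incoming = [
        ("a1", "Alice <alice@example.edu>", "Alice <alice@example.edu>"),
        ("b2", "Barack Obama <president@whitehouse.gov>", "Alice <alice@example.edu>"),
    ]
    print(mismatched_commits(incoming, ["alice@example.edu"]))
```

A hook built on this would reject the whole push when the returned list is non-empty. Note that in a group repository this also blocks a student from pushing teammates' commits, which may or may not be what the instructor wants.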

  3. Ariane Says:

    I would think that setting up SSH keys would help somewhat with making sure the commits are coming from the right people?

    Git would definitely make it cooler and a better learning experience on team projects in particular, and it does allow you to track everything, though you’d have to have someone pretty adept to manage the pushing to server and branch merging, etc. if you were going to go that far.

    Though if it’s mainly going to be for individual projects, and it seems secure, I don’t think it’s exactly a *bad* thing to teach them to use SVN. Then they’ll at least get used to version control and see why it’s a best practice, and it’ll make it easier to pick up git down the road.

  4. Shawn Says:

    I would stick with SVN for CMPT 470 – like you said, it has a lot of nice advantages that you summarized. I find that there are a LOT of people who get into 470 with a very low working knowledge of version control, and I think that you have to understand the pain of centralized version control so you can fully appreciate distributed version control.

    Another option you could offer the more interested students, and/or the students who are already proficient with svn, is a combination of git and svn: Git for quick local branching, SVN for the central repository.