Version Control conundrum

June 4th, 2011, 12:27 am PDT by Greg

As most of you know, the School’s new course management system is my baby. It keeps track of many things, but what I care about right now is (1) who is in a course, and (2) what groups have been formed for assignments/projects/whatever.

Given those things, I have had this idea: It would make perfectly good sense for each of those things (every student in a course; every group in a course) to have a version control repository automatically created for them. The instructor and TAs would also have access, but wouldn’t have to set anything up. Students could use the repositories even in courses where the instructor doesn’t know what technology is.

I have used Subversion repositories for the project groups in CMPT 470 for years. The benefits from my point of view:

  1. Groups can collaborate in that way that version control systems allow.
  2. Students can work on code (even individually) in multiple locations and with versions kept.
  3. All of their code is safely backed-up on a server that we kind of trust.
  4. I can review what members of the group contributed what code.
  5. It’s a nice and easy way to submit code: just give me the SVN URL.

When contemplating technologies to implement my scheme, I went first to GIT (or possibly some other distributed version control system, since they’re all the rage). GIT also has a pile of nice management tools like gitolite that make creating thousands of repositories surprisingly easy.

But while experimenting, I realized that GIT inherently trusted the user-provided information about who they are. If I claim to be “Barack Obama <president @whitehouse.gov>” in my commits, then GIT lets me push those commits just fine, no matter who I have authenticated as at the central server. So, I pretty much lose benefit (4) in the worst cases (which are the cases I’m usually concerned with), which is pretty much a deal-breaker by itself.

The “distributed” nature of any DVCS gets me this problem one way or another—anybody could push the whole group’s work since they could be working for weeks without touching the central server. And having made that realization, I have to admit that (3) also disappears: they don’t have to push to the server very often, so a crash on their end could lose a lot of work.

Finally, knowing students the way I do, (5) is gone too. I’d give a lot to not have this conversation five times a semester: “I got a zero.” “You didn’t submit any code.” “Yes, I committed it.” “You committed it, but did you push it to the server?” “Yes, I pushed it.” “You typed the command ‘git push’?” “No, I use ‘git commit’. That puts the code on the server.” “No it doesn’t. You didn’t put any code on the server where I can get it.” “Yes I did… I committed it.”

Also, it’s my understanding that it’s not possible to give a URL to a subtree of a GIT repository: the only URL is to the project itself. That makes submitting with GIT much harder.

So, I’m left with this: distributed version control is at least as good for developers, but it’s very bad for instructors.

According to Wikipedia’s comparison of revision control software, the only open source, “actively-developed”, “client-server” VCS is Subversion. So it looks like I’m back to the totally-uncool and old-fashioned SVN?

Does anybody want to refute any of that?