
Apache and Politics Over Code?

Mikeal Rogers just wrote a fascinating blog post, Apache considered harmful.

I have a lot of respect for the Apache community, but I’m glad that someone is finally calling them out. The Apache community likes to pride itself on community over code, but what has been happening recently regarding the move to a distributed version control system is, in my opinion, either pure politicking or negligence.

You would have to be living under a rock not to have noticed the change that distributed version control, and Github in particular, has brought to the open source world. Can you name any other major open source project (besides Apache) that is not on some form of distributed version control or does not have a concrete plan to move? I can’t, at least off the top of my head. The times have changed: open source projects are more mainstream now, and they especially favor distributed forges like Github.

Let’s try to have some fun with statistics. From a recent presentation by Stephen O’Grady from Redmonk, Github’s growth is almost unbelievable…

I’m confident that if he updated the excellent presentation again, it would show an even greater distance between Github and the other forges. Heck, even throw in Bitbucket (Hg and Git now) and Launchpad (Bzr) to see how fast they are growing compared to the others. Another statistic we can look at to further spot this trend is package statistics from Debian…

That’s impressive growth for Git but still shows that SVN is doing OK (poor darcs). It would be great to see more download statistics but I can’t think of other easy sources at the moment. We can also analyze search volume via Google Trends to see what people are searching for over time…

Clearly git (including github) and mercurial are trending upwards. One could argue that this is because git and mercurial are harder to learn, so people are searching more for them, but I doubt that’s the complete story. I didn’t include cvs (famous U.S. pharmacy) or bazaar (ambiguous) because they are searched for in other contexts and I don’t know how to tweak Google Trends to filter those out. While doing these searches I wanted to test another hypothesis of mine. From personal experience, I believe that distributed version control adoption is lagging in the corporate world. My main reason for thinking so is that corporations are generally slower than open source communities to adopt new technologies. To test this theory, I used Indeed to perform a search and see how things are going…

From the looks of it, CVS/SVN are still the dominant players, with Clearcase hilariously staying somewhat constant over time. However, I’m sure this graph is going to look quite different in a couple of years as the tools around distributed version control systems mature. I also believe developers will start asking for a form of distributed version control once they experience it in the wild (see git-svn). I was curious to see if LinkedIn had anything to shed some more light on what is going on in the software industry and found their LinkedIn Skills application. I couldn’t find a good way to group and compare related skills, but I found some interesting information. In terms of relative growth, git seems to be trending well…

In terms of skill size, svn is still doing well.

I was curious to see how CVS was doing also…

CVS is experiencing negative skill growth. Then I noticed CMVC in the trends, which reminded me of bad times, and I knew it was time to stop digging for statistics.

Why do I care? Two main reasons. The first is simple and deals with my day job of facilitating open source efforts at Twitter. If you’re going to open source a new project, the fact that you simply have to use SVN at Apache is a huge deterrent to even going that route. It would be easier to simply host the code at Github or a similar forge and take what lessons you need from The Apache Way. There are a lot of tools available to help you with the infrastructure of your project (e.g., you can use Cloudbees or Travis CI to help you with continuous integration). The point here is that continuing to use SVN is not going to help Apache grow. When is the last time you heard a developer all excited about using SVN?

The second reason is that I have personal experience with this particular issue, as I spent the last couple of years helping the Eclipse Foundation transition towards git. It’s a large transition because there are roughly 1,000 committers and over 200 projects using a mix of CVS and SVN. On top of that, it took convincing the EGit/JGit projects to move to eclipse.org, plus a couple of board meetings and votes, to make that happen. Furthermore, the git tooling had to get up to snuff before the majority of eclipse.org projects started to adopt git, since the previous generation of SCM tooling (e.g., CVS) had spoiled Eclipse developers. All I’m saying is that it took a lot of work to start the transition, and the Eclipse community hasn’t even fully completed it yet. Just ask the PostgreSQL community how quick it was moving to Git. The key point here is that you have to start the transition soon, as it’s going to take a while to implement the move (especially since Apache hosts a lot of projects).

In the end, I’m a huge fan of the Apache Foundation and The Apache Way, as a lot of us have benefited and learned from Apache in some fashion. I just hope the Apache community learns to evolve, or it will become less relevant in the new open source world order of distributed version control systems and the forges behind them. I take this problem to heart because I believe the Eclipse Foundation faces some of the same issues and we’re doing our best to mitigate them.