I tried to measure relative trends across developer communities: for example, whether Ruby developers are more active than Java developers, how the communities compare on different metrics, and whether any useful conclusions can be drawn from the comparison.
Before I begin, an important confession: I am not a data scientist, nor do I have much education in statistics. I have simply tried to analyze the trends using standard tools and techniques. My inspiration was perhaps this:
The best way to get started in data science is to DO data science!
First, data scientists do three fundamentally different things: math, code (and engineer systems), and communicate. Figure out which one of these you’re weakest at, and do a project that enhances your capabilities. Then figure out which one of these you’re best at, and pick a project which shows off your abilities.
- Getting Started with Data Science by Hilary Mason
I wanted to work with real-world data that I had collected myself, so I turned to the GitHub APIs and gathered the following metrics for roughly 2,600 arbitrary projects in the languages Java, JavaScript, Ruby and Python:
- Watchers - People who are following the project, but not contributing code to it
- Forks - People who have created a fork and are making their own contributions to the project
Both assumptions describe the ideal case: I assume that watchers only follow the project, while people who fork it contribute code (though in many cases a fork never receives any changes).
I first wrote a Python script to pull data from GitHub on some projects for each language and store it in CSV files. The Search API came in handy here.
The script creates one dataset per language, with columns in this order: project name, project owner, number of watchers, number of forks, and whether the repository is itself a fork. Since GitHub doesn't list projects by language without a search keyword, the script iterates over every letter of the alphabet as the keyword to build a list of 2,600 repositories.
Next, to compare Java and JavaScript by number of forks, I sorted each dataset by its fourth column (the number of forks) and plotted one against the other. I then fitted a linear model with a zero intercept to find the best-fit line. These computations were done in R.
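The collection step described above might look roughly like the following. This is only a sketch, not the original script: it assumes the current `https://api.github.com/search/repositories` endpoint and its `items`, `watchers_count`, `forks_count` and `fork` response fields, and the helper names (`build_search_url`, `item_to_row`, `collect_rows`, `write_csv`) are hypothetical. The HTTP call is passed in as `fetch_json` so the logic can be tried without touching the network.

```python
import csv
import string
from urllib.parse import urlencode

# Assumed endpoint: the current GitHub repository search API.
SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(keyword, language, per_page=100):
    """Build a search URL for one keyword restricted to one language."""
    query = f"{keyword} language:{language}"
    return SEARCH_URL + "?" + urlencode({"q": query, "per_page": per_page})


def item_to_row(item):
    """Map one search result to the column order used in the datasets:
    name, owner, watchers, forks, is-a-fork."""
    return [
        item["name"],
        item["owner"]["login"],
        item["watchers_count"],
        item["forks_count"],
        item["fork"],
    ]


def collect_rows(language, fetch_json):
    """Search once per letter a..z and gather the rows.

    fetch_json is a hypothetical callable that takes a URL and returns the
    decoded JSON response as a dict.
    """
    rows = []
    for letter in string.ascii_lowercase:
        payload = fetch_json(build_search_url(letter, language))
        rows.extend(item_to_row(item) for item in payload.get("items", []))
    return rows


def write_csv(rows, handle):
    """Write the collected rows to an open text file handle."""
    csv.writer(handle).writerows(rows)
```

With 26 letters and up to 100 results per request, this yields at most 2,600 rows per language. The same repository can match more than one letter, so a real run would probably want to de-duplicate by owner and name.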
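For readers who want to see what the zero-intercept fit does, here is a minimal sketch in Python rather than R. It assumes the two datasets have the same number of rows and pairs them by rank after sorting, which is my reading of "sort, then plot one against the other". For a line through the origin, the least-squares slope is sum(x*y) / sum(x*x). The function names are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    denominator = sum(x * x for x in xs)
    if denominator == 0:
        raise ValueError("xs must contain at least one non-zero value")
    return sum(x * y for x, y in zip(xs, ys)) / denominator


def compare_sorted(counts_a, counts_b):
    """Sort both count lists, pair them by rank, and fit b = slope * a."""
    return fit_through_origin(sorted(counts_a), sorted(counts_b))


# Made-up fork counts, purely for illustration.
java_forks = [3, 1, 2]
javascript_forks = [6, 2, 4]
slope = compare_sorted(java_forks, javascript_forks)
print(f"javascript = {slope:.3f} * java")
```

In R, a fit like this is usually written as `lm(javascript ~ 0 + java)`. A slope above 1 means the second language has larger counts at the same rank, and a slope below 1 means the opposite.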
Java vs. JavaScript

Java vs. JavaScript by Forks
It yielded a linear model:
javascript = 3.609 * java
So the tendency to fork appears stronger among JavaScript developers than among Java developers: JavaScript folks seem more willing to take a project and make changes or contribute to it.
I did the same computation for the number of watchers by sorting on column 3 of the original dataset and making the appropriate changes to the code.

Java vs. JavaScript by Watchers
This yielded a linear model:
javascript = 7.22 * java
This result suggests that far more JavaScript developers follow other JavaScript projects than Java developers follow Java projects.
Java vs. Python
Running the same computation on the number of forks for Java vs. Python, here is what I got:

Java vs. Python by Forks
The linear model came out as:
python = 0.7672 * java
The trend line leans towards Java, which suggests that Java developers fork repositories, and try to change or contribute to them, more than Python developers do.
Doing the same analysis for watchers:

Java vs. Python by Watchers
This result was a bit surprising; the linear model came out as:
python = 1.423 * java
So the Python community has more watchers on its projects than the Java community. Python people seem to prefer following a project, while Java developers prefer forking it and contributing to it.
Python vs. Ruby
The results for the number of forks for Python vs. Ruby are:

Python vs. Ruby by Forks
This gives a linear model:
ruby = 2.743 * python
The trend line leans towards the y-axis, i.e. Ruby, which suggests that Ruby developers fork a repo far more than Python developers do. Rubyists like to jump in and make changes right away.
For the number of watchers:

Python vs. Ruby by Watchers
This gave us:
ruby = 2.329 * python
I don’t need to say much: Rubyists follow others’ projects more than their Python counterparts do.
Conclusion
By comparing such metrics side by side, we can pick out trends across programming language communities: Java developers tend to fork more than Python developers, while Python developers tend to follow others’ projects more; Rubyists both fork and follow projects more than Python people; and so on.
It’s a nice way to learn about and analyze trends in developer communities. More can be done; this just scratches the surface.