"The biggest problem with this approach is developers who don't take a sufficiently long-term view. If you don't bother learning the current standards in terms of languages, frameworks, methodologies, etc, you'll have an increasingly hard time finding work — or at least interesting work. You may also find yourself inadvertently excluded from some tech communities because you don't have enough in common to have a good discussion. You'll be losing productivity in many cases because you're writing in a language where it takes you 10x as long to get the same amount done."
- Donnie Berkholz
"For the relatively small number of multi-licensed projects (765 total out of close to 60,000), I preferred the more copyleft variant — Perl stuff is generally Artistic/GPL dual licensed."
- Donnie Berkholz
"I suspect part of the R vs Matlab divide is suggested by what you said — "academic statisticians" — implying that they are in the stats department rather than CS. I'm a little surprised that JavaScript got even that many mentions, since my impression has been that many curriculums are moving toward Python as the new language of choice."
- Donnie Berkholz
"It's really a shame that using Git leads to larger commits because those people are totally screwing up git-bisect's ability to be completely awesome for development."
- Donnie Berkholz
"The development culture and tools will be pretty major, I think. Importing of e.g. jQuery and other dependencies into a repo will have a significant impact on these metrics. The difference between JavaScript and other high-level languages, even its cousin ActionScript, reflects this. Also IDEs that generate boilerplate code, that do major refactoring automatically, and so on will likely show a serious effect."
- Donnie Berkholz
"Like APL, COBOL isn't popular enough in open-source software to make the cut for this list (based on http://redmonk.com/sogrady/201...). In fact, it's not in Ohloh either."
- Donnie Berkholz
"Unfortunately I'm rather limited by available data (and time) on improving some of this. If only I were still in academia and had more time to devote! What I've got is total # of projects, committers, commits, and loc_changed by language on a monthly resolution for about 20 years."
- Donnie Berkholz
"Yeah, some of that is related to what I coincidentally published in a follow-up post about 20 minutes ago: http://redmonk.com/dberkholz/2... Do you think the behavioral differences should reasonably be expected to apply to entire classes of languages, like functional programming? I would expect larger-scale trends across multiple languages to be more resistant to some of the points you mention. The tooling/IDE point is a great one, and I heard that from another expert in the field although their point pertained more to productivity while you're making a great argument for committing differences as well."
- Donnie Berkholz
"I think if you look at the bottom whisker (the 10th percentile), you might be able to get a decent feel for what the top users of a given language are doing. JavaScript could be a weird case where you'd need to go further down, like the top 1 percent."
- Donnie Berkholz
"That was basically the hypothesis going in: can we measure things this way? The results seem to bear out that it broadly works. That said, it's clearly an imperfect, somewhat noisy metric that's actually measuring a number of factors that combine to form the expressiveness in practice rather than in theory."
- Donnie Berkholz
"Exactly. That's the point I made in the post, although it was kind of buried in the middle. Expressiveness is just one measure, as you say. I'd love to see if I can find ways to get at data for the barrier to entry, the maintainability, the coding speed to complete similar problems, etc."
- Donnie Berkholz
"The IQR is a way to look at the width of the core distribution without being overly affected by outliers. The difference between the 10th and 90th percentiles would be another slightly less robust way to do the same thing. This width is essentially a single number to describe how variable the LOC/commit values are for a given language, which should be a view into use of the language across many problem domains and many developers. If the width is small, it should be both a generally applicable language (general to its entire "domain" in the case that it's a DSL) and a language that's used fairly well by at least half of its developers. I also noticed the higher ranking of relatively unpopular languages. Let's take a second-tier language like CoffeeScript, for example, which was used by 391 developers across 200 projects in February. That's relatively small but not exactly a tiny group of super-leet coders. Third-tier languages, on the other hand, are absolutely subject to your point...."
- Donnie Berkholz
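The two width measures described in the comment above (the IQR and the 10th-to-90th percentile range) can be illustrated with a minimal sketch. This is not the analysis code behind the original post: the function names `percentile` and `spread`, the linear-interpolation percentile method, and the sample LOC/commit values are all assumptions made for illustration.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation.

    Hypothetical helper; the original analysis may have used a
    different percentile definition.
    """
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100   # fractional index into ordered
    lower = int(rank)                     # index at or below the rank
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def spread(values):
    """Summarize the width of a LOC/commit distribution two ways."""
    return {
        # Interquartile range: width of the middle 50% of values.
        "iqr": percentile(values, 75) - percentile(values, 25),
        # 10th-to-90th percentile range: wider, more outlier-sensitive.
        "p10_p90": percentile(values, 90) - percentile(values, 10),
    }


# Made-up LOC/commit samples: the second set has one huge commit.
typical = [10, 20, 30, 40, 50]
with_outlier = [10, 20, 30, 40, 5000]

print(spread(typical))
print(spread(with_outlier))
```

In this made-up sample, replacing one value with a very large commit leaves the IQR unchanged while the 10th-to-90th percentile range grows considerably, which is the sense in which the latter is "slightly less robust."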
"There's definitely a fair amount of noise in the metric, but very few major outliers (i.e. ones that are *way* out of place, rather than just a few spots). The expressiveness of a language, in practical use, is a convolution of many variables including the language characteristics themselves, the standard library and ecosystem, the "culture" built around the language (is it one that encourages copying of external libraries, for example), etc. And you obviously nailed the fact that DSLs are special cases. I would be very curious what kinds of differences not at the syntax level, but otherwise, might be present in the Racket/Scheme instance. Any thoughts?"
- Donnie Berkholz
"You might be interested in checking out my colleague Steve's correlation of actual use on GitHub with conversation on Stack Overflow: http://redmonk.com/sogrady/201..."
- Donnie Berkholz
"John, thanks a lot for reading and commenting! I definitely agree with you that there are some huge caveats to what you can get out of this, and you nailed, in very concrete terms, two of the key ones I mentioned: "It won’t tell you how readable the resulting code is (Hello, lambda functions) or how long it takes to write it (APL anyone?), so it’s not a measure of maintainability or productivity." I struggled to come up with a good term to describe this — expressiveness was the best of a bad set. So what exactly does this metric tell you? It doesn't tell you much if anything about the writing or the reading, as you so well described, but rather something about the state of the code in the repository, the development practices in use, potentially the level of bugs you're likely to get (given the correlation between bugs and LOC). I could imagine it being pretty interesting to look at this kind of statistic across developers or organizations to see what you could learn about how they..."
- Donnie Berkholz
"Absolutely agree on trends over time. I took forever writing this up, so I wanted to get something out the door even though there's always more to do. Definitely want to break this data down in a few different ways, and time as a variable is near the top of the list. Unfortunately there are a lot of potentially confounding variables, or variables that get averaged out and nuances lost, but I've gotta work with the data at hand instead of waiting for something perfect to fall out of the sky into my lap. =)"
- Donnie Berkholz
"I'm wondering whether R vs Python in data analysis will break down a bit like Chef vs Puppet, broadly speaking ... people coming from different types of backgrounds preferring the style of one over the other."
- Donnie Berkholz
"That's why I'm addressing the data itself in Elon Musk's response, which I said is attempting to prove Broder wrong about basically everything he said."
- Donnie Berkholz
"Thanks for your comment! You're right that "always" is a strong word. I do find it interesting how consistently it falls short, however, despite the variation in driving styles indicated in the graph of Broder's speed over time."
- Donnie Berkholz
"Coming across this pretty late, but have something worth mentioning... The problem is precisely that — first-time contributors may not realize this distinction. So something you take as a given (commentary is purely technical and does not reflect upon the person), they may take as a personal attack. It's all in the eye of the beholder, and nobody else gets to decide how they perceive it, regardless of how you think it should be perceived. Taking the extra time to clarify that difference can be extremely worthwhile."
- Donnie Berkholz
"Doesn't seem like a big deal to me for anyone but mileage runners. I'm Platinum and spent close to double the threshold just on domestic coach travel. My tickets aren't unusually expensive, typically LUT fare classes — as someone else pointed out, this is something like $250/ticket for each ~2500-mile round trip."
- Donnie Berkholz
"You could make a reasonable argument that tools like Graphite/Riemann/Statsd and companies like Librato are helping to deal with the service-proliferation problem."
- Donnie Berkholz
"Thanks for your comment, Adrian. Just wanted to add that Neil Levine made the useful point on Twitter that implementing my final point as a PaaS could be useful. When combined with your extra details, that means the PaaS, whether public -- or private at the company or department levels -- could make the 100-instance minimum more palatable."
- Donnie Berkholz