Sunday, August 12, 2007

Silver Bullets Incoming!

((( This was originally posted on December 8th 2006 on Blogthing. It is reposted here because Blogthing seems to be permanently down. The original was posted to Programming Reddit and extensively discussed at the time, so please do not resubmit this version.)))

-----

First, an apology to those who have tuned in to read about the obstacles to technology adoption. I will get to it, but first I wanted to respond to the staggering set of comments to my post about functional programming. In addition to the ones here there is the thread I started at Joel On Software and over 100 posts at programming.reddit.com. Wow!

Some people tried to post here first, but postings are held for moderation even if you don’t get flagged as a spammer. Blogthing is apparently configured to reject URLs with hyphens, even though Blogthing puts hyphens in its own URLs. Anyway, one bit of pond scum posted a bit of JavaScript which automatically redirected a browser to some other site, so be glad you didn’t have to deal with that.

Also I’m sorry I couldn’t moderate more often, but I’m afraid that work and family take priority.

Anyway, on to some responses. I’m going to tackle the more common themes in no particular order:

Lines of code is a lousy metric

I think that Dijkstra had it right: a line of code is a cost, not an asset. It costs money to write, and then it costs money to maintain. The more lines you have, the more overhead you have when you come to maintain or extend the application.

Many people asked about development time, or pointed out that the Relational Lisp version in the US Navy study took even less time than the Haskell. However, development time is often not available. When it is reported it is highly dependent on the programmer in question, and it may not always be reported accurately (how do you count the time you spent thinking about the problem when driving to work?).

One study, which I am annoyed with myself for not remembering earlier, did collect coding time and found it highly variable, but on average the terser languages took proportionately less time. Implementations for the specimen problem in that study also exist in Haskell and Lisp. The average Haskell program was 57 lines, the average Lisp program was 119 lines, and the average Python program was about 80 (from the graphs in the article). C, C++ and Java all had averages around 240. The shortest program was in Haskell, with a mere 27 lines. The next shortest appears to be a Python program at about 40 lines.

It was just a toy problem, so not representative

True, but non-toy problems are too expensive to run statistical tests on. The Erlang ATM switch was a non-toy problem, but it was also not a controlled study with a statistically valid number of independent implementations. You can’t have both (unless you are an eccentric millionaire).

As several people pointed out, the Annual ICFP Programming Contest provides a non-toy problem and a tight deadline every year. It’s a level playing field: the problems are hard and open-ended, and any language may be used. Functional languages always dominate the top ten, and for the last 3 years the winners have used Haskell (although last year it was one of several languages used by the winners).

This kind of problem is obviously suited to functional languages, so they look artificially good

I hope I have cited a sufficiently wide range of problem domains, from text processing to geometry to telecoms, to be able to argue otherwise.

There is nothing in the way that functional programming languages work that necessarily fits them to a niche domain, although they do tend to shine particularly in applications requiring complex algorithms and symbolic rather than numeric computation.

Libraries matter more than languages

Libraries and languages are complicit: they affect each other in important ways. In the long run the language that makes libraries easier to write will accumulate more of them, and hence become more powerful.

There is no sharp dividing line between language and library. For example, Haskell has no loop construct built in to the language. Instead there are functions in the standard library to do the job of loops. Conversely Perl has regular expressions as a built-in type, whereas most languages treat them as a library. Therefore the programming environment as a whole should be judged, with language and libraries combined. I don’t know how well the studies above did this. I do know that the US Navy study did not have STL for C++ because it hadn’t been invented then.
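
As a rough illustration of loops living in the library rather than the language, here is a small sketch in Python (chosen only for brevity; the function names are made up for this example). The second version contains no loop statement at all: `map` and `functools.reduce` from the standard library do the iterating, which is roughly the role that `map` and the fold functions play in Haskell.

```python
from functools import reduce
from operator import add


def sum_squares_loop(numbers):
    """Sum of squares using the language's built-in for statement."""
    total = 0
    for n in numbers:
        total += n * n
    return total


def sum_squares_library(numbers):
    """Same computation with no loop statement: library functions iterate."""
    squares = map(lambda n: n * n, numbers)
    return reduce(add, squares, 0)
```

Both functions return the same result for the same input. The only difference is whether the iteration is spelled with syntax or with an ordinary function call.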

C++ is an old language: you should compare it with modern languages like Python

Fair enough. Haskell is an order of magnitude better than C++, and maybe Java. Against Python it is merely significantly better.

An order of magnitude is a factor of 10, no less

Well, the Wikipedia entry does say about 10. All this stuff is so approximate that anything consistently in excess of 5 is close enough.

You can do functional programming in C++ with Boost

Yes I know. This argument reminds me of the early days of OO, when people argued that OO languages were not necessary because you could do the same thing in C with structures of function pointers. Yes you could, but it was a masochistic exercise. I haven’t played around with Boost, but I suspect the same applies. Trying to retrofit ideas from one paradigm into a language designed for another is generally a bad idea.

Python has functional programming stuff, so why do I need Haskell or Erlang? Anyway, purity just gets in the way.

You want Erlang for big, distributed, highly concurrent systems. There is simply nothing comparable.

If you aren’t persuaded by the scalability argument for purity then try this: the purity of Haskell lets the compiler do lots more optimisations. You could write code in either language to apply a series of maps and folds to a list, but Python would (AFAIK) generate all the intermediate lists, while the Haskell compiler would apply a deforestation transform to eliminate all the intermediate lists and therefore be lots faster. The Haskell compiler can do this because the intermediate functions are guaranteed not to have side effects, and therefore the order of execution can be rearranged at will.
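
To make the transformation concrete, here is a hand-written sketch in Python of what deforestation achieves. It is not how any Haskell compiler implements it, and the function names are invented for this example. `pipeline_with_intermediates` builds a fresh list at each stage (list comprehensions are used so that this holds regardless of Python version), while `pipeline_fused` computes the same result in a single pass with no intermediate lists. The rewrite is only safe because `double` and `increment` have no side effects, which is the guarantee a Haskell compiler has by construction and a Python implementation does not.

```python
def double(n):
    return n * 2


def increment(n):
    return n + 1


def pipeline_with_intermediates(numbers):
    """Two maps and a fold, each stage materialising a full list."""
    doubled = [double(n) for n in numbers]   # intermediate list 1
    bumped = [increment(n) for n in doubled] # intermediate list 2
    total = 0
    for n in bumped:                         # the fold (a sum)
        total += n
    return total


def pipeline_fused(numbers):
    """The same computation after fusing the stages by hand."""
    total = 0
    for n in numbers:
        # Both maps are applied per element; nothing is stored in between.
        total += increment(double(n))
    return total
```

If `double` printed a message or mutated shared state, the two versions would interleave those effects differently, so a compiler could not swap one for the other without being sure the functions are pure.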

Haskell isn’t ready for prime time yet

True, but it’s pretty close. The wxHaskell and Gtk2Hs GUI libraries are currently being updated, database access works fine, GHCi is getting a debugger, and people are already using it for real work.

I wouldn’t recommend using Haskell on a 50-programmer-year bet-the-company project just yet, but for a small agile project where correctness matters more than execution speed it’s fine today.
