Tuesday, December 11, 2012

Rhodesia 1976

Here's a fascinating historical document, a 1976 news report from Rhodesia. For those who don't know, Rhodesia was a British colonial possession in Africa until the local government declared unilateral independence in 1965 rather than submit to the UK's terms of departure, which required one-man-one-vote and would have put a black government in power. Rhodesia was isolated politically, and fought a long guerrilla war against black nationalists backed by the Soviet Union and China, until its final surrender in 1979. Rhodesia then became Zimbabwe.


Things did not go well for Zimbabwe; Robert Mugabe eventually turned flat-out tyrant, and the country is now one of the poorest in the world.

I can't quite place the reporters' accents, but one of the interviews mentions Australia, so I think that's where the report is from.

Saturday, December 8, 2012

How it should work

-- Hello, I'm having a problem with my internet connection.
-- OK, try clicking on the up-and-down arrows in your browser to refresh your page.
-- Yeah, I'm not in IE and I already tried that half an hour ago. According to the modem diagnostics...
-- If that doesn't work, click on the X in the top right corner of the window...
-- ... Yeah, I already restarted fifteen minutes ago. Look, I think...
-- The next step is to restart your computer. Go to the bottom left corner...
-- I am a professional engineer and formally request exemption from elementary assistance.
-- ...
-- ...
-- Authenticate now.
-- So, when the buckled girder lets down the grinding span,
   'The blame of loss, or murder, is laid upon the man.
-- ...
-- ...
-- Pass phrase accepted. What distro are you running, and what is Wireshark telling you?

Saturday, December 1, 2012

In the beginning was the command line

"What would the engineer say, after you had explained your problem, and enumerated all of the dissatisfactions in your life? He would probably tell you that life is a very hard and complicated thing; that no interface can change that; that anyone who believes otherwise is a sucker; and that if you don't like having choices made for you, you should start making your own."
-- Neal Stephenson, "In the Beginning was the Command Line"

Saturday, November 17, 2012

Three functional programming languages

Scala is a functional programming language designed to work well with Java. That doesn't come free, of course. The language makes some compromises to play well with legacy Java code and the JVM, but the result is a very good Java 2.0. If you are interested in functional programming and want a language that you may actually get to use professionally, Scala is the safe bet. But it will not cure your cynicism.

Programming in Scala

Lisp was shot in the head a generation ago by Algol-style syntax, but like Lisbeth Salander, it never quite died. A tenacious band of enthusiasts keeps insisting the language is special, and has never been proved wrong. If you want to learn a timeless classic of a language with a reputation for changing the way you look at programming, Lisp is a good choice. Also, Paul Graham will send you a mash note.

Practical Common Lisp



Haskell is a pure functional programming language known for extreme expressivity. SkyNet would be 300 lines in Haskell, tops. Its relentlessly orthodox functional nature will force you to confront, interrogate, and ultimately deconstruct the imperative paradigm you have been trained in. You will see imperative coding for the fire-lit cave that it is and depart for a sun-bright world of functional programming. (Actual results may vary -- Platonic enlightenment not guaranteed.)

Learn You a Haskell for Great Good

Thursday, November 1, 2012

Interesting Problems

Why can't I have interesting problems like this?
Basically what happened yesterday is i drank an entire bottle of whiskey, stabbed myself in the hand (not my drawing hand), had some kind of delirious breakdown in the hospital, got stitched up and sent back home.
Today I go to see the hand surgeon about my hand and a therapist about my emotions.
I think this is where I'm supposed to make an ironic comment about artistic types. Or maybe pseudo-ironic; it's hard to keep track.

Saturday, October 6, 2012

Moves from 1459


A Slovenian group of martial artists who study in the European (rather than Asian) tradition has recreated the wrestling moves from the Thott 290 2º manuscript, written in 1459 by Hans Talhoffer. It looks like they had a lot of work to do, since each move had only a single illustration. Fascinating stuff.

Saturday, September 29, 2012

Good judo clubs in Ontario

The list of athletes on the Ontario judo team makes for interesting reading. Some clubs are very well represented; others, not so much. If you are looking for a high-quality judo club in your town, you would do well to find one that produces top judokas, and the list makes it very clear where they are coming from.

Both of the clubs in Kitchener, Asahi and Kaizen, are well represented with 11 and 14 athletes, respectively.

Saturday, September 22, 2012

Camp Budokan

This looks interesting.

Camp Budokan is a week-long all-ages day camp focused on judo instruction, with an impressive roster of coaches. The website itself doesn't give dates, but according to the Judo Ontario website, Camp Budokan was run Sunday July 29 to Saturday August 4 in 2012.


Looks like a good bet for next summer.

Tuesday, September 18, 2012

Judo going well

After a lot of to and fro about getting back into martial arts, I finally got my act together after Labour Day. On the 4th of September, I began training at Kaizen Judo Club in Kitchener. Today, I had my fifth lesson.  



Things are going well. I'm studying with a good group of other beginners who started around the same time, and we are learning a lot together. The early classes spend a lot of time on breakfalls, with a few basic techniques (foot-sweeps, hold-downs) occasionally included. It's a fairly slow ramp-up, but I appreciate it. If they threw us right in among the more advanced students, we'd just get frustrated (not to mention exhausted), and we could well get hurt if we didn't know how to fall properly when thrown.

If everything keeps going well, I should earn my yellow belt before the end of the year.

Onward!

Chasing Mavericks

Chasing Mavericks is a film about one of the world's greatest surfing spots, back when it was still obscure, almost a myth. Jay Moriarty (Jonny Weston) is a young kid who's desperate to ride it, and Frosty Hesson (Gerard Butler) is the gray-haired voice of experience who discourages him on account of the dangers, but finally agrees to train the kid to ride the biggest of big waves. The trailer's a real treat and hints at some dramatic depth on the home front.


 The film opens October 26th.

Saturday, September 15, 2012

Judo results are well distributed

In the 2012 Olympics, there were 14 judo events -- seven weight classes for men, and seven for women -- and since each event awards four medals (judo gives out two bronzes), that's 56 medals in all. Yet no country won more than seven of them, including Japan, the country where the sport was invented. France, hardly thought of as a judo superpower, did just as well. And eighteen countries managed to win something. That impresses me; judo is a very international sport.

Canada, for what it's worth, picked up one medal: Antoine Valois-Fortier won bronze in the men's 81 kg class.

Sunday, September 9, 2012

Hansel and Gretel: Witch Hunters

Looks like someone in Hollywood found another 60 megabucks between the couch cushions, and used it to remake a fairy tale as an action movie: Hansel and Gretel: Witch Hunters is due out in January. If Van Helsing was your thing, this one's for you.

Saturday, September 1, 2012

A checklist for code reviews

A few years back, Atul Gawande, an American surgeon, got a lot of press from a study that showed dramatic reductions in deaths and complications from surgery when surgical teams instituted simple checklists. Death rates dropped from 1.5% to 0.8%, and serious complications fell from 11% to 7%.

Checklists are not about creativity; they don't help you make dramatic insights. Instead, they're about consistency; they make sure you don't forget the simple, boring stuff. And some of the simple boring stuff is really important.

We in the software industry can use checklists as part of code reviews and inspections. This list covers the major concerns:
  1. (reuse) Is there already code that does something similar? Why is this code not reusing the existing work, perhaps with modifications? 
  2. (correctness) What has been done to verify that this code produces correct results? In particular, what parts of it are verified by automatic tests? 
  3. (clarity) If the purpose of this code or its implementation is obscure, where is it explained?
  4. (documentation) How widely is this code used? Is its documentation appropriate to the breadth of use? 
  5. (data volume) How large a volume of input is this code expected to process? Are the algorithms and data structures appropriate to the task? 
  6. (memory use) Does the code allocate memory? Who takes ownership of it, and how will it eventually be freed? 
  7. (error handling) How does the code report errors or unexpected conditions? Does it propagate error reports upward from code it calls? 
  8. (concurrency) Does this code execute concurrently? What has been done to avoid memory corruption and unnecessary exclusion? 
  9. (execution efficiency) Does this code need to execute quickly? What has been done to ensure it does so? 
  10. (storage efficiency) Does this code need to use storage parsimoniously? What has been done to ensure it does so? 
  11. (security) Does this code access or produce sensitive information? What has been done to keep this information secure? 
  12. (dead code) If this code replaces other code, has the older code been removed?
Checklists are by no means new in the context of software development. Their use is a standard part of formal software inspections as described in Software Inspection by Gilb and Graham. But in my experience, they are rarely (as in, practically never) used in industry. They're a good idea that should be used more widely.

Saturday, August 25, 2012

Dynamic typing vs static

A programming language is statically typed if the type of every variable has to be declared when it is created. Pascal, C, and Java are all statically typed. In dynamically typed languages, on the other hand, the types of variables are not declared up front; you just assign a value (however created) to a variable. Python and Lisp are dynamically typed.

One of the perennial arguments among software developers is whether dynamic or static typing is best. Generally speaking, the proponents of static typing emphasize safety: having to declare the type of every variable gives the compiler a lot of information about what can be done with it, allowing the compiler to catch a lot of errors. The fans of dynamic typing, on the other hand, emphasize flexibility and brevity, which translate into development speed. Redundancy like this, in Java, is their enemy:
Foo foo = new Foo();
My background features mostly languages with static typing (C++ and Java). It wasn't until I got to Google that I had a chance to work with a major system written in a dynamically typed language (Python). This was TNT, a system for running end-to-end tests in the ads serving pipeline.

Overall, I found dynamic typing to be more hindrance than help for the project. I kept running into typing mismatches that in statically typed languages would have been caught at compile time, but which didn't show up until run-time in TNT. And since the system takes half an hour to run, the difference between run-time and compile-time error reporting is huge.
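A toy illustration of the failure mode (my own example, not TNT code): a function expects a list of numbers but gets handed a string, and Python only complains when the offending line finally executes -- potentially half an hour into a run.

```python
# Toy example (not TNT code): a type mismatch that a statically typed
# language would reject at compile time, but Python only reports when
# the offending expression finally runs.

def average_latency(samples):
    # Expects a list of numbers.
    return sum(samples) / len(samples)

def collect_samples():
    # Bug: returns a comma-separated string instead of a list of floats.
    return "12.5,14.1,13.7"

samples = collect_samples()
# ... imagine thirty minutes of pipeline work happening here ...
try:
    print(average_latency(samples))
except TypeError as e:
    # Only now does the mismatch surface.
    print("caught at run time:", e)
```

A Java compiler would have flagged the equivalent mistake before the program ever ran.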

My take-away from this experience is that dynamic typing is great for small projects that run quickly and fit between one pair of ears. In those cases, the redundancy and verbosity of declared types just gets in the way. But the larger the project, and the slower the debugging cycle, the more helpful static typing becomes.

I'm hoping Dart's system of optional type declarations catches on and makes it into Python at some point.

Canada: all about the hockey

The Wikipedia entry on Ice Hockey has some interesting stats about the popularity of hockey in various countries. (Follow the link and scroll down to "Number of Registered Players by Country.") Canada is first with 572,411, the US is second with 500,579, and then there's a whole lot of nothing until we come to the Czech Republic at 100,668. Canada is also first in the portion of the population that plays the game.

I had not realized Canada was quite so dominant. But still, it's kind of depressing that the US can almost match us in their fourth-most popular sport.

Saturday, August 18, 2012

Jobs requiring functional programming

If you want a job as a software developer, how useful is it to know functional programming languages?

To answer this question, I went to four employment sites and searched for four functional languages: Scala, Erlang, Clojure, and Haskell. For calibration, I also searched for three mainstream programming languages: Java, C++, and SQL. Two of the employment sites were Canadian (Workopolis and Monster.ca) and two were American (Monster.com and careers.joelonsoftware.com).

Workopolis:
Java     696
C++      282
SQL     1368
Scala      3
Erlang     2
Clojure    1
Haskell    1
Monster.ca:
Java     668
C++      288
SQL     1000+
Scala      2
Erlang     0
Clojure    0
Haskell    0
Monster.com:
Java    1000+
C++     1000+
SQL     1000+
Scala     72
Erlang    36
Clojure   12
Haskell   26
careers.joelonsoftware.com:
Java     252
C++      140
SQL      231
Scala     24
Erlang    11
Clojure    9
Haskell    9
Based on these figures, there are some jobs out there that call for functional-programming expertise, but not many, particularly compared to the number calling for mainstream imperative languages. And virtually none of them are in Canada.

Tuesday, August 14, 2012

Come out, come out, wherever you are!

Blogger.com keeps all kinds of stats about the blogs it hosts; pageviews are just the beginning. One of the stats it tracks is the type of browser the requests are coming from.

I was surprised to find that one of my readers is using NS8, i.e. Netscape, a very old browser. That's a very curious thing indeed.

If you are the mysterious Netscape user, would you step out of the shadows for a moment and tell us why you chose your browser? I've enabled anonymous commenting in case you want to remain a man of mystery.

Sunday, August 12, 2012

Fixing a broken codebase, part II

In part I, I described a scenario of a software engineer hired to improve a decades-old scientific codebase of 200,000 lines. I proposed seven first-aid measures that together would keep things from getting worse, and ensure all new code was of much higher quality than the old. Here in part II, I'll discuss what to do about the older legacy code.


One tempting choice is to rewrite the whole codebase. It would be enormously satisfying to start fresh, and indeed it is likely that the second version would be much better, since it could be designed based on everything that has been learned from the first one. The problem with this plan is time. The basic COCOMO model estimates that writing a 200,000-line program takes 625 person-months, with a delivery time of 28 months. Merely rewriting code is easier than writing it from scratch, so this estimate is likely to be on the high side, but this is definitely a project that would take person-years rather than months. And it is unlikely the group of scientists is willing to wait that long. Accordingly, something more selective is called for.

What I have in mind are four measures that will gradually improve the codebase, without the high hurdle of a complete rewrite. These measures will accelerate the improvement that the policies from part I enabled. In particular, the team should undertake
  • dead code removal,
  • refactoring training,
  • a Better than You Found It policy, and
  • targeted rewrites.

Dead Code Removal
The codebase in this scenario is a couple of decades old. That means it contains a lot of code that isn't needed any more: old projects, failed experiments, and obsolete concerns. This old unused code is a problem because it complicates the codebase, making it more difficult to add new code for new purposes. It should therefore be removed.

Much of this dead code is likely to be hiding behind options and flags. Some of it may also be entirely commented out. The way to identify the code is to look at what configurations are actually run, and thereby determine what options are actually used. It is then possible to find all the options that aren't used, and work with the users (i.e. the scientists) to determine which ones actually have to be kept.

The policy of assigned code ownership (from part I) aids this effort, since the code owners are the people who know their portions best. They are therefore the right people to undertake the dead code removal. The source code control system is also useful, since it makes it possible to return any removed code if it turns out to be needed later. It is therefore possible to be quite aggressive in removing code that is suspected of being dead.

Once all the dead code is removed, the result will be a much smaller and much clearer codebase that is much easier to work with.

Training in Refactoring
The plan is to gradually improve the existing codebase. This will require carefully refactoring the code, restructuring it to be more comprehensible without changing the results it produces. Doing this is not obvious, particularly for people without training and experience in software engineering. A bit of targeted training would therefore be useful.

What I have in mind is a day-long course. It would start with the concept of code smells, signs that indicate trouble. Among these are very long functions, use of global variables, confusing names of functions and variables, poor encapsulation of functionality, repeated code, and an absence of test coverage. The engineer would then show an example of a refactored module, pointing out problems in the original and how they were fixed. Finally, the students would have a chance to try refactoring themselves, using actual code from the codebase.

Books such as Working Effectively with Legacy Code and Refactoring are useful resources for this training. The idea (from Working Effectively) that legacy code is code without tests, and that having tests makes confident refactoring possible, is particularly useful and should be included in the course.

Better Than You Found It
The skills the staff developed in the refactoring course will be put to use by instituting a policy of Better Than You Found It. Whenever a developer makes a change to existing code, they should not only write the new code correctly, but also take the time to improve the nearby code. For example, if they are adding code to a function that is already oversized, they should break it up. Or if they are calling a function with a wildly complicated parameter list, they should redesign the function to be more selective in its inputs. It isn't necessary to fix the whole module, or even the whole file, just a portion of it, leaving it a little better than it was before.
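To make the policy concrete, here is a hypothetical example (my own invention, not from the scenario's codebase): while adding a new feature to an oversized reporting function, the developer also splits the parsing and formatting it already contained into helpers.

```python
# Hypothetical "Better Than You Found It" result: the parsing and
# formatting that used to be inlined in one oversized function have
# been extracted, so the new feature (sorting) is one clear line.

def parse_record(line):
    # Extracted helper: one record per line, in "name,value" form.
    name, value = line.split(",")
    return name.strip(), float(value)

def format_report(stats):
    # Extracted helper: output formatting lives in one place.
    return "\n".join(f"{name}: {value:.2f}" for name, value in stats)

def build_report(lines):
    # What remains of the original function is a short pipeline.
    stats = [parse_record(line) for line in lines]
    stats.sort(key=lambda item: item[1], reverse=True)  # the new feature
    return format_report(stats)

print(build_report(["cpu, 97.5", "disk, 12.25"]))
```

The point is not the specifics but the scope: a couple of helpers extracted in passing, not a wholesale rewrite of the file.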

Recall that a complete rewrite is not the plan. Instead, by following the policy of Better Than You Found It, the most commonly used parts of the codebase will steadily improve. And the improvement will come organically, implemented by staff who ran into problems and had to understand the old code anyway. The danger in undertaking this policy is staff resistance, since it requires them to do work that is not obviously part of whatever they were trying to accomplish. It is therefore important to make the policy both universal and fairly lightweight. Burdens shared equally are easier to bear, and the task of improving the codebase is less onerous if it is done a little at a time. Ideally, with training and experience, the staff will come to see the ugly old code as distasteful, and will want to fix what is obviously broken, the legacy of the bad old days.

The policies of code ownership and mandatory code review, introduced in part I, support this effort by making sure people don't shirk their refactoring responsibilities.

Targeted Rewrites
Diet and exercise will not cure cancer; that takes surgery and chemotherapy. Similarly, gradual refactoring will not fix the most dramatically broken modules in the codebase; they should be completely rewritten. Typically what happens in an evolving codebase is that some modules, though initially sound, are called upon to do ever more and assume responsibilities far beyond what was originally imagined. This tends to be a problem because their underlying architectures were designed for the original purpose, and often aren't changed as their uses evolve. Over time, this requires increasingly convoluted code to implement new features. These are the modules that should be rewritten from scratch with completely new architectures suitable for their new requirements.

The best source of information for identifying the really bad parts of the code is the development staff itself. They will have many stories about scary modules they are reluctant to modify, but sometimes must. And they will not be reluctant to share this information. The bug database from part I will also be useful, since bad modules tend to be continual sources of bugs.

To Summarize
In part I, I described how to keep the team's codebase from getting worse, by making it possible to write new code cleanly. In this article, part II, I proposed four measures for improving the older code. Dead code removal simplified the codebase by removing unnecessary old code entirely. Training in refactoring built the staff's skills in upgrading the older code, and the policy of Better Than You Found It put those skills to work improving the code, gradually. Finally, where gradual improvement wasn't sufficient, I called for targeted rewrites of the most troubled modules. Doing all of this will take time and effort, but the team will have a dramatically improved codebase in a year or so.



Valve's corporate culture

So I've checked out the Valve employee handbook that made the rounds of the internet a few months back. The corporate culture it depicts is very open and very egalitarian. 

Picking a project:
We’ve heard that other companies have people allocate a percentage of their time to self-directed projects. At Valve, that percentage is 100. Since Valve is flat, people don’t join projects because they’re told to. Instead, you’ll decide what to work on after asking yourself the right questions (more on that later). Employees vote on projects with their feet (or desk wheels). Strong projects are ones in which people can see demonstrated value; they staff up easily.
Structure:
Valve is not averse to all organizational structure—it crops up in many forms all the time, temporarily. But problems show up when hierarchy or codified divisions of labor either haven’t been created by the group’s members or when those structures persist for long periods of time. We believe those structures inevitably begin to serve their own needs rather than those of Valve’s customers.
Working hours:
While people occasionally choose to push themselves to work some extra hours at times when something big is going out the door, for the most part working overtime for extended periods indicates a fundamental failure in planning or communication. If this happens at Valve, it’s a sign that something needs to be reevaluated and corrected. If you’re looking around wondering why people aren't in “crunch mode,” the answer’s pretty simple. The thing we work hardest at is hiring good people, so we want them to stick around and have a good balance between work and family and the rest of the important stuff in life.
These are admirable principles. I would love to work Valve's way.

But I have to wonder whether the handbook depicts how things are actually done at Valve. I suspect it describes an idealized version of Valve, Valve as it ought to be. And perhaps the company sometimes achieves that ideal. Also, Valve is small; right now they have only 293 employees. I suspect things would change considerably if they were to grow by an order of magnitude. Companies have a way of becoming more formal and rigid as they get bigger.

Wednesday, August 8, 2012

Fixing a broken codebase, part I

There's an interesting post in Ars Technica by a software engineer who has been hired by a group of scientists to modernize their development practices and fix a big mess of spaghetti-code that has accumulated over 10-20 years. He wants a few fairly simple techniques that will make a big difference.


In my opinion, the engineer should start by instituting some basic modern development practices. They won't fix the broken codebase, but they will keep things from getting worse. They'll also lay a foundation of good practice that will make later improvements possible.

The first phase should institute:
  1. Source control. The scientists need to be able to control who does what to the codebase, and keep a history of changes. A modern source code control system like Perforce or Mercurial does exactly that.
  2. Code review. Every change to the code should be examined by someone other than the original author, and that examiner should be empowered to demand changes until he is satisfied the new code is sound. Many eyes make all bugs shallow, and it starts with code review.
  3. Code ownership. The system is too large for anyone to understand well. It should be divided into portions, based on who understands what best about the system as it is now. Once that's done, the code review policy should mandate that all reviews include the owner of the code being changed.
  4. Daily builds. It should be possible to check out the complete codebase from the source code control system and build (compile and test) it from scratch. This should be done at least daily. It may be best to assign staff to a rotation, with each person being build engineer for a day. In addition to building the current codebase, the build engineer is also responsible for fixing the build if it is broken by identifying the offending code and rolling it back. The build engineer is done when the codebase builds cleanly again.
  5. Bug database. All error reports and feature requests should be stored in a central searchable repository, so the code owners know what is broken and what has been fixed.
  6. Design documents. All new functionality should be documented up front, presented for peer review, and revised until approved. The documents don't need to be very detailed, but they should describe the essential elements of the proposed functionality and why it is being introduced. Also, if the designs change during development, the documents should be updated to reflect what was actually built. And the design documents should be kept in some readily accessible place.
  7. Unit tests. All new functionality should include tests that verify that the code performs as expected. Going forward, these tests will make sure that any additional changes do not break existing functionality. These tests should be run as part of every daily build. Also, all fixed bugs should have unit tests that reproduce the failure, but which run correctly with the fix in place.
These seven policies will keep things from getting worse. If the team did nothing else, they would have their old mess of a codebase, plus a slowly growing layer of superior new code that accesses it. Over time, the average quality of the codebase would slowly rise, as more and more code would be of the new type.
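As a sketch of what policy 7 looks like in practice, here is a hypothetical regression test (my own example, in Python; the scenario's codebase could use any xUnit-style framework). Suppose a wind-chill routine used to crash at zero wind speed; the fix handles that case, and the test pins the behavior down so the bug cannot quietly return.

```python
# Hypothetical example of policy 7: a unit test that reproduces a
# fixed bug. wind_chill() once misbehaved below the formula's valid
# wind-speed range; the (assumed) fix reports plain air temperature
# there, and the first test is the exact input from the bug report.
import unittest

def wind_chill(temp_c, wind_kmh):
    # Standard wind chill formula, valid for winds of 5 km/h and up.
    if wind_kmh < 5:
        return temp_c  # the fix: low wind feels like the air temperature
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v

class WindChillRegression(unittest.TestCase):
    def test_zero_wind_does_not_crash(self):
        # The input from the original bug report.
        self.assertEqual(wind_chill(-10.0, 0.0), -10.0)

    def test_windy_day_feels_colder(self):
        self.assertLess(wind_chill(-10.0, 30.0), -10.0)

unittest.main(argv=["wind_chill_test"], exit=False)
```

Run as part of every daily build, a suite of such tests turns "did my change break anything?" from a worry into a query.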

In part II, I'll describe what can be done to improve the older code.

Monday, July 30, 2012

Watchmen prequels

If you read comics at all, you have read Alan Moore's "Watchmen" series. Or maybe you saw the 2009 film.

If you loved either one, you'll be happy to know that DC is publishing prequels featuring the same characters: Nite Owl, The Minutemen, Silk Spectre, Ozymandias, Dr. Manhattan, Rorschach, and the Comedian. It's not a single unified series, but rather seven co-themed series of four or six issues each, 34 total. They've published seven so far, and you can still find the early ones in shops if you want to start from the beginning.

I'm not usually a fan of comics, but this series, like its predecessor, impresses me. It's written for distinctly adult tastes, and that makes a real difference. The Silk Spectre series, with a teenage SS breaking away from her very demanding mother (the original Silk Spectre) and making her way into the San Francisco counterculture, is particularly appealing.

Recommended.

Saturday, July 28, 2012

Akin to blasphemy

From Programming in Scala by Odersky/Spoon/Venners:
If you're coming from an imperative background, such as Java, C++, or C#, you may think of var as a regular variable and val as a special kind of variable. On the other hand, if you're coming from a functional background such as Haskell, OCaml, or Erlang, you might think of val as a regular variable and var as akin to blasphemy.
TRVTH.

Friday, July 27, 2012

Three months in a raft

While on vacation I had a chance to read Thor Heyerdahl's Kon-Tiki: Across the Pacific by Raft. If you grew up on PBS as I did, you've heard the story: in 1947 five Norwegians and a Swede built a balsa-wood raft in Peru and sailed west in order to support Heyerdahl's theory that the islands of Polynesia were populated from South America. They sailed for 101 days, finally reaching Raroia in French Polynesia.


The book was published a year later. It's a distinctive piece of work, written by Heyerdahl himself. The book is at its best when describing the obstacles overcome by the expedition: obtaining supplies from the US Army Quartermaster Corps, looking for balsa-wood logs in Ecuador, braving a storm at sea, and finally surviving the disastrous landfall on Raroia, which wrecked the raft on a coral reef. It becomes very clear that the author was either very well connected or very persuasive, since he managed to wrangle the assistance of lofty figures most people would never even get to meet: a high official in the Army, the "balsa king of Ecuador", and the president of Peru.

The voyage could also very easily have ended badly. A worse storm might have toppled the raft. Slightly different construction techniques could have caused the raft to break apart. And the expedition did not have much margin for error: the raft was alone in the ocean far from shipping lanes, weeks from any possibility of rescue.

Passing years have not been kind to the science Heyerdahl and his crew crossed the Pacific to prove. It is clear now that Polynesia was settled from the west, not the east. But then, this is not a book that should be read for its science, but rather as a tale of great adventure long ago.

Recommended.

FP languages are expressive

I've spent this week learning functional programming using two languages from opposite ends of the FP space: Haskell, an uncompromisingly functional language, and Scala, which mixes in some OO concepts with an eye to inter-operation with Java.

The central argument for FP is expressivity. Because of the power of the tools FP languages offer, you are supposed to be able to get a lot done with only a few lines of code.

To test this claim, I used both languages to implement a program from The Practice of Programming by Kernighan and Pike. The program reads in a file of text, generates a Markov-chain transition map from it, and then uses the map to produce random text. The authors implemented this program in several major languages, and found that the amount of code needed varied significantly: the C program was 150 lines, the Java was 105, and C++ clocked in at 70 lines.
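For the curious, the algorithm itself is simple, and even a dynamic imperative language does well on it. Here is my own rough Python sketch (not the book's code, and not my Haskell or Scala versions): build a table mapping each two-word prefix to the words that follow it, then walk the table making random choices.

```python
# My own sketch of the Markov program (not Kernighan and Pike's code):
# map every two-word prefix to the words that follow it in the input,
# then walk the map to emit random text.
import random
from collections import defaultdict

NPREF = 2  # prefix length, as in The Practice of Programming

def build_chain(words):
    chain = defaultdict(list)
    for i in range(len(words) - NPREF):
        prefix = tuple(words[i:i + NPREF])
        chain[prefix].append(words[i + NPREF])
    return chain

def generate(chain, start, nwords=100):
    out = list(start)
    prefix = tuple(start)
    for _ in range(nwords):
        suffixes = chain.get(prefix)
        if not suffixes:
            break  # reached the end of the input text
        word = random.choice(suffixes)
        out.append(word)
        prefix = prefix[1:] + (word,)
    return " ".join(out)

text = "the quick brown fox jumps over the quick brown dog".split()
print(generate(build_chain(text), text[:NPREF], nwords=20))
```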

Both Haskell and Scala did better than this. The Haskell program was 45 lines, and Scala did even better, at 41(!).

As far as I can tell, the FP enthusiasts' claim of greater expressivity is right on the money.