Showing posts with label software engineering. Show all posts

Sunday, August 10, 2014

Building a Compiler, Briefly

During the last meeting of the Toronto Haskell group, we briefly discussed the question of how to build a compiler, and in particular the problem that building a real one is a daunting task.

I've done a bit of digging into various approaches and guides. There are a lot of them, because every respectable CS curriculum needs a compilers course.

The one I find most intriguing is by Niklaus Wirth, the guy behind Pascal and Modula-2. In his "Compiler Construction" PDF, he guides you through building a compiler as a series of exercises in a bit over a hundred pages. Given that other texts run well past 500 pages, a tractable treatment is quite a relief. I get the impression you could work through the guide in two to three weeks full time, or perhaps a quarter as a side project.


That said, Wirth does make some simplifications for pedagogical purposes. The source language is Oberon-0, which is a pared-down version of Oberon, so almost but not quite a real language. The target architecture is a simplified RISC instruction set, with an interpreter that fits in a page. You build the compiler from scratch, rather than using tooling like ANTLR, parsec, or LLVM, which you'd presumably leverage in an actual implementation. And the treatment of code optimization is brief. So, choices were definitely made.
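For flavor, here is a toy, hypothetical version of the same pipeline in Python: compile arithmetic expressions to a tiny stack machine, then interpret them. It is nothing like Wirth's full exercise, but the shape -- parser, code generator, interpreter -- is the same.

```python
# A toy compile-to-simple-machine sketch. Far smaller than Wirth's
# Oberon-0/RISC exercise, but structurally similar: a recursive-
# descent parser emits code for a tiny machine, which we then run.
import operator
import re

def compile_expr(src):
    """Parse an arithmetic expression, emitting stack-machine code."""
    tokens = re.findall(r"\d+|[()+*/-]", src)
    pos = 0
    code = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        if peek() == "(":
            take(); expr(); take()          # '(' expr ')'
        else:
            code.append(("PUSH", int(take())))

    def term():
        factor()
        while peek() in ("*", "/"):
            op = take(); factor(); code.append((op,))

    def expr():
        term()
        while peek() in ("+", "-"):
            op = take(); term(); code.append((op,))

    expr()
    return code

def run(code):
    """The interpreter: a few lines, in the spirit of Wirth's
    one-page RISC interpreter."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.floordiv}
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[instr[0]](a, b))
    return stack.pop()

print(run(compile_expr("2+3*(4-1)")))  # 11
```

Error handling and real language features are exactly what the remaining hundred-odd pages of exercises are for.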

But all in all it is possible to make building a simple compiler the work of a couple of weeks rather than a months-long slog.

Monday, July 29, 2013

Joel Spolsky on Language Wars

Joel Spolsky is the CEO of Fog Creek, a software company in New York. He also blogs about software development on his website, Joel on Software. In one of his older posts, he addresses the question of what language and framework is best for web development, and drops some giant blocks of stone-cold sense right in front of you. 


Some choice quotes, condensed from the article:
Which Web Server (Apache, IIS, or something else) should we use and why? 
People all over the world are constantly building web applications using .NET, using Java, and using PHP all the time. None of them are failing because of the choice of technology.   
All of these environments are large and complex and you really need at least one architect with serious experience developing for the one you choose, because otherwise you'll do things wrong and wind up with messy code that needs to be restructured.
How do you decide between C#, Java, PHP, and Python? The only real difference is which one you know better. If you have a serious Java guru on your team who has built several large systems successfully with Java, you're going to be a hell of a lot more successful with Java than with C#, not because Java is a better language (it's not, but the differences are too minor to matter) but because he knows it better. 
Yes. Absolutely yes. There are several different solutions out there, all of them work, and the best choice is likely to depend on something other than a theoretical determination of which one would be best in Plato's world of pure forms, for any number of reasons:
  1. You probably aren't building the system from nothing. You are starting with an existing system, and adding on to it. Using whatever language or framework the system is already built from is a big advantage, because the new stuff needs to work with the old.
  2. You know something but not everything. A Java-based solution would need something very special indeed to do better than a Python-based solution, if you already know Python backwards and forwards but you've never touched Java. 
  3. There is already a standard solution. Someone has already designated a language or framework as standard in your organization, hopefully after carefully weighing costs and benefits, but maybe not. In any case, using anything else would require an arduous process of argument and justification, and every day you spend on the fight is a day you could have spent designing and building your system.
Go read the article. Really. Giant blocks of stone-cold sense. Here's that link again.

Thursday, April 25, 2013

Small Changes, Big Problems

When working in a mature codebase, there is a common scenario of a small change that is OK by itself, but which aggravates an existing code health problem. For example, someone may need to add another function to a file that is already thousands of lines long, or another parameter to a list of dozens, or another cut-and-paste function that is almost but not quite like several others.

Cases like this are hard because of the duality of the problem. On the one hand, the developer is only doing what many others have done before, but on the other they are definitely making things worse.

Let's begin by considering three ways of handling the situation.


1. Found a snake? Kill it.

Under this policy, whoever needs to make changes to code that has a real code health problem is responsible for making things right. They are supposed to consider the whole problem and implement a proper solution.

The real strength of this policy is its immediacy. Code is fixed as it gets touched, meaning that the most vital portions of the codebase get updated in short order.

The problem with this policy is disproportionality. A small change can turn into a huge refactoring job. And there can be second-order problems as developers twist their designs to avoid having to deal with that crawling file of horrors two directories over.

2. The Boy Scout rule.

The old rule among the Boy Scouts was to leave the campground better than you found it. In the context of coding, this means doing a little bit of cleanup when encountering an ugly bit of code, but not necessarily rewriting the whole thing. Add a test, pull common cut-and-pasted code into a function, eliminate a redundant parameter or two -- nothing too arduous.
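To make "pull common cut-and-pasted code into a function" concrete, here is a minimal sketch; the function names and the clamping logic are invented for illustration.

```python
# Before: the same range-clamping logic was cut-and-pasted into both
# report_cpu() and report_memory() (hypothetical names). The Boy
# Scout fix extracts the duplicate into one shared helper.

def clamp_percent(value):
    """Clamp a reading to the 0-100 range."""
    return max(0.0, min(100.0, value))

def report_cpu(raw_cpu):
    return f"cpu: {clamp_percent(raw_cpu):.1f}%"

def report_memory(raw_mem):
    return f"mem: {clamp_percent(raw_mem):.1f}%"

print(report_cpu(104.2))    # cpu: 100.0%
print(report_memory(-3.0))  # mem: 0.0%
```

A five-minute change like this is the intended scale: no rewrite, just one less duplicate than before.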

The strengths of this policy are the continual progress it encourages and the rather modest expectations it places on developers. These modest expectations mean that the policy is actually likely to be followed.

The real weakness is slow progress -- big problems will improve only slowly. There are also some problems that are not amenable to gradual reform.

3. For everything there is a season.

Under this policy, the right thing to do when encountering a nasty bit of code is to file a bug and enter it into the owning team's list. The team then periodically (quarterly? yearly?) runs a bug bash to clean up accumulated problems.

The strength of this policy is the opportunity for prioritization before the bash.  There are always more problems than there is time available for fixing them, and some are more important than others. This policy also avoids mixing changes for new features with changes to fix accumulated problems.

The weakness is the lack of immediacy; things get worse before they get better. There is also a real risk that some problems are never fixed. Some teams are very diligent about tending their bug lists; for others, the list is where bugs go to be forgotten.

A common policy

For my money, the best of these policies is the Boy Scout rule. It ensures continual progress without asking for too much, and is therefore likely to be actually followed. I also expect that the changes it calls for are typically in some of the most vital code in the codebase, since unimportant code tends to be left alone.

That said, there are definitely cases where the Boy Scout rule is inappropriate: developers who are unfamiliar with the codebase, problems that require large-scale fixes, and crisis times when there just isn't time. In such cases, it's better to file a bug for the next bug bash. But the more this is done, the more vital it becomes to actually hold those bug bashes regularly and intensively.

Saturday, April 6, 2013

Working for a Non-Coding Boss

The Trenches is a webcomic about a gaming QA team. The site has an interesting side-column, called Tales from the Trenches, where game devs, QA, and a few other technical folks anonymously share stories about horrible, horrible jobs.

One recent entry was from an in-house developer working for an unappreciative boss:
I am the sole developer for an in-house fully custom CRM. It was developed by an amateur and was clunking along managing a mid-sized company’s affairs. It was undocumented, messy,  and riddled with tricky bugs. I was brought in to maintain and extend it. 
... 
My boss will not allow me the time to slow down and do a better job, and when asked if I could have a tiny percentage of someone (anyone!)‘s time in the office so that I could have SOME kind of QA I was told that my code shouldn’t have bugs in the first place. This was accompanied with some pointed words about my upcoming personnel review. 
... 
The lesson here, fellow trenchermen, is twofold, number one, INSIST on the time and resources you need to do your best work. If you do not get what you require, communicate that you will not be responsible for problems down the line. Put it in writing. The second lesson is don’t work for a boss that can’t code. It sucks big fat hairy monkey balls.
That's a nasty position to be in, but I think the writer is drawing the wrong conclusion. If you are working for someone who can't do your job, they are in no position to argue about how long things take. If you say this new feature will take three weeks, they can be glad about it or sad about it, but they don't have the inside knowledge to contradict you with anything more than bluster.

In a situation like this, the right working relationship to establish is that the boss gets to choose what features get added, their order, and their scope. He does so based on a) his analysis of business needs and b) estimates provided by the developer about how long each feature will take. But, and this is crucial, the developer is responsible for providing those estimates and they always include the time to do the job right, including the sort of refactoring that over time will pay down the accumulated technical debt in the system.

If you can establish this relationship, there is no reason working for a non-coding boss should be a chore. They don't understand what you do, true, but that lack of understanding also provides a crucial freedom to do things your way (the right way, hopefully).

Saturday, September 1, 2012

A checklist for code reviews

A few years back, Atul Gawande, an American surgeon, got a lot of press from a study that showed dramatic reductions in deaths and complications from surgery when surgical teams instituted simple checklists. Death rates dropped from 1.5% to 0.8% and serious complications fell from 11% to 7%.

Checklists are not about creativity; they don't help you make dramatic insights. Instead, they're about consistency; they make sure you don't forget the simple, boring stuff. And some of the simple boring stuff is really important.

We in the software industry can use checklists as part of code reviews and inspections. This list covers the major concerns:
  1. (reuse) Is there already code that does something similar? Why is this code not reusing the existing work, perhaps with modifications? 
  2. (correctness) What has been done to verify that this code produces correct results? In particular, what parts of it are verified by automatic tests? 
  3. (clarity) If the purpose of this code or its implementation is obscure, where is it explained?
  4. (documentation) How widely is this code used? Is its documentation appropriate to the breadth of use? 
  5. (data volume) How large a volume of input is this code expected to process? Are the algorithms and data structures appropriate to the task? 
  6. (memory use) Does the code allocate memory? Who takes ownership of it, and how will it eventually be freed? 
  7. (error handling) How does the code report errors or unexpected conditions? Does it propagate error reports upward from code it calls? 
  8. (concurrency) Does this code execute concurrently? What has been done to avoid memory corruption and unnecessary exclusion? 
  9. (execution efficiency) Does this code need to execute quickly? What has been done to ensure it does so? 
  10. (storage efficiency) Does this code need to use storage parsimoniously? What has been done to ensure it does so? 
  11. (security) Does this code access or produce sensitive information? What has been done to keep this information secure? 
  12. (dead code) If this code replaces other code, has the older code been removed?
Checklists are by no means new in the context of software development. Their use is a standard part of formal software inspections as described in Software Inspection by Gilb and Graham. But in my experience, they are rarely (as in, practically never) used in industry. They're a good idea that should be used more widely.

Sunday, August 12, 2012

Fixing a broken codebase, part II

In part I, I described a scenario of a software engineer hired to improve a decades-old scientific codebase of 200,000 lines. I proposed seven first-aid measures that together would keep things from getting worse, and ensure all new code was of much higher quality than the old. Here in part II, I'll discuss what to do about the older legacy code.


One tempting choice is to rewrite the whole codebase. It would be enormously satisfying to start fresh, and indeed it is likely that the second version would be much better, since it could be designed based on everything that has been learned from the first one. The problem with this plan is time. The basic COCOMO model estimates that writing a 200,000 line program takes 625 person-months, with a delivery time of 28 months. Merely rewriting code is easier than writing it from scratch, so this estimate is likely on the high side, but this is definitely a project that would take person-years rather than months. And it is unlikely the group of scientists is willing to wait that long. Accordingly, something more selective is called for.
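The COCOMO arithmetic is easy to check. A quick sketch, assuming the standard organic-mode coefficients, which reproduce the figures above to within rounding:

```python
# Basic COCOMO, organic mode. The coefficients (2.4, 1.05, 2.5,
# 0.38) are the standard organic-mode values; they appear to be
# what the estimate above assumes.

def cocomo_organic(kloc):
    effort = 2.4 * kloc ** 1.05        # person-months
    schedule = 2.5 * effort ** 0.38    # elapsed months
    return effort, schedule

effort, schedule = cocomo_organic(200)
print(f"{effort:.0f} person-months, {schedule:.0f} months to deliver")
# roughly 626 person-months, about 29 months
```

Note 625 / 28 implies an average of over 20 people working in parallel, which is itself implausible for this team.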

What I have in mind are four measures that will gradually improve the codebase, without the high hurdle of a complete rewrite. These measures will accelerate the improvement that the policies from part I enabled. In particular, the team should undertake
  • dead code removal,
  • refactoring training,
  • a Better than You Found It policy, and
  • targeted rewrites.

Dead Code Removal
The codebase in this scenario is a couple of decades old. That means it contains a lot of code that isn't needed any more: old projects, failed experiments, and obsolete concerns. This old unused code is a problem because it complicates the codebase, making it more difficult to add new code for new purposes. It should therefore be removed.

Much of this dead code is likely to be hiding behind options and flags. Some of it may also be entirely commented out. The way to identify the code is to look at what configurations are actually run, and thereby determine what options are actually used. It is then possible to find all the options that aren't used, and work with the users (i.e. the scientists) to determine which ones actually have to be kept.
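That option audit could be sketched like this; the option names and the crude substring matching are invented for illustration, and a real codebase would need its own config parser.

```python
# Compare the options the code defines against the options actually
# set in the configurations that get run. Anything defined but never
# configured is a candidate for removal (after checking with users).

def audit_options(defined_options, config_texts):
    """Return the defined options that no configuration mentions.
    Uses a crude substring match; fine for a first pass."""
    used = set()
    for text in config_texts:
        for option in defined_options:
            if option in text:
                used.add(option)
    return sorted(defined_options - used)

defined = {"use_legacy_solver", "enable_grid_v2", "debug_dump"}
configs = ["enable_grid_v2 = true\n", "debug_dump = false\n"]
print(audit_options(defined, configs))  # ['use_legacy_solver']
```

The output is a candidate list, not a kill list; the scientists get the final say on what is truly dead.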

The policy of assigned code ownership (from part I) aids this effort, since the code owners are the people who know their portions best. They are therefore the right people to undertake the dead code removal. The source code control system is also useful, since it makes it possible to restore any removed code if it turns out to be needed later. It is therefore possible to be quite aggressive in removing code that is suspected of being dead.

Once all the dead code is removed, the result will be a smaller, clearer codebase that is much easier to work with.

Training in Refactoring
The plan is to gradually improve the existing codebase. This will require carefully refactoring the code, restructuring it to be more comprehensible without changing the results it produces. Doing this is not obvious, particularly for people without training and experience in software engineering. A bit of targeted training would therefore be useful.

What I have in mind is a day-long course. It would start with the concept of code smells, signs that indicate trouble. Among these are very long functions, use of global variables, confusing names of functions and variables, poor encapsulation of functionality, repeated code, and an absence of test coverage. The engineer would then show an example of a refactored module, pointing out problems in the original and how they were fixed. Finally, the students would have a chance to try refactoring themselves, using actual code from the codebase.

Books such as Working Effectively with Legacy Code and Refactoring are useful resources for this training. The idea (from Working Effectively) that legacy code is code without tests, and that having tests makes confident refactoring possible, is particularly useful and should be included in the course.
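A characterization test, in the Working Effectively sense, just pins down what the legacy code does now, quirks included, before any refactoring starts. A minimal sketch, with an invented stand-in for some old routine:

```python
# A characterization test records what the legacy code does today,
# so refactoring can proceed without silently changing results.
# legacy_mean() is a hypothetical stand-in for an old routine.

def legacy_mean(values):
    # Quirky but load-bearing: returns 0 for empty input instead
    # of raising, and callers depend on that.
    if not values:
        return 0
    return sum(values) / len(values)

def test_legacy_mean_characterization():
    # Pin down current behavior, quirks included.
    assert legacy_mean([2, 4, 6]) == 4
    assert legacy_mean([]) == 0  # quirk: empty input yields 0

test_legacy_mean_characterization()
print("characterization tests pass")
```

With these in place, the function's internals can be restructured freely; any test failure means the refactoring changed behavior.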

Better Than You Found It
The skills the staff developed in the refactoring course will be put to use by instituting a policy of Better Than You Found It. Whenever a developer makes a change to existing code, they should not only write the new code correctly, but also take the time to improve the nearby code. For example, if they are adding code to a function that is already oversized, they should break it up. Or if they are calling a function with a wildly complicated parameter list, they should redesign the function to be more selective in its inputs. It isn't necessary to fix the whole module, or even the whole file, just a portion of it, leaving things a little better than they were before.
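The complicated-parameter-list case might look like this; SimulationConfig and its fields are invented for illustration.

```python
# Instead of threading a dozen loose parameters through every call,
# gather them into one configuration object with sensible defaults.
# All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class SimulationConfig:
    grid_size: int = 100
    time_step: float = 0.01
    tolerance: float = 1e-6
    verbose: bool = False

def run_simulation(config: SimulationConfig):
    """Previously: run_simulation(grid_size, time_step, tolerance,
    verbose, ...and more). Now callers set only what differs from
    the defaults."""
    steps = int(round(1.0 / config.time_step))
    return f"{steps} steps on a {config.grid_size}^2 grid"

print(run_simulation(SimulationConfig(time_step=0.02)))
# 50 steps on a 100^2 grid
```

Call sites shrink, defaults live in one place, and adding a new setting no longer means touching every caller.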

Recall that a complete rewrite is not the plan. Instead, by following the policy of Better Than You Found It, the most commonly used parts of the codebase will steadily improve. And the improvement will come organically, implemented by staff who ran into problems and had to understand the old code anyway. The danger in undertaking this policy is staff resistance, since it requires them to do work that is not obviously part of whatever they were trying to accomplish. It is therefore important to make the policy both universal and fairly lightweight. Burdens shared equally are easier to bear, and the task of improving the codebase is less onerous if it is done a little at a time. Ideally, with training and experience, the staff will come to see the ugly old code as distasteful, and will want to fix what is obviously broken, the legacy of the bad old days.

The policies of code ownership and mandatory code review, introduced in part I, support this effort by making sure people don't shirk their refactoring responsibilities.

Targeted Rewrites
Diet and exercise will not cure cancer; that takes surgery and chemotherapy. Similarly, gradual refactoring will not fix the most dramatically broken modules in the codebase; they should be completely rewritten. Typically what happens in an evolving codebase is that some modules, though initially sound, are called upon to do ever more and assume responsibilities far beyond what was originally imagined. This tends to be a problem because their underlying architectures were designed for the original purpose, and often aren't changed as their uses evolve. Over time, this requires increasingly convoluted code to implement new features. These are the modules that should be rewritten from scratch with completely new architectures suitable for their new requirements.

The best source of information for identifying the really bad parts of the code is the development staff itself. They will have many stories about scary modules they are reluctant to modify, but sometimes must. And they will not be reluctant to share this information. The bug database from part I will also be useful, since bad modules tend to be continual sources of bugs.

To Summarize
In part I, I described how to keep the team's codebase from getting worse, by making it possible to write new code cleanly. In this article, part II, I proposed four measures for improving the older code. Dead code removal simplified the codebase by removing unnecessary old code entirely. Training in refactoring built the staff's skills in upgrading the older code, and the policy of Better Than You Found It put those skills to work improving the code, gradually. Finally, where gradual improvement wasn't sufficient, I called for targeted rewrites of the most troubled modules. Doing all of this will take time and effort, but the team will have a dramatically improved codebase in a year or so.



Wednesday, August 8, 2012

Fixing a broken codebase, part I

There's an interesting post in Ars Technica by a software engineer who has been hired by a group of scientists to modernize their development practices and fix a big mess of spaghetti-code that has accumulated over 10-20 years. He wants a few fairly simple techniques that will make a big difference.


In my opinion, the engineer should start by instituting some basic modern development practices. They won't fix the broken codebase, but they will keep things from getting worse. They'll also lay a foundation of good practice that will make later improvements possible.

The first phase should institute:
  1. Source control. The scientists need to be able to control who does what to the codebase, and keep a history of changes. A modern source code control system like Perforce or Mercurial does exactly that.
  2. Code review. Every change to the code should be examined by someone other than the original author, and that examiner should be empowered to demand changes until he is satisfied the new code is sound. Many eyes make all bugs shallow, and it starts with code review.
  3. Code ownership. The system is too large for anyone to understand well. It should be divided into portions, based on who understands what best about the system as it is now. Once that's done, the code review policy should mandate that all reviews include the owner of the code being changed.
  4. Daily builds. It should be possible to check out the complete codebase from the source code control system and build (compile and test) it from scratch. This should be done at least daily. It may be best to assign staff to a rotation, with each person being build engineer for a day. In addition to building the current codebase, the build engineer is also responsible for fixing the build if it is broken by identifying the offending code and rolling it back. The build engineer is done when the codebase builds cleanly again.
  5. Bug database. All error reports and feature requests should be stored in a central searchable repository, so the code owners know what is broken and what has been fixed.
  6. Design documents. All new functionality should be documented up front, presented for peer review, and revised until approved. The documents don't need to be very detailed, but they should describe the essential elements of the proposed functionality and why it is being introduced. Also, if the designs change during development, the documents should be updated to reflect what was actually built. And the design documents should be kept in some readily accessible place.
  7. Unit tests. All new functionality should include tests that verify that the code performs as expected. Going forward, these tests will make sure that any additional changes do not break existing functionality. These tests should be run as part of every daily build. Also, all fixed bugs should have unit tests that reproduce the failure, but which run correctly with the fix in place.
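The fixed-bug rule in point 7 can be sketched concretely; the function, the formula's low-wind edge case, and the bug itself are invented for illustration.

```python
# Every fixed bug gets a unit test that reproduced the failure and
# now passes with the fix in place. wind_chill() and its bug are
# hypothetical examples.

def wind_chill(temp_c, wind_kmh):
    """Wind chill (Environment Canada formula). The imagined bug:
    the original code misbehaved at wind speeds below 5 km/h, where
    the formula doesn't apply; the fix returns the air temperature
    there instead."""
    if wind_kmh < 5:
        return temp_c  # the fix: formula is undefined at low wind
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v

def test_wind_chill_low_wind_regression():
    # Reproduces the reported failure input; failed before the fix.
    assert wind_chill(-10.0, 2.0) == -10.0

test_wind_chill_low_wind_regression()
print("regression test passes")
```

Run as part of every daily build, a test like this guarantees the same bug never quietly returns.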
These seven policies will keep things from getting worse. If the team did nothing else, they would have their old mess of a codebase, plus a slowly growing layer of superior new code that accesses it. Over time, the average quality of the codebase would slowly rise, as more and more code would be of the new type.

In part II, I'll describe what can be done to improve the older code.