Monday, January 29, 2007

Follow your nose

Today's tale concerns what happens when we see code smells, and the twisty path we sometimes follow between getting a whiff of something, and reaching the real problem, and/or solution.

It started innocently enough. I was going through the compiler warnings from Eclipse, cleaning them up. Most of these fixes are improvements but not especially big or difficult. For example, getting rid of an unused variable, running the "organize imports" tool, adding missing @Override annotations, etc. But then I saw one which was clearly pointing to something bigger. A simplified version of the code with the warning is:


int FLAT_INTEREST = 1;
int DECLINING_INTEREST = 2;
createLoan(FLAT_INTEREST);


and the warning was because DECLINING_INTEREST was never used. Now, even without going through code archeology I can pretty much guess the sequence of events:

* Developer one creates a method


createLoan(int interestType)
// 1=flat, 2=declining


and various calls to it of the form


createLoan(1);
createLoan(2);


* Developer two sees some of these calls, and in the process of trying to read this code, or write something similar, wants to make the code better say what the parameter means:


int FLAT_INTEREST = 1;
createLoan(FLAT_INTEREST);


She (or the next developer) also figures out that "flat" is as opposed to "declining" and that two means declining. Hence the:


int DECLINING_INTEREST = 2;


All of this is all well and good. Sure, it is easy to find fault with the idea that local variables are a good place for these constants, but the local variable is better than what we had before - createLoan(1) - and we should be willing to improve code clarity one step at a time, rather than trying to solve everything at once.

A next small step could be to turn these local variables into constants in some central place, but I instead looked at the implementation of createLoan. It was something along the lines of:


createLoan(int interestType) {
return new Loan(
InterestType.fromInt(
interestType));
}


That is, this int was getting turned into an
enum anyway. Well, once we see that there is already an enum, we realize most of the job is already done, and we end up with something like the following:


createLoan(InterestType type) { . . . }

createLoan(InterestType.FLAT);


Well, actually there were too many callers to convert everything in one step, but I converted some, and kept the createLoan(int) method as a transitional aid, as I've described previously.

(For those people wanting to see the actual code in MIFOS, the code which had had the warnings was getLoanAccount in CollectionSheetHelperTest, the method which I called createLoan above really is createLoanOffering in TestObjectFactory and the enum is InterestType).

Monday, January 22, 2007

Reviewing code: in-person versus written

It has become a fairly widespread belief (although somewhat less often practiced), that software benefits from having other developers look at it. Such code review is designed to find flaws, seek out forgotten corner cases, improve code style, and any number of other improvements.

One of the key questions is whether to do this in writing or face to face. Most open source projects rely on written feedback, with the most two common forms being (a) send a patch to a mailing list, and get a description of what could be improved (the mozilla code review system is particularly formalized), or (b) check the code in, and other developers look at it there (if not as a deliberate effort to review it, then in the course of their own work). By contrast, a face to face review can be anything from a formal event which sits a bunch of developers in a conference room with the code on a projector, or the back-and-forth of pair programming, or anything in between ("hey, could you look at this?").

As it turns out, this morning I got two unrelated examples of how the written method can be frustrating.

The first story starts a week and a half ago. A new MIFOS developer had checked in some code. Not surprisingly for a new developer, it had a variety of problems both large and small. I sent an email describing some of the ones which seemed most important and/or easiest to find (oh, and reverted the checkin because it broke the build, but fortunately that's much easier with subversion than it would have been with CVS). Today the developer checked in a fixed version. It certainly is improved (I'm pleased to say the unit tests pass). I noticed a variety of smaller things. Now I have to figure out which ones to fix myself and which ones to complain about. In the second case am I going to wait another week and a half to get a reaction? And how do I try to calibrate how good is "good enough" and what we should worry about later? Now, those are valid questions no matter how the suggestions are delivered. And a one-week feedback loop is better than waiting many months, until the developers thought they had something which was "finished". But it is certainly harder to come to an understanding, or build up a team set of practices, if each little point is going to require a number of back-and-forth emails or phone calls.

My second example is a much smaller one, which actually makes it easier to see the point without getting hung up in larger issues of how you manage a project. A well-known columnist wrote an online article which had a typo in one of the examples. I wrote in and said "you have an extra curly brace on the last line; perhaps you copy-pasted it from a few lines up". He wrote back and said "no, the curly should be there" (thinking of the instance a few lines up, which was similar in syntax). This caused me to pause a bit. "Well, I'm pretty sure I'm right, but is it worth worrying about?" and even "how can I express this most clearly, because obviously what I said at first needed too much re-reading for a busy author to do", and so on. Well, I did write the second email ("no, the one at the end" or however I said it) and the author wrote back that I was right and fixed it (no doubt feeling at least a touch sheepish).

Not an especially big deal. But I contrast it with what would have happened if this had happened while pair programming. I would have said "oh, there's an extra curly". Perhaps I would have just taken the keyboard and removed it. But the worst case would be closer to "no, it matches the one up here" "not that curly, this curly (pointing with finger/mouse/etc)" "oh yeah, you're right (reaches for keyboard and deletes it)". Not only would it have gotten fixed faster, but with a minimum of tension (not that pair programming is immune from tension, but that's a subject for another article).

Wednesday, January 10, 2007

Is SQL a strongly typed language?

A strongly typed language can mean different things, but here I look at how SQL stacks up against some of the definitions of strong typing (and to keep it practical, what different SQL implementations do and do not let you do).

* Static typing as opposed to dynamic typing: SQL is statically (strongly) typed in this sense; each column has a type.

* compile-time checks for type constraint violations: Here we need to define the difference between compile-time and run-time in SQL. Basically I would call "compile-time" to be the call to Connection#prepareStatement and "run-time" to be the call to PreparedStatement#executeUpdate. An alternate (probably mostly equivalent) definition is that a "compile-time" check happens even if there are no rows to operate on. A browse through the Mayfly testsuite for "rowsPresent" flags will show cases in which SQL implementations differ on whether a particular check is compile-time or run-time, although the popularity of query optimizers tends to mean that checks happen at compile time (in those cases where I've checked; most of the Mayfly acceptance tests don't distinguish the two cases).

* complex, fine-grained type system: SQL is more fine-grained than systems of the "everything is a string" variety (there are different syntaxes for '5', 5, 5.0, and x'5'), but only recent versions of SQL try to add things like structure/record types.

* omit implicit type conversion. The databases I've tested (Mayfly,MySQL,Postgres,Derby, and Hypersonic) all refuse to read 'foo' as zero (if looking for an integer). All tested databases allow storing an integer into a DECIMAL column (that is, you can say INSERT INTO FOO(x) VALUES(5) and you don't need to say 5.0 even if x is of type DECIMAL). There are also plenty of cases in between (for example, INSERT INTO FOO(x) VALUES(9) into a string column, which works in Hypersonic, MySQL and Postgres, but not Derby or Mayfly).

Anyway, I could go on, either with gory details of what does and does not work (for which I'm better off just having you look at the the Mayfly acceptance tests), or with more general philosophy on types (which dictates what kinds of cases to look for), but I'll cut to the chase: What will be most useful for Mayfly users?

For now, I am generally leaning in the direction of making Mayfly picky - it seems better to catch any errors early (when writing/running tests), rather than later (when trying to deploy on different databases, for example). It is my experience so far that MIFOS doesn't seem to play loose with the type system (which is probably mainly a reflection on Hibernate), so I feel somewhat vindicated in this judgment. As with many things for Mayfly, I realize there are other situations (most notably, if you want to Mayfly-ize an existing application without having to modify it). So the question is whether Mayfly should be opinionated software? I'm generally of the mind that software works best with a clear idea of what it is supposed to do, and that software which tries to accomodate every possible answer to "how should X work?" tends to just get a bunch of poorly-thought-out configuration choices, none of which end up being quite right (for any given situation, or opinion). On the other hand, I'm assuming that Mayfly sooner or later will have some kind of "opinion manager" where you can pick, say, "please enforce the practices considered best by the mayfly developers", or "maximize MySQL compatibility" or even define your own (much like the code formatting configuration in IDEA or Eclipse).

Whether strong typing is a case for options, I don't know, however. Fear says that of course things would break and we need an escape hatch. But I am beginning to wonder whether that breakage is small enough to just wait and see whether this becomes a problem.

Thursday, January 04, 2007

new Integer(5) versus Integer.valueOf(5)

Seems that findbugs warns you if you call new Integer(5) instead of the (new with Java 1.5) Integer.valueOf(5). The point of the latter is that it might return you an existing object rather than creating a new instance.

I'll get back to the Integer.valueOf case, but on the general topic of trying to avoid object creation, there has been a long and largely unhappy history of this in Java. For example, see Allocation is faster than you think, and getting faster.

To summarize the possible problems with object caching and pooling:

* One can accidentally end up sharing a mutable object where the simple design calls for an unshared object. One way to avoid this problem is just to use immutable objects, for example joda-time objects instead of java.util.Date objects. In the Integer.valueOf example, Integer is immutable, so we don't have this problem.

* Pooling almost always complicates the code. Not so much of an issue for Integer.valueOf, in the sense that the standard library has the extra code, we just need to figure out whether to call it.

* Object pools can cause synchronization bottlenecks. There are of course complicated solutions, like separate pools for each thread. In the Integer case, this is Integer.valueOf's problem (and the Sun J2SE 1.5 implementation solves it by just allocating a fixed size pool on startup).

* Object pools tend to increase memory consumption. Often the performance hit of chewing up extra memory (for a long time) will exceed the allocation/deallocation overhead (which may involve a short-lived object, the cheapest kind). Again, in the Integer.valueOf case that's someone else's problem not ours (and in the Sun J2SE 1.5 implementation anyway, the size of the object pool is fixed at JVM startup and won't change based on anything we do).

So having exhausted the usual arguments against object pools, I conclude that it is in fact a good thing to call Integer.valueOf instead of new Integer.

Wednesday, January 03, 2007

Open Source, Free or Libre?

It might be just flamebait to even mention it, but those of us who are writing software which conforms to the Open Source Definition or the Four Freedoms (most often both) sooner or later need to decide whether to call it open source software, free software, or libre software, with the most controversy tending to surround whether people are trying to water down the meaning of open source (for example see What is open source?).

The only reason I bother to write about this is that Martin Fowler, in his Semantic Diffusion article, points out that this is completely par for the course when a concept is getting popular. He has some great examples of this (I am old enough to remember how for a time everything was described as "object-oriented").

So if open source is like the other terms Fowler discusses, there isn't a need to start a big panic. As long as we have a critical mass of people using the word open source to refer to software meeting the open source definition, the odds are good that the meaning won't drift too far and too permanently.