Thursday, December 13, 2007

Avoid needless generality

I've been asking people what Mayfly features would be most useful to them. I suppose I shouldn't be surprised that one of the answers I got was roughly "we use the syntax CONCAT(a, b, c) for string concatenation in MySQL and it would be nice if Mayfly had it instead of making us write it as a || b || c". This fits right into the theory that often the most important features are the silliest (as I already discussed regarding case sensitive table names).

Anyway, having decided to work on CONCAT, my first instinct was to provide hypersonic-compatible stored procedures, which allow the user to define their own CONCAT (well, the two argument version anyway; I'm not sure hypersonic has stored procedures with variable numbers of arguments). I started implementing stored procedures, and it became clear that there were plenty of corner cases (picking which overloaded method to call, error handling, type coercion, and of course the variable number of arguments issue).

So not only did it seem easier to just add a CONCAT built-in, it is closer to what the user really wants. Although defining a concatenate method and registering it with a CREATE ALIAS command isn't hard, there's no particularly compelling reason to make people do this. Figuring out where to put the CREATE ALIAS invocation may also be more of a pain than it sounds, in a system like Mayfly with no static state shared between objects.
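
To give a feel for how small the built-in is, here is a minimal sketch in Java of a concatenation function taking any number of arguments. The class and method names are made up for illustration and this is not Mayfly's actual code. Returning null when any argument is null mirrors what MySQL's CONCAT does; a real implementation would also have to deal with non-string arguments.

```java
// Hypothetical sketch of a variadic CONCAT built-in; not Mayfly's actual code.
final class ConcatFunction {
    private ConcatFunction() {}

    /**
     * Concatenates the arguments in order. A null Java reference stands in
     * for SQL NULL here, and any NULL argument makes the whole result NULL
     * (which is how MySQL's CONCAT behaves).
     */
    static String concat(String... arguments) {
        StringBuilder result = new StringBuilder();
        for (String argument : arguments) {
            if (argument == null) {
                return null;
            }
            result.append(argument);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("a", "b", "c"));
    }
}
```

The variable number of arguments, which was one of the corner cases for general stored procedures, is a non-issue when the function is built in.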

There's a good chance I (or some other Mayfly contributor) will eventually get around to implementing the general stored procedures (this feature can be a handy way to make SQL code more portable between Mayfly and non-Mayfly databases, in cases where Mayfly doesn't yet have all the built-in functions that the other database has). But jumping right to the general feature runs a risk of either (a) designing a feature which is supposed to be general, but which isn't really suited for any use case other than the original specific one, which could have been implemented in a way which is much more straightforward for the programmer and the user, and/or (b) designing a general feature which is too complex, too slow, takes too much time to implement, etc, relative to what is really needed. So there's a lot to be said for special-purpose solutions.

Monday, December 03, 2007

Simple Design and Testing Conference

Had the chance to talk about Mayfly at the Simple Design and Testing Conference 2007 last weekend. As with other open space conferences I've been to, you don't know quite how a session is going to turn out, but my notes include the sample code which I showed people.

Tuesday, September 18, 2007

Hibernate Annotations and You Ain't Gonna Need It

In the last post I showed a Hibernate demo. I wrote it to show how to hook Mayfly to Hibernate, but it also was my first experience with Hibernate Annotations. I have long been hoping that Hibernate Annotations would ease the pain of Hibernate, specifically by eliminating the need to keep looking back and forth between XML mapping files, Java code, and database schemas.

Basically, my experience to date is quite positive. The key thing to note about the demo is all the things which aren't there.

This starts with configuration. There is no hibernate.cfg.xml file, no database username and password, no JDBC URL and driver (I guess some of those would come back if we were talking to a database like MySQL as opposed to Mayfly, but the hibernate.cfg.xml wouldn't).

Next, and more importantly, look at Foo. Here we just need an instance variable for each database column, an annotation @Entity at the start, and an annotation @Id on the primary key. Since the instance variables are named the same as the columns, we don't need to specify a mapping between the two. There's no XML mapping file. I also don't bother with getters and setters. I can always add them later (that's why refactoring browsers have "encapsulate field"), and using them only where they are providing some value avoids a huge amount of clutter. (Or to put it more provocatively, "You Ain't Gonna Need Getters and Setters").

Just getting everything into Foo.java, rather than splitting it between Foo.java and Foo.hbm.xml, is a huge win. You can't just control-click over to the XML file (in a browser like Eclipse), and even if you could you'd be looking back and forth rather than just having everything in one place as with annotations.

So positive impressions confirmed. Definitely plan to give annotations a try on a larger project next time I have a choice.

Demo of Mayfly and Hibernate

In April I wrote a bit about hooking Mayfly to Hibernate. Well, I finally got around to writing a demo showing how to hook it all up in a small self-contained example.

The demo is only a few dozen lines of code, but instead of repeating the whole thing here, I'll point out some interesting bits. Each test creates its own Database object, so there is no shared state to clear out between tests. The test then calls the openConnection on Database to get a JDBC connection, which the FooPersistence class then passes to the openSession method of Hibernate.

Here I configure Hibernate by instantiating an AnnotationConfiguration, tell it the dialect (Hibernate can't figure this out from the connection, because it needs the dialect before the openSession call), and registering Foo as an entity. If you configure Hibernate via XML mapping files, that would work too.

That's pretty much all that is needed. The dialect file is currently part of the demo, so copy it from there.

Monday, July 30, 2007

Profiling mayfly

One of the key goals for an in-memory database for unit testing is that it should be fast. And mayfly is faster than the other databases I've tried for some things, like creating or altering tables. However, it is some 3x slower than hypersonic for inserting rows ("insert into foo(a,b) values(3,4)" kinds of commands). I've been meaning to profile it for a long time now, but I finally got around to it.

First I added the JIP jar to the project and added a profile-test target to build.xml. Because JIP doesn't work with gcj, I ran it under Sun java. JIP writes a text file report, and I worked off that.

The results surprised me. They showed that 70% of the time was being consumed in the lexer and parser. Conventional wisdom is that lexing and parsing just isn't your big bottleneck in a compiler these days, but perhaps that is more for an optimizing compiler which spends more time in code generation and optimization passes. I inlined a few short methods which were bottlenecks (at the cost of a small amount of code duplication, but not so much as to be really shocking, and remember I was careful to only do this for the bottlenecks). This was also a surprise: that method invocation seemed to be such a cost. Given my limited knowledge of java internals, that sort of makes sense, but it kind of seems like a step backwards, in the sense that in the C/Pascal/etc days we made so much effort to make method invocation fast, and tell people they didn't need to make their code ugly to avoid method calls. (At the risk of belaboring the obvious, even if method calls are expensive you still don't need to make your code (very) ugly to avoid them: it is only a handful of invocations which are actually going to make a difference in your run-time, and the profile shows you which ones).

I was also able to streamline the non-parsing part of the code, mostly by taking out some extra steps (for example, transforming a column name to a Column and back to a name more times than needed). Some of that had built up through a series of changes which had left in vestiges of previous ways of doing things. So cleaning this up left the code simpler and clearer, as well as faster.

Other changes, like changing Row to be a HashMap rather than a List, didn't seem to help at all (or even hurt slightly). It has been conceptually a map for some time now, but apparently those linear searches were not particularly more expensive than the many calls to hashCode you get with the map. I guess the fact that we don't expect more than a few dozen columns in a table is responsible.
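
To make the two representations concrete, here is a rough sketch of a row kept as parallel lists with a linear search, next to the same row kept as a HashMap. The class names are invented for this illustration and Mayfly's real Row looks different. The sketch says nothing about which one is faster; that is a question for the profiler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Mayfly's real Row class is not shown here.
final class ListRow {
    private final List<String> columns = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    void put(String column, Object value) {
        columns.add(column);
        values.add(value);
    }

    /** Linear search: cheap when a table has only a few dozen columns. */
    Object get(String column) {
        for (int i = 0; i < columns.size(); i++) {
            if (columns.get(i).equals(column)) {
                return values.get(i);
            }
        }
        throw new IllegalArgumentException("no column " + column);
    }
}

final class MapRow {
    private final Map<String, Object> cells = new HashMap<>();

    void put(String column, Object value) {
        cells.put(column, value);
    }

    /** Hash lookup: each call goes through hashCode and equals on the name. */
    Object get(String column) {
        if (!cells.containsKey(column)) {
            throw new IllegalArgumentException("no column " + column);
        }
        return cells.get(column);
    }
}
```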

So what is next when I look at this again? For the lexer, I may have run out of obvious ideas (given that it has duties like tracking the line and column numbers of every token, and I don't see giving up that feature, which provides good error messages). For the parser, there is a lot of expression handling machinery that is involved in parsing the "3" in "insert into foo(a) values(3)". Unless I think of a better way, having the top-level expression parser look for a literal followed by something like "," or ")", and going into a fast-path special case might be worth it. I know that looks like a kluged-up wannabe bottom up parser, but I've been happy enough with recursive descent in other ways, that I have trouble seeing switching back to a parser generator. As for the execution (building up rows, modifying the tables, etc), I'd have to look at the profile more. Although I've seen some hot spots, fixed them (and perhaps created others), I don't have as much of an intuitive feel for what is slow here as I do for the lexing and parsing.

Tuesday, May 29, 2007

Database upgrades: SQL versus code

When last I wrote about database upgrades, one of the design decisions was whether to have a database upgrade be a hunk of code (as in rails and other systems I've worked with), or an SQL script (as in the non-automated upgrade scripts that were checked in to MIFOS prior to December's work).

For the first cut (last December), I went with SQL, on the theory that (1) it might be easier for people to understand, especially sysadmins and others who wouldn't necessarily read much of the Java code, (2) many of the cases which came to mind, such as adding a column or adding a table, could be done this way, and (3) if the automated upgrade runs into trouble, it may be easier to run some scripts one at a time, perhaps with changes, rather than trying to mess around with Java (again, for some people). But I did have in mind, even then, that I might need to add code upgrades at some point.

Well, we found a case where the SQL scripts don't work. The MIFOS database contains tables which store things like strings which are displayed to the user. MIFOS ships with a whole bunch of these ("loan", "client", etc), but such strings can also be added by the microfinance institution. In adding them (at least the way our database is currently set up), one needs to assign at least one ID which is not referenced directly from the Java code, but which also is referenced elsewhere in the database. Although there are variants of SQL which have variables and the like (PL/SQL, PL/pgSQL, etc), I don't think MySQL has those kinds of extensions (and trying to turn SQL into a procedural language is somewhat awkward anyway). So the solution will be to implement Java upgrades. I have in mind keeping the ability to do SQL upgrades (that is, each upgrade is either a java upgrade or a SQL one). That is largely to ease the transition (we have about 19 upgrades already, and won't need to convert them over all at once). We'll also see whether writing upgrades in SQL, in those cases where it is possible, ends up being appealing or just a source of confusion.
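
One way to picture "each upgrade is either a java upgrade or a SQL one" is a common interface with two kinds of implementation. The sketch below is hypothetical: DatabaseUpgrade, SqlUpgrade, Upgrader and SqlExecutor are invented names, not the MIFOS classes, and SqlExecutor stands in for whatever JDBC plumbing actually runs the statements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the MIFOS code.

/** Stand-in for the JDBC plumbing that runs one SQL statement. */
interface SqlExecutor {
    void execute(String sql);
}

/** One step in upgrading the database from one version to the next. */
interface DatabaseUpgrade {
    void apply(SqlExecutor database);
}

/** An upgrade which is nothing more than a fixed list of SQL statements. */
final class SqlUpgrade implements DatabaseUpgrade {
    private final List<String> statements;

    SqlUpgrade(List<String> statements) {
        this.statements = new ArrayList<>(statements);
    }

    @Override
    public void apply(SqlExecutor database) {
        for (String statement : statements) {
            database.execute(statement);
        }
    }
}

/** Runs a mix of SQL and Java upgrades, in order. */
final class Upgrader {
    private Upgrader() {}

    static void upgrade(SqlExecutor database, List<DatabaseUpgrade> upgrades) {
        for (DatabaseUpgrade upgrade : upgrades) {
            upgrade.apply(database);
        }
    }
}
```

A Java upgrade is then any other implementation of DatabaseUpgrade. It can look up or compute an ID in code and use it in several statements, which is the part a plain script cannot do.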

Thursday, April 26, 2007

Test speedups

The MIFOS tests (ApplicationTestSuite, or pretty much all the junit tests) run in 2600 seconds on my laptop. When you are checking in several times a day, that adds up to a lot of time spent waiting for tests. It also discourages good habits like running the tests frequently to find problems early, checking in frequently, certain kinds of experiments (for example "if I clean up this ugly code, will anything break?") and the like.

There are a few reasons for this. First of all, I'll agree with something that Eric Du or Li Mo (I forget which) said a month or so ago, which was that too many of our tests are integration tests (test many classes) as opposed to unit tests (test a small number of classes). This is very much a case-by-case thing, so I guess I'll just mention the case which came up today: I was testing generateId in AccountBO. And it turns out there is no need to access the database to test this method (see AccountBOTest).

I've also known for a long time that many of the tests get caught in a familiar trap of writing many objects to the database (I need a client, and a client needs a group, and a group needs a center....). Or of creating objects via the database when creating them in-memory would work just fine, be faster, and avoid problems with getting rid of them at the end of the test.

But I was surprised at the speedups around getUserContext(). Someone (sorry, I tried finding out who in the archives but I didn't find it) posted some numbers to the MIFOS mailing list saying that replacing TestObjectFactory#getUserContext (which involves several database calls) with something faster (TestUtils#makeUser() is the usual choice) cut the run-time of a certain test in half (or something - the number varies from one test to the next and once I convinced myself that the speedup is significant I haven't really been measuring things further).

Unfortunately, globally changing getUserContext to makeUser doesn't quite work - some tests fail that way. But one of my projects lately has been going through tests and changing all those that can be changed.

Getting tests to run fast can take work (especially if you don't have the luxury of writing fast tests from the start of a project). But tests that run slowly don't tend to get run.

Tuesday, April 03, 2007

Mayfly and Hibernate

On MIFOS, we are successfully using Mayfly and Hibernate together, but there are some catches (and some future work - for me and/or other volunteers - in terms of making this work better).

First of all, MIFOS is currently on Hibernate 3.0beta4. I suspect later versions of Hibernate work too, but it would be nice to download Mayfly and Hibernate and try it out on some kind of "hello world" situation.

Next, there's the Hibernate dialect. The one we are using in MIFOS is checked in to MIFOS as MayflyDialect. It would be nice to submit this to Hibernate as a patch. I have been meaning to do this, but just haven't gotten around to it. For a while, I thought it might be changing frequently, but that hasn't been true lately.

Anyway, enough of the boring stuff. The interesting part is whether we can hook up the features of Mayfly which distinguish it from Hypersonic and the rest. For example, let's take the feature of wanting to give each test a fresh database. Suppose we have a static DataStore which contains all the tables and perhaps some data which all tests should start with (in MIFOS, getStandardStore() in DatabaseSetup). Now, in Hibernate one typically creates a SessionFactory once (not on every test, as it is expensive to make one), and the SessionFactory has the JDBC URL built in. So how do we give a new Database for each test while being able to re-use the SessionFactory? Well, what I've done so far is open my Session with the SessionFactory#openSession(Connection connection) method. I'm probably best off just pointing to an example: testGetAllParameters in ReportsPersistenceTest.

So, anyone found better techniques? This is a good subject for collaboration, not just because it is a way to share the work, but also because everyone's way of setting these things up tends to be slightly different.

Thursday, March 15, 2007

Web security: avoiding HTML injection

A shockingly high percentage of web applications have various kinds of security holes (or just bugs), and one of the biggest causes is failing to quote strings before putting them into an HTML page. See for example LWN: Cross-site scripting attacks.

Most people have now figured out this issue to the point of realizing that your application needs to quote text as it outputs it and providing a function to escape HTML, such as the PHP htmlspecialchars or any number of locally written versions such as the MIFOS xmlEscape in MifosTagUtils.

However, requiring people to remember to call it is error prone. A better approach is that HTML is one thing, and strings are another. So inserting a string into an HTML document will quote it. A few template systems do this (tinytemplate, Amrita, probably a few others). Most HTML-generating libraries (builder, DOM, XmlStreamWriter, etc) do it. If you are using an older template system, like JSP, ASP, PHP, etc, start thinking about how to migrate to something which is secure by default. Here's an article about these issues in Rails (recommending builder instead of rhtml).
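
As a rough illustration of "HTML is one thing, and strings are another", here is a minimal Java sketch of a type which holds markup and quotes any plain string appended to it. The Html class and its method names are invented for this example; it is not the API of any of the libraries mentioned above.

```java
// Hypothetical sketch; the Html class is invented for illustration.
final class Html {
    private final String markup;

    private Html(String markup) {
        this.markup = markup;
    }

    /** Wraps markup which the programmer wrote and vouches for. */
    static Html trusted(String markup) {
        return new Html(markup);
    }

    /** Turns an arbitrary string into HTML by quoting the special characters. */
    static Html text(String untrusted) {
        StringBuilder escaped = new StringBuilder();
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '&': escaped.append("&amp;"); break;
                case '<': escaped.append("&lt;"); break;
                case '>': escaped.append("&gt;"); break;
                case '"': escaped.append("&quot;"); break;
                case '\'': escaped.append("&#39;"); break;
                default: escaped.append(c);
            }
        }
        return new Html(escaped.toString());
    }

    /** Appending a plain string always quotes it; there is no unquoted overload. */
    Html append(String untrusted) {
        return new Html(markup + text(untrusted).markup);
    }

    /** Appending HTML to HTML needs no quoting. */
    Html append(Html other) {
        return new Html(markup + other.markup);
    }

    @Override
    public String toString() {
        return markup;
    }
}
```

The point of the design is that forgetting to escape is no longer possible by accident: the only way to get unquoted text into the page is to call trusted, which is easy to search for in a code review.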

Wednesday, February 14, 2007

Recursive descent parsers

Many of us have occasional, rather than frequent, need for parsing a complicated text format (such as a programming language - I'm talking about things more complicated than, say, XML or comma separated values). So often the first step is trying to remember how parser generators work (a parser generator being a tool which takes a grammar and produces a parser, such as byacc, ANTLR, or SableCC). The latest one through this process is Martin Fowler. Through most of my career I've occasionally struggled with ANTLR and yacc, and only recently have come to the conclusion that I'm best off where many of us started: with a hand-written recursive descent parser.

If I lose you with any of the parser jargon, you might need to look at the Dragon Book, but I'm trying to keep it to a minimum.

Anyway, recursive descent is good because:

* You can accomplish everything in your Integrated Development Environment, or pre-existing ant build file. Parser generators, as with other kinds of generated code, impose extra steps every time you modify the grammar (extra manual steps and/or extra build scripting).

* Can accept a wide range of grammars. The only catch is that you need to eliminate left recursion, but in my experience the solution of a while or do-while loop ends up being even more elegant than the left-recursive grammar you started with.

* Very nice to unit test. Your unit test can call into any production of the grammar. With most parser generators, you can only easily call the top-level production, and then you need to dig around in the tree it returns to find what you are supposed to assert on.

* Easy to understand and debug.

* Gives you complete flexibility to decide how you want to handle trees/actions/etc (often, building up domain/command objects from within the grammar will work well, with no extra tree layer. Depends on the problem I suspect).

* Provides refactoring and navigation tools for the grammar for free. What productions refer to this rule? Just hit "find usages". How do I make a separate rule from part of this rule? Just hit extract method (and you know it will still work - the ANTLR/yacc transformation which looks like extract method is not correctness-preserving in my bitter experience).
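
For a taste of several of these points at once, here is a minimal sketch of a recursive descent parser for a toy arithmetic grammar (it evaluates as it goes, and skips niceties like whitespace handling). It is an invented example, not Mayfly's parser. Each production is a method, the left-recursive rule for expressions becomes a while loop, and a test can call parseTerm just as easily as parseExpression.

```java
// Hypothetical sketch of a hand-written recursive descent parser; not Mayfly's parser.
final class ArithmeticParser {
    private final String input;
    private int position = 0;

    ArithmeticParser(String input) {
        this.input = input;
    }

    /**
     * expression := term (('+' | '-') term)*
     * The left-recursive form "expression := expression '+' term" becomes a loop.
     */
    int parseExpression() {
        int value = parseTerm();
        while (peek() == '+' || peek() == '-') {
            char operator = input.charAt(position++);
            int right = parseTerm();
            value = (operator == '+') ? value + right : value - right;
        }
        return value;
    }

    /** term := NUMBER | '(' expression ')' */
    int parseTerm() {
        if (peek() == '(') {
            position++;
            int value = parseExpression();
            expect(')');
            return value;
        }
        int start = position;
        while (Character.isDigit(peek())) {
            position++;
        }
        if (start == position) {
            throw new IllegalArgumentException("expected number at position " + position);
        }
        return Integer.parseInt(input.substring(start, position));
    }

    /** Returns the next character, or '\0' at end of input. */
    private char peek() {
        return position < input.length() ? input.charAt(position) : '\0';
    }

    private void expect(char expected) {
        if (peek() != expected) {
            throw new IllegalArgumentException(
                "expected " + expected + " at position " + position);
        }
        position++;
    }
}
```

Because the loop folds each new term into the running value, subtraction comes out left-associative, which is what the left-recursive grammar was trying to say in the first place.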

Anyway, if you are still with me and want to see how this works, the next step is probably to look at the SQL parser that I wrote this way for Mayfly: the tests (some of them) are at ParserTest and the parser is in Parser.

Saturday, February 10, 2007

Mayfly starts to get options, sooner than I thought it would

OK, a few weeks ago I mused about whether Mayfly was going to need some options (in that case, to set how it handles data types). Well, most of the reasons I imagine for options still remain in the future: "some day we may need/want this". But I did start adding an option to Mayfly. What was the first option? Some profound difference of philosophy about the data model which SQL should present? Maybe the ever-popular "should SQL be more relational?" or the subtle and deep issues around handling of SQL NULL?

No, nothing like that.

It is for case sensitive table names.

The SQL standard, and all databases I know except one, say that table names are case insensitive.

I say, "all databases except one". MIFOS of course is using that one (MySQL). And it is worse than "MySQL makes table names case sensitive". MySQL makes table names case sensitive only if file names are case sensitive (typically Unix). The practical effect of this is that half our team has case insensitive filenames and half has case sensitive ones, and the first group is often accidentally breaking the build (but only for the second group).

I thought a bit about various solutions, and there's a lot to be said about just having Mayfly run in case sensitive mode (on all platforms). So yeah, the CVS version of Mayfly has a method in Database called tableNamesCaseSensitive. Give me a few more days and most of Mayfly should honor it (large parts already do).
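
The effect of such an option can be sketched in a few lines of Java. This is a hypothetical illustration of a table name lookup with a case sensitivity switch; the TableNames class is invented and is not how Mayfly actually stores its tables.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; this is not how Mayfly actually stores its tables.
final class TableNames {
    private final Set<String> names;

    TableNames(boolean caseSensitive) {
        if (caseSensitive) {
            // Natural String ordering: "Foo" and "FOO" are different tables.
            names = new TreeSet<>();
        } else {
            // "Foo" and "FOO" name the same table.
            names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        }
    }

    void create(String name) {
        if (!names.add(name)) {
            throw new IllegalArgumentException("table " + name + " already exists");
        }
    }

    boolean exists(String name) {
        return names.contains(name);
    }
}
```

Running the tests in case sensitive mode on every platform means that a query against FOO, when the table was created as Foo, fails on a developer's machine rather than only on the build machine.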

Yet another example of how it is hard to anticipate what will really be important (along with familiar cases like prioritizing software features only once you see what users look for and miss when they try to use a prototype, or only doing performance tuning once you have measured where the bottlenecks are).

Monday, January 29, 2007

Follow your nose

Today's tale concerns what happens when we see code smells, and the twisty path we sometimes follow between getting a whiff of something, and reaching the real problem, and/or solution.

It started innocently enough. I was going through the compiler warnings from Eclipse, cleaning them up. Most of these fixes are improvements but not especially big or difficult. For example, getting rid of an unused variable, running the "organize imports" tool, adding missing @Override annotations, etc. But then I saw one which was clearly pointing to something bigger. A simplified version of the code with the warning is:


int FLAT_INTEREST = 1;
int DECLINING_INTEREST = 2;
createLoan(FLAT_INTEREST);


and the warning was because DECLINING_INTEREST was never used. Now, even without going through code archeology I can pretty much guess the sequence of events:

* Developer one creates a method


createLoan(int interestType)
// 1=flat, 2=declining


and various calls to it of the form


createLoan(1);
createLoan(2);


* Developer two sees some of these calls, and in the process of trying to read this code, or write something similar, wants to make the code better say what the parameter means:


int FLAT_INTEREST = 1;
createLoan(FLAT_INTEREST);


She (or the next developer) also figures out that "flat" is as opposed to "declining" and that two means declining. Hence the:


int DECLINING_INTEREST = 2;


All of this is all well and good. Sure, it is easy to find fault with the idea that local variables are a good place for these constants, but the local variable is better than what we had before - createLoan(1) - and we should be willing to improve code clarity one step at a time, rather than trying to solve everything at once.

A next small step could be to turn these local variables into constants in some central place, but I instead looked at the implementation of createLoan. It was something along the lines of:


createLoan(int interestType) {
    return new Loan(
        InterestType.fromInt(interestType));
}


That is, this int was getting turned into an enum anyway. Well, once we see that there is already an enum, we realize most of the job is already done, and we end up with something like the following:


createLoan(InterestType type) { . . . }

createLoan(InterestType.FLAT);


Well, actually there were too many callers to convert everything in one step, but I converted some, and kept the createLoan(int) method as a transitional aid, as I've described previously.
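
Spelled out a bit more fully, the transitional arrangement looks something like the following sketch. These are simplified, hypothetical stand-ins (InterestType, Loan and LoanFactory as written here are invented for the example), not the real MIFOS classes.

```java
// Simplified, hypothetical versions of the classes discussed above;
// the real MIFOS code differs.
enum InterestType {
    FLAT(1), DECLINING(2);

    private final int value;

    InterestType(int value) {
        this.value = value;
    }

    static InterestType fromInt(int value) {
        for (InterestType type : values()) {
            if (type.value == value) {
                return type;
            }
        }
        throw new IllegalArgumentException("unknown interest type " + value);
    }
}

final class Loan {
    final InterestType interestType;

    Loan(InterestType interestType) {
        this.interestType = interestType;
    }
}

final class LoanFactory {
    private LoanFactory() {}

    /** The version which new and converted callers use. */
    static Loan createLoan(InterestType type) {
        return new Loan(type);
    }

    /** Transitional aid for callers not yet converted; delete once none remain. */
    static Loan createLoan(int interestType) {
        return createLoan(InterestType.fromInt(interestType));
    }
}
```

Both kinds of caller compile and behave the same in the meantime, so the remaining createLoan(1) calls can be converted a few at a time.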

(For those people wanting to see the actual code in MIFOS, the code which had had the warnings was getLoanAccount in CollectionSheetHelperTest, the method which I called createLoan above really is createLoanOffering in TestObjectFactory and the enum is InterestType).

Monday, January 22, 2007

Reviewing code: in-person versus written

It has become a fairly widespread belief (although somewhat less often practiced), that software benefits from having other developers look at it. Such code review is designed to find flaws, seek out forgotten corner cases, improve code style, and any number of other improvements.

One of the key questions is whether to do this in writing or face to face. Most open source projects rely on written feedback, with the two most common forms being (a) send a patch to a mailing list, and get a description of what could be improved (the mozilla code review system is particularly formalized), or (b) check the code in, and other developers look at it there (if not as a deliberate effort to review it, then in the course of their own work). By contrast, a face to face review can be anything from a formal event which sits a bunch of developers in a conference room with the code on a projector, or the back-and-forth of pair programming, or anything in between ("hey, could you look at this?").

As it turns out, this morning I got two unrelated examples of how the written method can be frustrating.

The first story starts a week and a half ago. A new MIFOS developer had checked in some code. Not surprisingly for a new developer, it had a variety of problems both large and small. I sent an email describing some of the ones which seemed most important and/or easiest to find (oh, and reverted the checkin because it broke the build, but fortunately that's much easier with subversion than it would have been with CVS). Today the developer checked in a fixed version. It certainly is improved (I'm pleased to say the unit tests pass). I noticed a variety of smaller things. Now I have to figure out which ones to fix myself and which ones to complain about. In the second case am I going to wait another week and a half to get a reaction? And how do I try to calibrate how good is "good enough" and what we should worry about later? Now, those are valid questions no matter how the suggestions are delivered. And a one-week feedback loop is better than waiting many months, until the developers thought they had something which was "finished". But it is certainly harder to come to an understanding, or build up a team set of practices, if each little point is going to require a number of back-and-forth emails or phone calls.

My second example is a much smaller one, which actually makes it easier to see the point without getting hung up in larger issues of how you manage a project. A well-known columnist wrote an online article which had a typo in one of the examples. I wrote in and said "you have an extra curly brace on the last line; perhaps you copy-pasted it from a few lines up". He wrote back and said "no, the curly should be there" (thinking of the instance a few lines up, which was similar in syntax). This caused me to pause a bit. "Well, I'm pretty sure I'm right, but is it worth worrying about?" and even "how can I express this most clearly, because obviously what I said at first needed too much re-reading for a busy author to do", and so on. Well, I did write the second email ("no, the one at the end" or however I said it) and the author wrote back that I was right and fixed it (no doubt feeling at least a touch sheepish).

Not an especially big deal. But I contrast it with what would have happened if this had happened while pair programming. I would have said "oh, there's an extra curly". Perhaps I would have just taken the keyboard and removed it. But the worst case would be closer to "no, it matches the one up here" "not that curly, this curly (pointing with finger/mouse/etc)" "oh yeah, you're right (reaches for keyboard and deletes it)". Not only would it have gotten fixed faster, but with a minimum of tension (not that pair programming is immune from tension, but that's a subject for another article).

Wednesday, January 10, 2007

Is SQL a strongly typed language?

A strongly typed language can mean different things, but here I look at how SQL stacks up against some of the definitions of strong typing (and to keep it practical, what different SQL implementations do and do not let you do).

* Static typing as opposed to dynamic typing: SQL is statically (strongly) typed in this sense; each column has a type.

* compile-time checks for type constraint violations: Here we need to define the difference between compile-time and run-time in SQL. Basically I would call "compile-time" to be the call to Connection#prepareStatement and "run-time" to be the call to PreparedStatement#executeUpdate. An alternate (probably mostly equivalent) definition is that a "compile-time" check happens even if there are no rows to operate on. A browse through the Mayfly testsuite for "rowsPresent" flags will show cases in which SQL implementations differ on whether a particular check is compile-time or run-time, although the popularity of query optimizers tends to mean that checks happen at compile time (in those cases where I've checked; most of the Mayfly acceptance tests don't distinguish the two cases).

* complex, fine-grained type system: SQL is more fine-grained than systems of the "everything is a string" variety (there are different syntaxes for '5', 5, 5.0, and x'5'), but only recent versions of SQL try to add things like structure/record types.

* omit implicit type conversion. The databases I've tested (Mayfly, MySQL, Postgres, Derby, and Hypersonic) all refuse to read 'foo' as zero (if looking for an integer). All tested databases allow storing an integer into a DECIMAL column (that is, you can say INSERT INTO FOO(x) VALUES(5) and you don't need to say 5.0 even if x is of type DECIMAL). There are also plenty of cases in between (for example, INSERT INTO FOO(x) VALUES(9) into a string column, which works in Hypersonic, MySQL and Postgres, but not Derby or Mayfly).

Anyway, I could go on, either with gory details of what does and does not work (for which I'm better off just having you look at the Mayfly acceptance tests), or with more general philosophy on types (which dictates what kinds of cases to look for), but I'll cut to the chase: What will be most useful for Mayfly users?

For now, I am generally leaning in the direction of making Mayfly picky - it seems better to catch any errors early (when writing/running tests), rather than later (when trying to deploy on different databases, for example). It is my experience so far that MIFOS doesn't seem to play loose with the type system (which is probably mainly a reflection on Hibernate), so I feel somewhat vindicated in this judgment. As with many things for Mayfly, I realize there are other situations (most notably, if you want to Mayfly-ize an existing application without having to modify it). So the question is whether Mayfly should be opinionated software. I'm generally of the mind that software works best with a clear idea of what it is supposed to do, and that software which tries to accommodate every possible answer to "how should X work?" tends to just get a bunch of poorly-thought-out configuration choices, none of which end up being quite right (for any given situation, or opinion). On the other hand, I'm assuming that Mayfly sooner or later will have some kind of "opinion manager" where you can pick, say, "please enforce the practices considered best by the mayfly developers", or "maximize MySQL compatibility" or even define your own (much like the code formatting configuration in IDEA or Eclipse).

Whether strong typing is a case for options, I don't know, however. Fear says that of course things would break and we need an escape hatch. But I am beginning to wonder whether that breakage is small enough to just wait and see whether this becomes a problem.

Thursday, January 04, 2007

new Integer(5) versus Integer.valueOf(5)

Seems that findbugs warns you if you call new Integer(5) instead of the (new with Java 1.5) Integer.valueOf(5). The point of the latter is that it might return you an existing object rather than creating a new instance.

I'll get back to the Integer.valueOf case, but on the general topic of trying to avoid object creation, there has been a long and largely unhappy history of this in Java. For example, see Allocation is faster than you think, and getting faster.

To summarize the possible problems with object caching and pooling:

* One can accidentally end up sharing a mutable object where the simple design calls for an unshared object. One way to avoid this problem is just to use immutable objects, for example joda-time objects instead of java.util.Date objects. In the Integer.valueOf example, Integer is immutable, so we don't have this problem.

* Pooling almost always complicates the code. Not so much of an issue for Integer.valueOf, in the sense that the standard library has the extra code, we just need to figure out whether to call it.

* Object pools can cause synchronization bottlenecks. There are of course complicated solutions, like separate pools for each thread. In the Integer case, this is Integer.valueOf's problem (and the Sun J2SE 1.5 implementation solves it by just allocating a fixed size pool on startup).

* Object pools tend to increase memory consumption. Often the performance hit of chewing up extra memory (for a long time) will exceed the allocation/deallocation overhead (which may involve a short-lived object, the cheapest kind). Again, in the Integer.valueOf case that's someone else's problem not ours (and in the Sun J2SE 1.5 implementation anyway, the size of the object pool is fixed at JVM startup and won't change based on anything we do).

So having exhausted the usual arguments against object pools, I conclude that it is in fact a good thing to call Integer.valueOf instead of new Integer.
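
The sharing is easy to see for yourself. The following sketch (the class name is just for this example) relies on the documented behavior that Integer.valueOf always caches values from -128 through 127; outside that range, whether you get a shared object is up to the implementation.

```java
// Small demonstration of the Integer.valueOf cache; the class name is arbitrary.
final class IntegerCacheDemo {
    private IntegerCacheDemo() {}

    /** True if two separate valueOf calls hand back the very same object. */
    static boolean sharesInstance(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        // The Javadoc promises caching for -128 through 127, so this prints true.
        System.out.println(sharesInstance(5));
        // Outside that range identity depends on the JVM, so compare
        // Integer objects with equals(), never with ==.
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000)));
    }
}
```

Since Integer is immutable, code which only looks at the value cannot tell whether it got a shared object, which is why switching from new Integer to Integer.valueOf is safe.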

Wednesday, January 03, 2007

Open Source, Free or Libre?

It might be just flamebait to even mention it, but those of us who are writing software which conforms to the Open Source Definition or the Four Freedoms (most often both) sooner or later need to decide whether to call it open source software, free software, or libre software, with the most controversy tending to surround whether people are trying to water down the meaning of open source (for example see What is open source?).

The only reason I bother to write about this is that Martin Fowler, in his Semantic Diffusion article, points out that this is completely par for the course when a concept is getting popular. He has some great examples of this (I am old enough to remember how for a time everything was described as "object-oriented").

So if open source is like the other terms Fowler discusses, there isn't a need to start a big panic. As long as we have a critical mass of people using the term open source to refer to software meeting the open source definition, the odds are good that the meaning won't drift too far and too permanently.