Tuesday, December 19, 2006

Database upgrades

Well, my project for the last few weeks has been database upgrades. This concept has been popularized by rails and to a certain extent the Database refactoring book, but it is something that just about any project could do, without a whole lot of mechanism. It only took us (myself and Stelligent) a few days to implement an automated upgrade scheme.

I guess the best way to describe the design is just to point to the Wiki page.

Some of the key decisions were/are:

(1) sql vs java. Is an upgrade script just a .sql file, or is it written in a real programming language like Java/ruby/perl/etc? We went with the .sql approach because it seems like the Simplest Thing Which Could Possibly work, and it seems like introducing a programming language could get needlessly complicated and/or hard to understand. Having said that, I do realize that we may someday hit a situation where it is really awkward to process data via SQL. But I'd rather cross that bridge when we come to it, given how many upgrades so far have been very simple (insert default value into new column, create new table with no rows, etc).

(2) Whether to provide upgrade scripts for every checkin which changes the database schema, or on coarser boundaries like releases? The latter is
sufficient if the goal of this exercise is merely to allow production users to upgrade. Requiring developers to blow away their test data can be done, but the real question I have there is how is a developer supposed to know when they need to do this. I've been on projects where a significant amount of the chatter in the team room is "is anyone else seeing a failure x?" "rebuild your database". This doesn't really seem great even if everyone is in the same room, and on a distributed project like MIFOS, it seems to me quite important to at least detect an out of date database on developer machines (and once you've done that, might as well just do the upgrades).

(3) Whether to provide downgrade scripts. The purpose, I guess, is to let someone try out a new version of the software knowing they can always go back if it didn't work out. Given all we are trying to do in terms of having tests, continuous integration, etc, to make upgrades go smoothly without needing to roll back, I'm not sure how much effort to put into downgrades. On the other hand, it may be hubris to think downgrades will never be needed.

(4) What tests to build. The interesting tests in the MIFOS case are in LatestTest. There are tests of the upgrade scripts (checking against latest-*.sql), the downgrade scripts (checking each one undoes the corresponding upgrade script). We're also thinking about some tests which would test that the upgrade scripts properly upgrade existing user-supplied data (as opposed to just schemas and data supplied with MIFOS). But this post is getting long already, so I'll leave that for another time (or for people to ask about).

Thursday, December 14, 2006

MIFOS programmer intern position in Washington, DC

If you are a programmer looking for an internship in Washington, DC, check out Software Development Intern, Mifos. Much of the time would be spent pairing with me and/or working on some of the things I write about here.

SQL dumps and topological sorts

A few weeks ago I wrote a SQL dump utility for Mayfly. That is, take a database and dump it out as SQL statements, capable of re-creating the database. I played around with it for a while, thought I had it in a fairly complete state, and didn't think about it much until I was ready to use it (for some database upgrade work which I should write about too, but not now).

Turns out I forgot a key step: try out the dumper on some reasonably large and/or real world data set. In this case, the data set is the MIFOS SQL schema (and master data checked in, which has various records which the application itself needs). And the test is a fairly simple, automated one: dump out the data and try to reload it. This test worked fine with some small SQL scripts I wrote for the Mayfly test suite, but when I dumped the MIFOS schema/data, it failed because the foreign keys were out of order. That is, in the schema, a foreign key has to refer to a table which already exists (constraining the order in which we dump CREATE TABLE statements), and a row has to refer to a row which already exists (constraining the order in which we dump INSERT statements).

Sounds simple, right? Just sort the tables, using a java.util.Comparator which returns -1 or 1 based on whether there is a foreign key between the two tables, right? I'm slightly embarrassed to admit that I actually implemented this, found that it worked on a few small test cases, noticed it failed on MIFOS ("gee, that's funny, I'll need to look into that"), and proceeded to other problems (my immediate need did not demand that the SQL file be reloadable into a database, just that it represent what is in the database).

So here's an exercise for the reader: what was wrong with my implementation? (feel free to answer in the comments section if you want).

Anyway, after some research on Wikipedia, I realized that what I needed was a topological sort (and, fortunately, that I didn't also need something like the Floyd-Warshall algorithm).

Are there libre Java topological sort libraries out there? I didn't look very hard, but didn't see any (on apache commons, I saw something in the "commons sandbox", but it wasn't clear that it was ever finished, or that it even is still there).

I took an hour or two to understand the topological sort algorithm in Wikipedia, and another two hours or so to implement it, and it will probably be another hour or two to have the SQL dump utility call it, so I don't really have a pressing need for a library any more. That is, unless I find maintaining my implementation ends up being a lot of work, which could happen although I suspect I'm over the hard part in just getting to this point. It is one of those algorithms which seems scary until you understand it, but pretty simple thereafter. It's only a hundred or so lines of code and about twice that for tests (depending of course on what you count as the topological sort code and what you count just as setting up various things to send into it).

I was a little bit surprised that a topological sort was so unfamiliar to me. I guess this just means that we don't need it all that often, although when we do need it, we need it bad. By "need it bad", I mean my suspicion is that no amount of ad hoc algorithm building of the sort of "well, these two tables are out of order, let's try switching them" is likely to converge on a working, fast-enough algorithm. Not that I tried the ad hoc approach (unless you count my Comparator misstep). I wouldn't say this example undermines my faith in agile practices like evoluationary design or "running code speaks louder than words" (see for example the line "Too much talk, not enough code. Type!" in the bowling score pair programming example). But it does indicate that there are times when it pays off to stop coding long enough to go read up on how others have solved similar problems.

Wednesday, November 29, 2006

Needles, haystacks, and log4j

There was an interesting bug report that came in to the MIFOS mailing list recently.

Someone was posting that they couldn't start MIFOS (that's not the interesting part). The interesting part was: "then I am Getting so many errors, List of the Errors is as follows." and lots of log4j output (in fact, too much for the mailing list archive program to show it all, so you'll need to take my word for what was there).

Most of the log4j output which was INFO messages which meant nothing at all was wrong. The trouble started with a WARN which started "org.hibernate.cfg.SettingsFactory -
Could not obtain connection metadata"
and proceeded with a stacktrace. Then another 19 or so INFO messages (not related to the error, as far as I can tell). Then another WARN, this one even more cryptic than the last: "org.hibernate.util.JDBCExceptionReporter - SQL Error:
1045, SQLState: 28000". Then finally an ERROR which fairly directly said what was wrong: "org.hibernate.util.JDBCExceptionReporter - Access
denied for user 'root'@'localhost' (using password:
YES)".

In other words, this was a simple problem (the database user and password that had been supplied to MIFOS were not set up in MySQL) but the actual error message was buried in some 1500 lines of red herrings.

It's no wonder that software gets a reputation for being hard to install/configure/run, when tracking down the simple problems involves this level of looking for a needle in a haystack.

For MIFOS, the low-hanging fruit seems pretty clear: make sure the default log4j logging level is set to WARN (in fact, I would have changed this already, except I couldn't find where it is being set - which is another good log4j rant but one for another time). Then all those INFO messages wouldn't be there. Bonus points would be given for: (1) reporting the real error once instead of 3 times (probably best done within Hibernate), and (2) making it so that one can go to localhost:8080/mifos (that is, the URL which would have had the application, had it started) and see an error message (or at least a hint - like "application failed to start - see xxx for detail").

Thursday, November 16, 2006

SQL DELETE of all rows not as easy as you'd think

So clearing out the contents of a table from an SQL database is a relatively common operation. Tests might do it to start from nothing, or MIFOS's own testdbinsertionscript.sql does it so that the tests can have some sample data which is a bit different than what we supply for production.

Sounds simple, right? Just execute:


DELETE FROM TABLENAME


And in fact that works most of the time.

But there is a fairly common case in which things
might not be quite that simple. Suppose that each row of the table points to a parent. For example:


create table foo(id integer primary key,
name varchar(255),
parent integer,
foreign key(parent) references foo(id)
);
insert into foo values(1, 'Eve', null);
insert into foo values(10, 'Seth', 1);
insert into foo values(101, 'Enos', 10);


(For the non-SQL-aware, the FOREIGN KEY stuff just means what I said in words - that the parent points to another record in the table).

Now in this case suppose we try to delete a row:


delete from foo where id = 1


This should fail, and does, because to delete the record for Eve would leave the record for Seth pointing to nothing.

But now try:


delete from foo


If the database deleted the records one at a time, and applied all the usual rules, then it might fail (depending on in what order the database processes the records). In fact that is what you see in MySQL, and the developers of MySQL have offered a way around this by adding an ORDER BY to their DELETE statement.

Hypersonic is much like MySQL, except it seems not to honor any ORDER BY.

Postgres and Derby, on the other hand are smarter: they just will delete all the rows (I don't know whether they look at foreign keys as a group rather than row-by-row, or what, but the observation is that the delete Just Works).

Right now Mayfly is like Hypersonic/MySQL, without the chance to specify ORDER BY. I guess the Postgres/Derby behavior is the right one (although I'll have to think about how to implement it - if it were a simple change I would have just done it, rather than all this whining). Somehow ORDER BY doesn't feel right to me. It seems to be based too much on a model of how delete is to operate, and not enough on what result delete is supposed to produce.

For now, I worked around this in MIFOS by first clearing the parent pointers and then deleting the rows:


update foo set parent = null
delete from foo


That could get complicated if one were not allowing NULL in this column. But for this situation, it seems like a pretty painless workaround (this particular test data setup isn't a performance bottleneck, so there is no need to worry about that).

Tuesday, November 14, 2006

Press mention in newsforge

I make no attempt here to log all the mentions of Grameen or even MIFOS (especially since the Nobel prize), but here's one in newsforge: Microfinance and open source: natural partners.

Newsforge is one of the better open source news sites. I mean, no one can match LWN's Weekly Edition for relevance and good writing, but most of the time that I click on a newsforge article, I end up informed. Just to pick another example from today, their article about the reaction to Sun's Java plans is spot-on. It points to some relevant mailing list threads and avoids getting caught up in the hype.

Friday, November 10, 2006

Testing equals and hashCode

This isn't a post about whether it is a good idea to implement equals and hashCode in all your classes, and if so how (check all fields, check some kind of identifier, check fields except the boring ones, etc).

No, I'm assuming that you have decided to implement equals and hashCode, either because you like working that way, or because you are using a package like Hibernate which encourages/requires it.

So now the question is: being good test-driven developers that we are, how do we write the tests for our equals and hashCode methods? Many of us have probably read the javadoc for Object#equals (the so-called equals contract), and started out
writing things like:

assertTrue(one.equals(two));
assertTrue(two.equals(one));
assertFalse(one.equals(null));

etc.

And that's about right. But it seems like this is a framework waiting to happen (well, framework is probably too grandiose a word for something which probably doesn't need to be more than a hundred or so lines of code and just affects tests for equals and hashCode, but hey, people have called things frameworks for less).

Are there any good ones out there in Apache commons or the other usual places? I've seen some really bad ones, but generally have just ended up writing them myself. I'm enclosing the one I'm currently using in both Mayfly and MIFOS.

The one thing it doesn't do super-well is test transitivity. You can give it a bunch of things which should all be equals to each other, and it tests that they all are, but it doesn't do any transitivity tests for not-equals. I think it is pretty clear how to fix that: instead of just passing in a bunch A of things equals to each other, pass in several bunches: A, B, and C. Each object within A should be equals to the others in A, but none of the ones in B and C. Likewise for B and C (the mathematically experienced of you will recognized these "bunches" as equivalence classes). In fact, I started to implement this today, and I got a bit hung up on whether it reads as nicely as what I have now. Somehow, passing in Object[][] { new Object[] { a1,a2}} just seemed like too many levels of [] and {} and such. I don't know if my concern is justified.


public static void assertAllEqual(Object[] objects) {
/**
* The point of checking each pair is to make sure that equals is
* transitive per the contract of {@link Object#equals(java.lang.Object)}.
*/
for (int i = 0; i < objects.length; i++) {
Assert.assertFalse(objects[i].equals(null));
for (int j = 0; j < objects.length; j++) {
assertIsEqual(objects[i], objects[j]);
}
}
}

public static void assertIsEqual(Object one, Object two) {
Assert.assertTrue(one.equals(two));
Assert.assertTrue(two.equals(one));
Assert.assertEquals(one.hashCode(), two.hashCode());
}

public static void assertIsNotEqual(Object one, Object two) {
assertReflexiveAndNull(one);
assertReflexiveAndNull(two);
Assert.assertFalse(one.equals(two));
Assert.assertFalse(two.equals(one));
}

public static void assertReflexiveAndNull(Object object) {
Assert.assertTrue(object.equals(object));
Assert.assertFalse(object.equals(null));
}

Wednesday, November 08, 2006

Mayfly SQL Dump produces SQL that Mayfly can read

My project of the last few days has been to write a dump utility so that Mayfly can output a database in SQL (similar to mysqldump and similar tools provided with most databases).

Mayfly's dumper can now output CREATE TABLE statements with all of Mayfly's current data types, and likewise INSERT statements for the rows.

So the milestone is that I can now take the standard MIFOS data from the unit tests (DatabaseSetup#getStandardStore()), give it to the dumper, load that dump file back into Mayfly, dump it again, and the first and second dumps will have identical contents.

Now, if the dump just leaves out parts of the data/metadata (as it currently does with constraints, auto-increment values, and binary columns), then this test won't complain (the first dump will omit something, and the reload will just load something different). But it still seems like the dumper might not be too far from finished: this test at least implies that the dumper doesn't blow up on anything in the MIFOS data/metadata, and doesn't generate any invalid SQL.

Monday, November 06, 2006

First MIFOS unit tests pass with Mayfly

(This was actually from 2 Nov 2006)

So, one of my main projects lately (last 2 months or so) has been getting the MIFOS unit tests to work with an in-memory database. For a while the task was just to get Mayfly to read the MIFOS SQL files (mifosdbcreationscript.sql and mifosmasterdata.sql) - I could measure progress by how many lines into the script before Mayfly gave an error.

After that, the task was to get Hibernate to talk to Mayfly. This was considered successful when a simple Hibernate call could get an object from data which had been in the database (I later found out that there were other corners of Hibernate I needed to worry about).

Then there was running a MIFOS test (one of the existing unit tests, which have been running with MySQL until now). I started with FeePersistenceTest (chosen more or less at random).

First step was making it through the initialization code in TestCaseInitializer. This mostly just worked, but there was one interesting surprise. There was a join of 80 rows by 500 rows by 500 rows (written with implicit joins and WHERE, not INNER JOIN and ON), and that was too much for the naive "build the cartesian product first and then start applying WHERE conditions" algorithm that Mayfly had. Now, one can argue that a unit test should be whittling down its dataset, and that might be how we end up going, but one of my ideas for MIFOS and Mayfly is to see how far we can get while avoiding some of those familiar unit testing slimmings. (As another example, if I run into a piece of MySQL-specific SQL, I tend to rewrite the SQL to be portable, or add the feature to Mayfly, rather than build an abstraction layer which lets MIFOS generate different flavours of SQL). Anyway, back to joins. I built a simple query optimizer which got me past this.

Oh, yeah, and there was all the ALTER TABLE work I did so Mayfly could execute some/all of the Iteration*.sql files (as it turns out, I'm not sure I needed this quite yet, but I should soon).

So next various things failed as FeePersistenceTest created its test objects and such. I've been fixing those one at a time. In fact, I've
been beginning to worry about how much work might be left, given that I don't have any particularly good way to estimate how many of these features remain. Well, this morning I saw an odd symptom - instead of the usual 6 failing tests, I only saw 4. That's right, 2 had passed. Looking at what had failed, I saw 2 easy features to implement on
my laptop at lunch, and once I checked that in, all 6 were passing!

Now, when I tried running CenterBOTest (second test picked at random), there were a whole new set of failures. But still, to be over the FeePersistenceTest hump is quite exciting.

Saturday, November 04, 2006

Enums are a good thing

Yesterday I dove into the tests looking for something to clean up. I started with the NonUniqueObjectException we're getting in one test (and swallowing), but in the process of trying to look around to see what the two objects might be that make uniqueness not exist, I found other code smells.

So I'm looking at code which (simplified) looked something like:

createClient(Short.valueOf("3"), "A test client")

The pain involved in using short instead of int is the first glaring thing, but actually what that really should have been was an enum:

createClient(CustomerStatus.CLIENT_ACTIVE, "A test client")

If those two things look basically the same to you, I'd suggest thinking a little harder about where you are spending your brain power while reading/maintaining this code. Sure, once you've come up to speed you can probably remember that "3" here means active, but shouldn't you have the computer keep track of that? And if you are just learning this code, or forgot that detail, then "3" is totally mystifying - in fact what got me onto this tangent is that I was wondering whether it was an ID which, duplicated, had something to do with the Hibernate non-unique exception.

One more detail: how did I fix this? The createClient method had about 150 callers (fortunately with good test coverage). So I didn't want to fix them all at once. I created my new createClient:

createClient(CustomerStatus status, String name)

and had it call the old one (or maybe vice-versa, the point is having one call the other rather than a copy-paste, since it is so easy to look up the enum from the short, or vice-versa):

createClient(short status, String name)

I then started fixing up callers. I think I got to about 100 before I got bored. So I checked in the 100, and I can get to the other 50 some other day.

I suppose I could also turn this into a rant about how helpful Java's strong typing is, because with the enum I know (as I'm typing, thanks to Eclipse, not just at run-time) what that first argument to createClient is. But that's a debate which goes back at least to the 1960's. I'll just say that since we are paying the price (extra syntax, mainly) for compile-time types, we should get the payoff.