Monday, July 30, 2007

Profiling mayfly

One of the key goals for an in-memory database for unit testing is that it should be fast. And mayfly is faster than the other databases I've tried for some things, like creating or altering tables. However, it is some 3x slower than hypersonic for inserting rows ("insert into foo(a,b) values(3,4)" kinds of commands). I've been meaning to profile it for a long time now, but I finally got around to it.

First I added the JIP jar to the project and added a profile-test target to build.xml. Because JIP doesn't work with gcj, I ran it under Sun java. JIP writes a text file report, and I worked off that.

The results surprised me. They showed that 70% of the time was being consumed in the lexer and parser. Conventional wisdom is that lexing and parsing just isn't your big bottleneck in a compiler these days, but perhaps that is more for an optimizing compiler which spends more time in code generation and optimization passes. I inlined a few short methods which were bottlenecks (at the cost of a small amount of code duplication, but not so much as to be really shocking, and remember I was careful to only do this for the bottlenecks). This was also a surprise: that method invocation seemed to be such a cost. Given my limited knowledge of java internals, that sort of makes sense, but it kind of seems like a step backwards, in the sense that in the C/Pascal/etc days we made so much effort to make method invocation fast, and tell people they didn't need to make their code ugly to avoid method calls. (At the risk of belaboring the obvious, even if method calls are expensive you still don't need to make your code (very) ugly to avoid them: it is only a handful of invocations which are actually going to make a difference in your run-time, and the profile shows you which ones).

I was also able to streamline the non-parsing part of the code, mostly by taking out some extra steps (for example, transforming a column name to a Column and back to a name more times than needed). Some of that had built up through a series of changes which had left in vestiges of previous ways of doing things. So cleaning this up left the code simpler and clearer, as well as faster.

Other changes, like changing Row to be a HashMap rather than a List, didn't seem to help at all (or even hurt slightly). It has been conceptually a map for some time now, but apparently those linear searches were not particularly more expensive than the many calls to hashCode you get with the map. I guess the fact that we don't expect more than a few dozen columns in a table is responsible.

So what is next when I look at this again? For the lexer, I may have run out of obvious ideas (given that it has duties like tracking the line and column numbers of every token, and I don't see giving up that feature, which provides good error messages). For the parser, there is a lot of expression handling machinery that is involved in parsing the "3" in "insert into foo(a) values(3)". Unless I think of a better way, having the top-level expression parser look for a literal followed by something like "," or ")", and going into a fast-path special case might be worth it. I know that looks like a kluged-up wannabe bottom up parser, but I've been happy enough with recursive descent in other ways, that I have trouble seeing switching back to a parser generator. As for the execution (building up rows, modifying the tables, etc), I'd have to look at the profile more. Although I've seen some hot spots, fixed them (and perhaps created others), I don't have as much of an intuitive feel for what is slow here as I do for the lexing and parsing.


Brian said...

It so happens that I was also doing some profiling recently. I also started with JIP and I believe what you're seeing is an artifact of profiling. JIP instruments the start and end of each method, so there is artificial overhead to each method call that makes inlining seem more effective than it really is. I've found it's important to take anything you see in the profiler with a grain of salt and to verify any optimizations with a benchmark that runs with profiling turned off.

I had much better luck with the NetBeans profiler; it calibrates itself for method call overhead on your machine, and it instruments JDK method calls, which might be where the time is taken up. Figuring out how to set up a NetBeans project and switching IDE's whenever you want to profile is kind of annoying, but the results are more accurate.

Also, I'd recommend looking into reading a large block of characters into a CharBuffer and iterating over the underlying array with C-style code; I ended up with a really fast scanner that way.

Jim Kingdon said...

I'm glad you mentioned a per-method-call overhead to JIP, as I should have thought of that (but didn't). I did do some verifying based on total run times (I think with profiling turned off, but this was two months ago), but probably not as much as if I had been aware of this.

Next time I take this up, I'll need to revisit some of the other questions accordingly (such as whether lexing time was really dominated by building up token objects, or whether the bottleneck was reading characters after all).