Saturday, November 23, 2013

Securing package distribution with TUF

Suppose you are downloading a new fun game for your computer and you want to know whether it is going to do what it claims (clicking on cows, let's say) or whether it is going to send all your data (credit card numbers you type, let's say) to a shadowy cabal in Martha's Vineyard or Napa Valley or whereever shadowy cabals are found these days. For the sake of argument, let's say that you have heard good things about the a (hypothetical) open source project called FreeCowClicker2 written by Ilia Bogomips and you want to try it out.

Well, in some cases the authors of FreeCowClicker2 might run a download site and you might get it there, but most of the time you'll probably be getting it from a package repository, such as a linux distribution, a programming-language-specific repository such as CPAN (perl), PyPI (python), rubygems (ruby), or something like How do I know I'm getting the package I want, if (a) I am connecting to a potentially dodgy WiFi access point or there is some other way in which the shadowy cabal has gotten into my network, or (b) one of the servers involved in serving up the files, or mirroring them, is under the control of our shadowy cabal?

If you are a little bit familiar with this stuff, you are probably saying "signed packages", as found in for example Fedora or Debian. And that indeed is what I'm getting at, specifically TUF (The Update Framework). TUF aims to be usable by any package repository, but the most effort to date has been to using it for PyPI.

As part of Square's hack week which just concluded, a number of us looked into using TUF with rubygems, and wrote some code to that end. Hopefully that code helps clarify what this is all about, and there is a fair bit of documentation on the TUF site, so I'll just mention a few of the high points and interesting details:

  • TUF can upgrade its keys. Your package installer might find there are new keys (signed by the old keys) and switch to them.
  • There are multiple keys for different things. Anyone who contributes a package can have a key which is just good for signing that one package, there are separate keys which are used to say what packages were released as of a given time, and there are keys which are just used to sign the frequently used keys. Some of the keys can be kept offline, and only used maybe a few times a year.
  • TUF is fairly easy to work with. The public keys and signatures and such are kept in JSON, which means you can parse them with, say, ruby or jq.
  • One little example of something they thought of: when you get a reference to another TUF file you also get a signed length. That way an attacker can't substitute a multi-gigabyte file and cause a denial of service by tying up your network or computer. You just need to download as many bytes as the signature told you to, and can quit after that.

There's plenty left to do to finish the job of getting rubygems to use TUF, so go look at the pull request and start pitching in if you feel so inclined. Based on a week of looking at it, TUF does look like a solid basis for a more secure rubygems which preserves all the cool things about rubygems like letting people release gems often, letting anyone author a gem, etc. Likewise, it also seems promising for other package repositories.

Friday, July 20, 2012

Parsing large XML files

Every once in a while, we need to parse large XML files. Here "large" means that the file won't fit in memory, so we can't just suck it in using nokogiri (or our favorite in-memory XML library). SAX is fine as a low level parser to hand tell you where the tags start and end, but trying to do any significant processing will turn into spaghetti unless you have a bit of a framework. The last time I visited this topic, I ended up writing a library, saxophone, which invoked callbacks when it encountered certain named tags. Saxophone is sitting in an obscure git repository; I could put it up as a gem if someone wants it; the big question is whether there is something better out there. The wasabi WSDL parser has been trying their own mini-framework (partially special purpose) described at this issue. But probably the best I've seen so far is sax-machine (specifically, the lazy option thereto). I haven't spent much time playing with it (at least not yet), but it seems like a better starting point than starting from scratch with a new gem. If you do end up writing code directly on top of SAX, just remember this: keep a stack of start tags and end tags. Following this idiom might cut down on the buggy spaghetti that I've seen when I've tried to do without something like saxophone or sax-machine. Update: I fixed the above link to the wasabi issue, which had changed. Not sure how long-lived any of these links are going to be, but here's another one: lib/wasabi/sax_parser.rb from the sax-parser branch. The key is the stack (pushed on start tag, popped on end tag) and the matchers.

Wednesday, December 07, 2011


Sometimes you pick a programming idiom because it is what you are familiar with, because you think it is expected, or because it expresses clearly what the code you are writing is trying to do. Other times, it is just too hard to resist. Lately at work at least two of us have seen .count.count in our rails3 code, and at first were sure it must be a typo. The real story is more fun than that, see the nerdfeed blog for more.