Thursday, July 28, 2011

Using active record in rails migrations

Most rails developers have probably run into this problem sooner or later: if your migrations refer to active record classes and those classes change out from under the migration, old migrations no longer work as intended. Whether this is a big problem or a minor annoyance depends on how often you run migrations, how many databases you have (typically one for each developer and one or more you deploy to), etc., but I've seen the problem even across three developer machines over a day or two, when some refactoring left people unable to update their code and then run an only-slightly-older migration.

One solution, advocated in the "Data migrations" section of Code review: Ruby and Rails idioms, is just to fall back to writing migrations in SQL, bypassing active record (with the exception of the low-level parts of active record which connect to the database). This has two problems. The first is that active record doesn't help you a lot with this kind of low-level SQL construction. The example in that blog post uses string interpolation to construct SQL, which they can get away with in that example (because the columns are integers) but which blows up as soon as a value needs quoting and the quoting isn't handled correctly (in a migration, this is probably just a bug rather than a security hole, but search for "SQL injection" if you are unfamiliar with the problems). The second problem is that active record is simply a more expressive way to manipulate data. How many people use script/console rather than script/dbconsole to look around the database?
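To make the quoting problem concrete, here is a minimal sketch in plain Ruby. The vendors table, its name and active columns, and the helper names are all made up for illustration, and quote_string_literal is only a simplified stand-in for what active record's connection.quote does, so that the sketch runs without a database.

```ruby
# Hypothetical stand-in for ActiveRecord's connection.quote (assumption:
# simplified so the sketch runs without a database). It wraps the value in
# single quotes and doubles any embedded single quote, which is the
# standard SQL escape for string literals.
def quote_string_literal(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Naive interpolation: fine for integers, but an apostrophe in the value
# ends the string literal early and leaves malformed SQL behind.
def naive_update_sql(vendor_name)
  "UPDATE vendors SET active = 1 WHERE name = '#{vendor_name}'"
end

# Quoted interpolation: the apostrophe is escaped, so the statement stays valid.
def quoted_update_sql(vendor_name)
  "UPDATE vendors SET active = 1 WHERE name = #{quote_string_literal(vendor_name)}"
end

puts naive_update_sql("O'Reilly")
puts quoted_update_sql("O'Reilly")
```

In a real migration you would call connection.quote rather than roll your own helper, since the adapter knows the quoting rules of the database it is talking to.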

My recommended solution, also advocated in How to use models in your migrations (without killing kittens), is to define the classes within the migration. There's an example in that blog post, but the short summary is that if, for example, your migration wants to refer to Vendor, you put "class Vendor < ActiveRecord::Base; end" within the migration class. In some cases you might need to define a few has_many or belongs_to relationships (make sure to set class_name to refer to the migration-specific class), but the interesting (and, to me, surprising) thing is that in practice you don't need many of them. To give a few examples of what this gets you, think of calling find_or_create_by_name to skip creating a record if it already exists, or looking up an object by name and then using its ID in a subsequent SQL statement. If you are thinking "but I can do that in SQL", then I'm not sure I should try to convince you. But if you are thinking "yeah, that is easier / more-concise / more-readable in active record", then defining your classes in the migration gets you that, and also lets you run migrations even after your code has continued to evolve.
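Here is a rough sketch of what such a migration might look like, written in the self.up / self.down style that is current as I write this. The migration name AddDefaultVendor, the products table, and its vendor_id column are hypothetical, and the file is meant to live in db/migrate of a rails app rather than run on its own.

```ruby
# Hypothetical migration: the class name, the products table, and its
# vendor_id column are assumptions made up for this sketch.
class AddDefaultVendor < ActiveRecord::Migration
  # Migration-local model. It knows about the vendors table and nothing
  # else: no validations, callbacks, or methods from app/models/vendor.rb,
  # so later changes to the application's Vendor class cannot break it.
  class Vendor < ActiveRecord::Base
  end

  # A relationship, only needed if the migration actually uses it. Note
  # that class_name points at the migration-specific Vendor above.
  class Product < ActiveRecord::Base
    belongs_to :vendor, :class_name => "AddDefaultVendor::Vendor"
  end

  def self.up
    # Skip creating the record if it already exists.
    vendor = Vendor.find_or_create_by_name("Unknown")

    # Look the object up via active record, then use its ID in plain SQL.
    # Interpolating vendor.id is safe here because it is an integer.
    execute "UPDATE products SET vendor_id = #{vendor.id} WHERE vendor_id IS NULL"
  end

  def self.down
    # We can no longer tell which products had no vendor before.
    raise ActiveRecord::IrreversibleMigration
  end
end
```

Because the model classes are nested inside the migration class, they don't collide with (or depend on) the classes of the same name in the application.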