Thursday, November 16, 2006

SQL DELETE of all rows not as easy as you'd think

So clearing out the contents of a table from an SQL database is a relatively common operation. Tests might do it to start from nothing, or MIFOS's own testdbinsertionscript.sql does it so that the tests can have some sample data which is a bit different than what we supply for production.

Sounds simple, right? Just execute:


DELETE FROM TABLENAME


And in fact that works most of the time.

But there is a fairly common case in which things
might not be quite that simple. Suppose that each row of the table points to a parent. For example:


create table foo(id integer primary key,
name varchar(255),
parent integer,
foreign key(parent) references foo(id)
);
insert into foo values(1, 'Eve', null);
insert into foo values(10, 'Seth', 1);
insert into foo values(101, 'Enos', 10);


(For the non-SQL-aware, the FOREIGN KEY stuff just means what I said in words - that the parent points to another record in the table).

Now in this case suppose we try to delete a row:


delete from foo where id = 1


This should fail, and does, because to delete the record for Eve would leave the record for Seth pointing to nothing.

But now try:


delete from foo


If the database deleted the records one at a time, and applied all the usual rules, then it might fail (depending on in what order the database processes the records). In fact that is what you see in MySQL, and the developers of MySQL have offered a way around this by adding an ORDER BY to their DELETE statement.

Hypersonic is much like MySQL, except it seems not to honor any ORDER BY.

Postgres and Derby, on the other hand are smarter: they just will delete all the rows (I don't know whether they look at foreign keys as a group rather than row-by-row, or what, but the observation is that the delete Just Works).

Right now Mayfly is like Hypersonic/MySQL, without the chance to specify ORDER BY. I guess the Postgres/Derby behavior is the right one (although I'll have to think about how to implement it - if it were a simple change I would have just done it, rather than all this whining). Somehow ORDER BY doesn't feel right to me. It seems to be based too much on a model of how delete is to operate, and not enough on what result delete is supposed to produce.

For now, I worked around this in MIFOS by first clearing the parent pointers and then deleting the rows:


update foo set parent = null
delete from foo


That could get complicated if one were not allowing NULL in this column. But for this situation, it seems like a pretty painless workaround (this particular test data setup isn't a performance bottleneck, so there is no need to worry about that).

No comments: