microformats

Liminal Existence

Clouds in Iceland

Thursday, June 21, 2007

SELECT * FROM everything, or why databases are awesome.

I've just committed a patch to ActiveRecord that prevents a large number of very, very bad queries from hitting your database. Go update your code, ASAP.

We've made some pretty significant progress towards scaling Twitter, and we're now at the point where the majority of requests that hit our site complete in less than 70 ms (mostly API requests), and the really complicated front-end pages that we display complete in less than 160 ms. There are still a lot of hiccups, so the average is higher than that, but we're constantly working on getting it down.

One of the consistent problems we've been facing is errant queries. We've been seeing (off and on) queries like:

SELECT * FROM statuses WHERE user_id = 234223 ORDER BY created_at

If you know anything about relational databases, this is a very bad thing, especially when you have users that have more than 20,000 statuses.

One major downside of having an object-relational mapper is that you don't always control what goes on behind the scenes. In tracking down this problem, first we investigated all our code, and weren't able to find the source of these problems. Switching tactics, we isolated some test cases that replicated the problem and brought out the big guns: print. This pretty quickly brought us to an obscure corner of the ActiveRecord source (three cheers for source code!), where it became apparent that Rails was doing these gigantic loads from the database every time we saved even a single field in a related object. There are a bunch of mitigating circumstances that mean that this bug doesn't get triggered all the time, but it's still really really bad.

Thankfully, the patch will be committed soon has been committed (32 minutes patch-to-commit!), and no-one will have to deal with, as Coda put it: "Arg stabby stab stab stabbity fuck stab" anymore. The fact that no-one noticed really speaks to how freaking awesome relational databases (in our case, MySQL) are these days.

Perhaps underlying all of this is the simple fact that most of the time, ActiveRecord and Rails in general is pretty solid, and Ruby underneath is a fully sound language with which to build high-volume services. Kevin over at PowerSet has more on the topic - they've recently announced that they'll be doing their front-end development in Ruby (up until now, it's just been a glue language internally).

9 Comments:

Anonymous Bill Kocik said...

Is the patch in edge Rails, then, or somewhere else?

Thanks for finding and fixing this. :)

Friday, 22 June 2007 01:46:00 GMT+01:00  
Blogger Blaine said...

The patch was against edge rails and Rails 1.2.4 - the patch itself should apply back quite a ways in the revision history.

Friday, 22 June 2007 02:05:00 GMT+01:00  
Blogger jdunck said...

I guess I don't know anything about relational databases, but why is this so bad?

SELECT * FROM statuses WHERE user_id = 234223 ORDER BY created_at

I assume you have an index (user_id, created_at) if it's a big table and this is a common query. Surely the "select *" bit isn't that bad?

Friday, 22 June 2007 05:50:00 GMT+01:00  
Blogger jdunck said...

...And now that I read the patch and understand what you're actually saying is wrong, nevermind.

Friday, 22 June 2007 05:53:00 GMT+01:00  
Blogger Blaine said...

jdunck: the SELECT * FROM ... bit isn't so bad, except there's no LIMIT clause - which has the unfortunate side effect of unnecessarily loading thousands of rows.

The real crux of the bug is that ActiveRecord was making these queries for every save against the record; a hundred saves would result in a hundred extra undesired SELECTs, which is a really big deal when the database is busy doing other things.

Friday, 22 June 2007 09:05:00 GMT+01:00  
Anonymous Florian said...

That's interesting.

Are these queries logged to the logfile? And did you see memory leaks/growth because of this?

Wednesday, 27 June 2007 22:13:00 GMT+01:00  
Blogger Blaine said...

Yup, the queries were logged; they were being generated by the normal association code, so that's expected.

We didn't see memory leaks, because I think it's a case that Ruby's GC handles nicely. Turning them off definitely corresponds to a decrease in load against the database.

Wednesday, 27 June 2007 22:17:00 GMT+01:00  
Anonymous Anonymous said...

You're serious? This is like the mother of all fuckups in SELECTs and _nobody_ noticed until now?

I wonder what else is lurking down in the AR/Rails dungeons besides this bug-that-for-gods-sake-should-be-found-10-seconds-after-ga-release.

Sunday, 8 July 2007 00:05:00 GMT+01:00  
Blogger Gnoll110 said...

Never never never use SELECT * in non interactive situation.

The problem with '*' is that it makes assumptions about the order that the columns are returned in!

At some future time, a DBA or the database vendor may change something that changes the order that the columns are stored/return in. This is the same reason you should always use fields by names (not by number).
I have rejected code for this in the past!

This is a database/SQL issue, not a language issue. It goes for Rails, Ruby, Java or COBOL.

Monday, 16 July 2007 16:30:00 GMT+01:00  

Post a Comment

Links to this post:

Create a Link

<< Home