Norman Clarke

Programming open source code for fun and profit!

Can we please stop saying ORDER BY RAND is slow?

Here's a quick database question for you: What's the best way to select 10 random rows from a table with 100,000 records in it?

If you're a Ruby on Rails developer, chances are you did NOT say to use your database's built-in RAND() or RANDOM() function in an ORDER BY clause. There has been a meme going around the RoR world for years that this is slow and has to be avoided at all costs.

The reason this became a meme is probably because ActiveRecord does a `SELECT *by default, and if you doSELECT * FROM table ORDER BY RAND()` in MySQL, it is in fact quite slow.

However, performance improves dramatically when you select against an indexed column rather than doing a SELECT *. Here are some benchmarks showing the difference for the three most popular databases used with Rails:

In all cases, it's faster to select a single column: for Postgres the difference is almost trivial, but for MySQL and SQLite3, it's significant.

Notice that the MySQL query against a single column is only .05 seconds slower than an ordered select. This means that your application probably already has many queries with complexities that dimish performance more than using RAND() does.

A simple way to put this into practice in a Rails application would be to select your random ids, and then select your full records in a second query:

ids = User.all(:select => :id, :order => "RANDOM()", :limit => 10).to_s(:db)
users = User.all(ids)

One thing to keep in mind: SQLite3 and Postgres use RANDOM(), while MySQL uses RAND(). If want to create code that runs on both SQLite3 and MySQL, you need to use a plugin. I submitted this to Rails as a patch, but it was rejected... for performance reasons.

If you're interested in seeing my benchmark code, it's copied below.

Published