Norman Clarke

Programming open source code for fun and profit!

I was wrong - ORDER BY RANDOM is slow

A few weeks ago I posted an article asking "Can we please stop saying ORDER BY RAND is slow?"

The answer? No we cannot. I was wrong. Completely wrong. Two comments in my post pointed out some flaws in my benchmarks: I was using too small a data sample, and not running the benchmark code through enough iterations.

So I performed a new benchmark with a data sample of 1,000,000 records rather than 100,000, and iterated through each query 100 times. The results? ORDER BY RANDOM is quite slow with large datasets:

While SELECTing just an indexed column is much faster than doing SELECT * in MySQL, the results are still too slow to be usable in many circumstances.

So what should you do if you need to select random records?

Andrew F left this helpful comment on my previous post:

The best way to select a random row from a MySQL table is using a trick I got from Mediawiki: create an indexed float column, set it to RAND() for each row, and select random rows using:

SELECT * FROM table WHERE randnum > RAND() ORDER BY randnum LIMIT 1

This runs as an index range scan, making it basically instantaneous.

If you can work with a small data or result set, you might also still consider ORDER BY RAND/RANDOM, particularly if the result can be cached. For example: SELECT id FROM people WHERE date_created = ? ORDER BY RANDOM(), where your result set will include only a few thousand records. But be aware that if you want to select randomly against an entire table, as your dataset grows, it may eventually become too slow to be practical.

Published