I was wrong - ORDER BY RANDOM is slow
A few weeks ago I posted an article asking "Can we please stop saying ORDER BY RAND is slow?"
The answer? No we cannot. I was wrong. Completely wrong. Two comments in my post pointed out some flaws in my benchmarks: I was using too small a data sample, and not running the benchmark code through enough iterations.
So I performed a new benchmark with a data sample of 1,000,000 records rather
than 100,000, and iterated through each query 100 times. The results? ORDER BY RANDOM
is quite slow with large datasets:
While SELECTing just an indexed column is much faster than doing SELECT * in MySQL, the
results are still too slow to be usable in many circumstances.
So what should you do if you need to select random records?
Andrew F left this helpful comment on my previous post:
The best way to select a random row from a MySQL table is using a trick I got from Mediawiki: create an indexed float column, set it to RAND() for each row, and select random rows using:
SELECT * FROM table WHERE randnum > RAND() ORDER BY randnum LIMIT 1This runs as an index range scan, making it basically instantaneous.
If you can work with a small data or result set, you might also still consider
ORDER BY RAND/RANDOM, particularly if the result can be cached. For example:
SELECT id FROM people WHERE date_created = ? ORDER BY RANDOM(), where your result set
will include only a few thousand records. But be aware that if you want to
select randomly against an entire table, as your dataset grows, it may
eventually become too slow to be practical.
Published