6502 Integer Square Root – which is best?
The purpose of this page is to compare the performance of several different implementations of a 16 bit integer square root on the 6502 CPU, to find out which is best.
This function is sometimes known as isqrt, and conventionally it rounds down the result, so the result fits in 8 bits.
See the Wikipedia page for integer square root for details of algorithms.
We execute each routine exhaustively over all 65536 possible inputs, record the cycle count for each and graph the results.
Implementations tested
All implementations have been sourced from the internet and reformatted for the acme assembler.
Python Script
After assembling each file using acme, we use py65mon to load and execute the binary 6502, check the results are accurate and record the cycle count.
The results are then output to a CSV file for graphing in a spreadsheet.
Results
All algorithms proved to be correct. We graph the cycle count of each algorithm over all possible inputs.
file | average cycle count |
---|---|
sqrt1.a | 317.7 |
sqrt2.a | 846.5 |
sqrt3.a | 43.8 |
sqrt5.a | 731.0 |
sqrt6.a | 522.9 |
sqrt7.a | 501.5 |
All cycle counts include the final RTS, but not any initial JSR. Add 6 cycles for an initial ‘JSR sqrt’ instruction.
Conclusion
It’s a speed vs memory trade off.
If speed is all important and you can afford to use 1K of memory then go with the fastest routine sqrt3.a.
But if you can’t afford 1K of memory, then go for the next fastest (and much smaller) sqrt1.a.