Exploring the Myth: Calculating Square Root is Expensive
There is a curious piece of game development folklore that says you should avoid using the magnitude of a vector wherever possible, because it involves a costly square-root calculation. Even the Unity documentation affirms this notion. What’s interesting is that I’ve never officially learned this; I only really see it floating around the internet, most recently on Reddit. It seems to be one of those things aspiring game developers learn through osmosis.
It is one of those sayings that makes sense on the surface, but I wanted to dig into how much slower the square root operation actually is, and whether it has any meaningful impact on performance in what I would deem “normal” circumstances.
When programmers talk about the cost of an operation they typically mean how many instructions are required to perform it. A simple multiplication, for example, boils down to a handful of instructions: reading the two operands, the multiply itself, and writing the result. More complex operations (such as division or square root) take many more steps to compute an accurate result, and that is where the expense comes from. While square root may once have been extremely costly, I have a hunch that this advice is now far less relevant than it used to be.

This advice also leads newer programmers to change the way they write code in order to optimize as they go. I am a big believer in writing clean, verbose code and optimizing only when it is absolutely necessary. That can be hard as a new programmer, where you often want to write cleaner and more efficient code than your last attempt at solving the same problem. It gives you a sign that you are progressing and allows you to tackle bigger problems.
I devised some small tests to get some real-world numbers on the time it took my machine to complete a large number of square root calculations, and then compared them with the alternatives.
The Experiment
Perform 1,000 loops of 1,000,000 calculations each (yes, a total of 1,000,000,000 calculations). Record the minimum, maximum, and average time it took to complete each of these loops in “real world” time. Each loop consisted of one of three operations: a square root calculation, multiplying a target variable by itself, or raising the same target variable to the power of two.
I’m not overly concerned about how long any of these operations actually takes. I don’t care about the fastest time; I care about the proportionate time between the operations. It is quite likely I could get faster times with a different setup. As an example, everything was run in debug mode on a Windows machine, which is likely to affect the overall time it takes to complete each task. Take the actual values with a grain of salt; we can compare the interesting parts further down. To see the code I used to run the tests, check out my gists here. If you want a brief overview of the code I was testing, it really all boils down to comparing the below.
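The gists hold the exact code; condensed, the shape of the harness looks something like this (a sketch only: the delegate indirection and names here are mine for brevity, the real tests inline each operation):

```csharp
using System;
using System.Diagnostics;

class SqrtBenchmark
{
    const int Loops = 1000;
    const int OpsPerLoop = 1000000;

    static void Main()
    {
        // Same harness for each approach; only the inner operation changes.
        Run("sqrt",     x => Math.Sqrt(x));
        Run("multiply", x => x * x);
        Run("pow",      x => Math.Pow(x, 2));
    }

    static void Run(string label, Func<double, double> op)
    {
        double min = double.MaxValue, max = double.MinValue, total = 0;
        double sink = 0; // consume results so the work isn't optimized away

        for (int i = 0; i < Loops; i++)
        {
            var sw = Stopwatch.StartNew();
            for (int j = 0; j < OpsPerLoop; j++)
            {
                sink += op(j);
            }
            sw.Stop();

            double ms = sw.Elapsed.TotalMilliseconds;
            min = Math.Min(min, ms);
            max = Math.Max(max, ms);
            total += ms;
        }

        Console.WriteLine($"{label}: min {min:F3} ms, max {max:F3} ms, avg {total / Loops:F3} ms");
    }
}
```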
If the prevailing opinion is that square root is slower than simply multiplying our target value by itself, then it is obvious to pit those two calculations against each other. The swap works because, for non-negative values, checking sqrt(x) < d is equivalent to checking x < d * d: you can square the comparison value instead of rooting the target. I chose to add the power function to my testing because it seems like a simple interchange to make. Instead of using square root, I could square my target value by raising it to the power of two.
I’ve also added some Unity-specific tests focusing on Vector3.magnitude vs Vector3.sqrMagnitude as another metric by which to judge, because quite frankly, that is more important to me as a predominantly Unity developer.
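As a concrete illustration of the two styles being compared, here is a typical range check written both ways (a sketch with my own names like `target` and `range`, not code from the tests):

```csharp
using UnityEngine;

public class RangeCheck : MonoBehaviour
{
    public Transform target;
    public float range = 10f;

    void Update()
    {
        Vector3 offset = target.position - transform.position;

        // Readable version: magnitude performs a square root internally.
        bool inRangeMagnitude = offset.magnitude < range;

        // "Optimized" version: compare squared values and skip the root.
        // Equivalent because sqrt(x) < r iff x < r * r for non-negative values.
        bool inRangeSqr = offset.sqrMagnitude < range * range;

        Debug.Assert(inRangeMagnitude == inRangeSqr);
    }
}
```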
To make sure this myth is not language specific, I tested in .NET Core 2.1, Unity 2018.2.14f1 (.NET 3.5 equivalent), Node 8.9.0, and Python 2.7.15. For reference, I am testing on Windows 10 with an i7-8750H CPU.
Results
As mentioned above, I’m testing whether this myth holds across programming languages. However, I don’t want to compare between the languages themselves, because the general speed of each language isn’t what interests me here. Let’s see how each language performed.
These results show a small difference between the speed of calculating a square root and simply multiplying our target value. In C# the power function was considerably slower on average than both the square root and multiplication approaches; we could easily write code that performs worse than just using the square root we started with. The square-root version also happens to be easier to read.
Ignoring the fact that Vector math is on average slower than float math (which I expected), checking the magnitude was not that much slower than checking the square magnitude.
In an attempt to put this all together, I tried to visualize how much faster, or slower, each approach was than using a square root.
We can see that in the case of Unity it is significantly better, 2.5x in fact, to use multiplication over a square root. The other languages, however, differ only modestly; if we use either approach for a reasonable number of calculations, we are unlikely to see a serious performance bottleneck.
In the best case scenario, at 2.5x better performance for multiplication, what sort of gains could we expect to see? Per operation, that is, for a single square root, we could save a whopping 0.033173 microseconds. To put that in perspective, at that rate you would need roughly 30,000 square roots per frame to claw back a single millisecond. If we instead tried to be clever and raise our target value to the power of two, we would make things considerably worse, though we would still only be adding 0.157795 microseconds per operation. There is no doubt that Vector math carries overhead because of the extra components, but checking square magnitude instead of magnitude only nets a performance increase of 0.051819 microseconds.
Final Thoughts
The above is a classic case of micro-optimization. On paper it seems amazing to write code that is 2.5x faster, but it comes at the cost of some readability, and debuggability, for a fairly minimal performance gain. Technically square root is slower than multiplying our target value by itself; practically, I am not so sure, not for typical use cases anyway. If you are new to programming it is fine to learn these pieces of information and keep them tucked away. With that said, you do not need to rush out and use them when you could keep your math simple. Checking against something such as magnitude will be easier for you, or your coworkers, to debug later.
If you are in the position where you need to calculate 1,000,000 square roots in one frame of a game loop, then I’d argue you have a design problem. Look at alternative solutions, such as separate threads or an async pattern, rather than trying to optimize your square root function. I would also like to hope that by the time you reach a problem like this, you are already well on your way to understanding the pitfalls of micro-optimization.
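If you do genuinely end up with a bulk batch of calculations, a minimal sketch of moving it off the main thread might look like this (plain .NET Tasks; note that Unity’s own API is not thread-safe, so only the raw math should leave the main thread):

```csharp
using System;
using System.Threading.Tasks;

class BulkSqrtExample
{
    static void Main()
    {
        var values = new double[1000000];
        for (int i = 0; i < values.Length; i++) values[i] = i;

        // Push the heavy batch onto the thread pool so the main
        // loop (or frame) is free to keep doing other work.
        Task<double[]> job = Task.Run(() =>
        {
            var results = new double[values.Length];
            for (int i = 0; i < values.Length; i++)
            {
                results[i] = Math.Sqrt(values[i]);
            }
            return results;
        });

        // ... the main thread carries on here ...

        double[] roots = job.Result; // collect the batch once it's needed
        Console.WriteLine($"Computed {roots.Length} roots; last = {roots[roots.Length - 1]:F3}");
    }
}
```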
As a final note, I found the speeds coming out of Unity as a whole really interesting. Of all the environments, I expected Unity’s square root to be one of the fastest across the board; given the engine is designed for game development, I expected a slightly less accurate float in exchange for speed. That just didn’t seem to be the case here. My advice: get the feature in and optimize it once you know it is a problem.