We discussed results internally and came out with several low-level fixes. Here they go:
Minimize created objects countFirst of all we thought about enormous number of objects we created during execution of our algorithm. As you probably remember, we’d applied fix named ‘Temp object creation’, when we tried to minimize number of performed arithmetic operations. It had negative effect on performance because of increased memory allocation and garbage collector overhead. So, the less objects you created – the better performance you get. It is not hard to notice that most of objects in our code are created here:
What if we don’t create new objects here at all? Let’s store in stack individual coordinates instead of creating wrapper objects. Sure, it makes code more complicated and unreadable, but performance is our main goal today. So, we came out with this:
Please note, we removed two bad fixes from the previous article. We’ve got nice results in all browsers, but in Safari results were really amazing: about 45% performance boost. If we remember bad “Temp object” fix from the previous article we notice that Safari was dramatically slower than other browsers after that fix, so such result is logical consequence of some issues Safari has with objects allocation and/or garbage collection.
Again, it makes our code less readable and understandable, but we really want better performance, so we don’t care about code readability here.
Isn’t it amazing? Absolutely incredible results: we’ve got 70% boost on Firefox, 56% on IE and 36% on Safari. And what about Chrome? Surprisingly it is 9% slower. It seems that Google already implemented automatic function inlining in V8 optimizer and our manual fix is worse than theirs. Another interesting thing: with that fix Firefox is almost two times faster than Chrome, previous leader.
ImageData array has two dimensions that are packed into one dimensional array line by line. So, 3x3 pixel matrix with coordinates (x;y), is basically stored in memory like that:
Let’s look at the following lines in our code:
Let’s think about the order we access neighbor pixels if we are in (1;1):
So, we’re going left, then once again left, then right, then again right. According to best practices of cache utilization optimization it is better to access memory sequentially, because it minimizes a chance to have cache miss. So, what we need is something like that:
Let’s rewrite our dx,dy arrays:
Here is what we’ve got:
Fixing own bug – reorder if statementsThose of you who read inlined code carefully might already noticed that we actually did a mistake there:
Results are surprising:
Most of the browsers results were in the area of statistical error, but Chrome was more than two times faster and got its crown of fastest browser back in our benchmark! It is hard to tell what the reason of such a dramatic difference is. Maybe other browsers use instruction reordering and already applied that optimization by themselves, but it looks strange Chrome don’t do it. There also may be some hidden optimization heuristics in Chrome that help it understand that stack is used as simple array, not hash-table and it makes some significant optimization based on that fact.
Keep the balance between code readability and performance considerations. Sometimes it makes sense to inline function even if it makes your code less readable. But always make sure that it brings you desired value: it doesn’t make sense to sacrifice code readability for 2 ms performance boost for code that is already fast enough.
Think about object allocation, memory usage and cache utilization, especially if you work with memory intensive algorithms such as Flood Fill.
You can find all the results with exact numbers at Google Spreadsheets: https://docs.google.com/open?id=0B1Umejl6sE1raW9iRkpDSXNyckU
You can check our demo code at GitHub: https://github.com/eleks/canvasPaint
You can play with app, deployed on S3: https://s3.amazonaws.com/rnd-demo/canvasPaint/index.html
Thanks to Yuriy Guts for proposed low-level fixes.