Tuesday, October 22, 2013

Status update: Part 8

This post is fairly light on content, as I noticed that some bugs were introduced in the newest bits, but there are also some improvements:
- the generated code no longer puts generated types in namespaces; some name-mangling issues were fixed because of this
- duplicate runtime methods no longer appear
- the documentation is vastly improved: since I am using CodeRefractor as my bachelor thesis, I have described all the optimizations and the various components of CR in an extensive manner. There are 40 written pages that may be of interest. If you are curious (even just as a read) about the various internal parts or how the code is built, I recommend reading the Documentation folder.
- generic class specialization works (at least partially), and name mangling handles generics better
- there are also small advances in delegate support, but it is not finished yet. The good part is that the delegate code is isolated, so when I have time I will be able to finish it

Still, the main area of focus for now remains bug fixing and code cleanup.

With this post I also want to thank JetBrains for offering a (free) ReSharper license to advance CodeRefractor. I think R# is a great tool in itself, and if you are not already using it, I hope you will give it a try and see how it improves the quality of your code base.

Monday, October 7, 2013

NBody: can C# be *very* fast?

As you read in the previous blog post, Java and .NET performance can be as fast as (or even faster than) compiled C++ code. This was a reason for me to look into the performance discrepancy and to implement three "remaining" optimizations that made the performance gap really big:
- escape analysis: objects that do not escape are allocated on the stack
- common subexpression elimination
- loop invariant code motion (LICM)

So what does this mean in performance terms? I will not give exact numbers, but with the best C++ time you get a bit less than 1300 ms (on Linux) and around 1400 ms on Windows (with both VC++ and MinGW).

Why these optimizations were so important:
- escape analysis removes, for objects passed as function parameters that do not escape, the need to increment/decrement the reference count (see the first sketch after this list)
- common subexpression elimination: when small expressions repeat, they are computed once and reused. This also works across function calls (if the functions are evaluated as pure). So if you build a rotation matrix and use cos(alpha) and sin(alpha) several times, you do not have to cache the sine and cosine yourself; the compiler will do it for you automatically (see the second sketch after this list).
- LICM (see the Wikipedia article) works like common subexpression elimination, but for expressions that do not change across loop iterations: they are evaluated once before the loop starts instead of at every iteration. This optimization also works with pure functions, so such a function call will be moved out of the loop as well (see the third sketch after this list).
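
To make the escape analysis point concrete, here is a minimal C# sketch of my own (the Vector3 class and LengthSquared method are made up for illustration, they are not taken from the CodeRefractor sources). The allocated object never leaves the method, so an escape-analysis pass can place it on the stack in the generated code and drop the reference-count updates entirely.

    // Illustrative only: the Vector3 instance never escapes LengthSquared,
    // so escape analysis may allocate it on the stack in the generated C++
    // and skip the reference-count increments/decrements.
    class Vector3
    {
        public double X, Y, Z;
    }

    static class EscapeAnalysisDemo
    {
        static double LengthSquared(double x, double y, double z)
        {
            var v = new Vector3 { X = x, Y = y, Z = z }; // does not escape
            return v.X * v.X + v.Y * v.Y + v.Z * v.Z;
        }

        static void Main()
        {
            System.Console.WriteLine(LengthSquared(1, 2, 3));
        }
    }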

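Here is the rotation case from the list above as a second sketch (again my own illustration, not CR code). Math.Sin(alpha) and Math.Cos(alpha) each appear twice; if the compiler evaluates them as pure, common subexpression elimination computes each call once and reuses the value, so no manual sinA/cosA temporaries are needed in the source.

    using System;

    static class CseDemo
    {
        // Math.Sin(alpha) and Math.Cos(alpha) each appear twice; when the
        // calls are treated as pure, CSE can evaluate each of them once
        // and reuse the result in both expressions.
        static void Rotate(double alpha, double x, double y,
                           out double rx, out double ry)
        {
            rx = x * Math.Cos(alpha) - y * Math.Sin(alpha);
            ry = x * Math.Sin(alpha) + y * Math.Cos(alpha);
        }

        static void Main()
        {
            double rx, ry;
            Rotate(Math.PI / 4, 1.0, 0.0, out rx, out ry);
            Console.WriteLine(rx + " " + ry);
        }
    }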

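And a third sketch for LICM (a made-up Sum method, purely for illustration): Math.Sqrt(scale) does not depend on the loop variable, so it can be evaluated once before the loop instead of at every iteration, provided the call is treated as pure.

    using System;

    static class LicmDemo
    {
        static double Sum(double[] values, double scale)
        {
            double total = 0;
            for (int i = 0; i < values.Length; i++)
            {
                // Math.Sqrt(scale) is loop-invariant: LICM can hoist it in
                // front of the loop so it runs once, not once per iteration.
                total += values[i] * Math.Sqrt(scale);
            }
            return total;
        }

        static void Main()
        {
            Console.WriteLine(Sum(new[] { 1.0, 2.0, 3.0 }, 4.0));
        }
    }
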
This also means that I will not work on optimizations for some time (unless there are bugs), but you can try generating code, and the result "should scream".