Tuesday, September 24, 2013

NBody: can C# be fast?

One question for blog readers is: "how fast could be my C# code if is written in C++?" And in many cases people come with their "numbers" like:
- Java JIT is faster than C (or C++)
- C# is faster than Java because it has structs
- C++ is faster because everything in this world is written in C++ for performance, for example games, etc.

Of course this blog will come with it's personal spin, so if you have a strong opinion about who expects to win, please ignore what is written next.

A benchmark I looked to optimize as it has many common operations for a computing kernel is the NBody benchmark, because it includes:
- mathematical operations
- math intrinsics
- somewhat complex array accesses and iterations
- is written in a cache friendly way(like the reads are mostly sequential) so if you have many memory indirections the performance will be hurt
- it doesn't depend on a complex library functionality (like Regex would do)

Given this, the NBody sourcecode which is hardcoded to 5.000.000 will run in:




(update, there is a last minute fix, .Net times were included with non-release builds, the best time is now 1550 ms)
 
Runtime     Time(ms)
.Net 4.5 64 bit 1550
MinGW 4.7 32bit 2860
MinGW 4.7 64bit 2840
Win JDK 6 -server 32 bit 1500
Linux JDK 7 -server 64 bit 1444
Linux G++ 4.7 64 bit (-O3) 1494
Linux G++ 4.7 64 bit (PGO) 1378

Some people will notice that I didn't test MS VC++ (on Windows) or Clang on Linux. In fact, I did, but maybe I was wrong in my setups and MS VC++ was slower than MinGW on Windows. Clang++ was also slower than GCC, like would run in 2 seconds (so was like on my up-to-date Mint Linux). The point is mostly to test (best) managed vs (best) native compilers. Also the test doesn't test C# vs C++, but C# against a C++ translation of this C# code made by CodeRefractor, which can be interpreted in any way a reader wants.

So what I found in this testing:
- if you do low level math, Java may save the day, it is by far the safest choice, and is very easy to setup and the conversion of the C# code
- if you know what you're doing, you can get much faster performance if you use the proper OS/compiler in C++ world. Even I didn't use Intel Compiler, or I have access for now just to MS VC++ 2010, using MinGW and Linux can get you a tad faster code (at least if you count the code is written somewhat low level but neutral on optimizations)
- MinGW will give virtually the same performance in NBody on 32 and 64 bit (which was a big surprise for me) at least on this code. Maybe is a bug in my setup, but I pick the best time for this test, and in general I was getting sometimes slower times on 64 bit
- using PGO, 64bit, GCC, -O3, Linux gave at least on my machine the best performance.

For people to reproduce these tests, I have the following machine:
i5 540M 2core @ 2.4 Ghz
6 G RAM
The source is under this revision (of the output C++ file) which reflects the NBody benchmark which is the result of CodeRefractor
Windows 7 64bit
GCC is 4.7.2 under MinGW (part of DevC++ distribution)
VS 2010
.Net 4.5
For Linux: Mint 15 (with updates)/GCC 4.7.3
Best running GCC arguments:  -std=c++11 -ldl -O3 -flto -mtune=native -march=native -fprofile-use 
where:
-O3 = level 3 of optimizations,
-flto = global optimizer
-mtune=native = optimize for my machine (I think it may not matter)
-march=native = use instructions of my machine (I think it may not matter)
-fprofile-use (using PGO running profiles)
In fact without mtune and march (on 64 bit) the performance is basically the same, but I put these parameters to make sure that users who read this blog and try to reproduce to get similar level of performance.

No comments:

Post a Comment