Sunday, July 19, 2015

Using .Net for Developing Games, a 2015 review

Before talking about game development, a disclaimer: I'm not a game developer, even though I do have some (working) experience with older versions of OpenGL and DirectX and hands-on experience with C++ and C#. Also, I have kept track of current technologies (as much as time allows).

First of all, let's clarify the terms: there are obviously games which can be written and run in C#; I'm thinking here of most board games like Chess or Go, even strategy games, or similar. You can do more than these games too, and the best game written in C# that I know of is Magicka, but again people will sneeze and say: but this game doesn't use Havok (the physics engine), or if a C# game did use it, people will say: but Havok is not written in C#, it is written in C++.

Given this, I want to make as fair as possible a review of the .Net platform as a game development tool.

Here are some really great pluses:
+ C#'s peak performance (after the application starts up), especially if you avoid working with strings like the plague and mostly use arrays and integer/double types, makes your code run adequately (typically around 70-90% of the speed of C++ code, an even better match-up if you use 64-bit .Net)
+ C# allows the hottest parts of the code to be written in C++, and also lets you drop bounds checking using "unsafe" code. This means that if you need a specific piece of code to be autovectorized, and you notice that the C++ compiler does it but the C# one does not (and you don't want to use Mono.SIMD code to write your own matrix multiply), that piece can still be very highly optimized
+ the call speed of PInvoke is adequate, as .Net "natively" maps COM calls and C calls, meaning that if you use either DirectX or OpenGL, you are covered
+ complex game logic can be written more easily in C# than in C++, especially as some C++ game engines use Lua as a scripting backend. Writing it in C# instead should sometimes give speedups
+ you can use struct types to reduce how often garbage collection happens (see the sketch right after this list)
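
To illustrate the last point, here is a minimal sketch (the Particle types and the count are made up for the example): an array of structs is a single heap allocation, while an array of class instances allocates every element separately, giving the GC that much more to track.

    using System;

    struct ParticleStruct   // value type: stored inline inside the array
    {
        public float X, Y, Life;
    }

    class ParticleClass     // reference type: every instance is a separate heap object
    {
        public float X, Y, Life;
    }

    static class StructDemo
    {
        static void Main()
        {
            // 1 allocation: one contiguous block holding 100000 structs.
            var structs = new ParticleStruct[100000];

            // 100001 allocations: the array itself plus one object per
            // particle, each carrying an object header the GC must scan.
            var classes = new ParticleClass[100000];
            for (var i = 0; i < classes.Length; i++)
                classes[i] = new ParticleClass();
            Console.WriteLine(structs.Length + " " + classes.Length);
        }
    }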

Here are the really bad minuses:
- coding recklessly will create a lot of garbage in memory, putting pressure on the GC. A collection can sometimes take seconds (for huge heaps, like multi-GB heaps), which is unacceptable even in a board game
- allocation by default is on the heap, meaning that if you create a List<T>, it will in fact always create 2 objects on the heap: the List<T> itself, and the internal array which stores the actual data. This is really bad because when you add items to a List<T>, the internal array is "resized", which in the .Net (or Mono or CodeRefractor) implementations means that a new array is allocated, so a lot more GC pressure happens. In C++, by default, objects are allocated on the stack with no hidden costs. If you use std::vector<T>, the internal array is on the heap, but the vector itself is on the stack.
- Linq can create a lot of objects without you noticing, especially when you use ".ToArray()" or ".ToList()", or in a statement that wants to return a pair of values.
This code:
var playerAndLifes = players.Select(player => new Tuple<Player, int>(player, player.Life)).ToArray();
It looks really innocent, but in fact "Tuple" is a class, so it is allocated on the heap, and ToArray will also resize its backing array in powers of two up to the length of your "players" collection. So for 1300 players there will be around 8 reallocations, for 2600 players 9 reallocations, and so on.
For the previous code, make a struct STuple in your codebase and use it (a minimal sketch follows below). Also, if you know the size of players, do not forget to read the "Improve performance for your selects in Linq" article below.
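Such an STuple could look like this (the name and layout are only a sketch, not a standard type):

    public struct STuple<T1, T2>
    {
        public T1 Item1;
        public T2 Item2;

        public STuple(T1 item1, T2 item2)
        {
            Item1 = item1;
            Item2 = item2;
        }
    }

    // Being a struct, it is stored inline in the resulting array:
    // var playerAndLifes = players
    //     .Select(player => new STuple<Player, int>(player, player.Life))
    //     .ToArray();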
- objects in .Net are huge, so if you keep a single byte or integer index (even if it has its own more complex associated logic), consider using struct or enum types. The reason objects in .Net are huge is that they carry much more information in the object header, including a type id and some space to be able to lock on them. If you have a class which stores 1 integer, it takes 12 bytes on 32-bit .Net, but 24 bytes on a 64-bit machine. So for every single allocation of an object, you waste an extra 8 or 20 bytes. In C++, if you don't use virtual calls, the overhead of object internals is zero, though it can be bigger if the memory allocator is not efficient. For classes with virtual methods, the overhead is typically the size of a pointer (4 bytes on 32-bit machines and 8 bytes on 64-bit machines).
- texts are UTF16, which is very often a good thing, but not when you want high(er) performance: if you write them to disk, they occupy 2 times more space. Even worse, they increase memory usage and, again, create pressure on the GC. Try to work with UTF8-encoded strings internally and do interning (meaning merging identical strings all over your application), so at least when a GC happens it has less work to do (see the intern-pool sketch after this list)
- even if it is not necessarily an issue of .Net in itself: an easy way to support Save/Load inside games is to use a serializer that stores or restores your entities on disk. The default "DataContract" or even BinaryFormatter serializers are slow. Use protobuf-net (Protocol Buffers), as it is a very easy-to-use library for this part and it can run many times faster (see the save/load sketch after this list). Similarly, try not to use any xml/json or the like for levels where many entities of any kind are expected
- the JIT (Just-In-Time) compiler sometimes makes things ugly! The JIT time is typically very small, but it happens every time a new method in the code is hit. If you have big methods and/or bigger logic, you may see "frame-skips", especially under the "tyranny" of 16.6 ms per frame. Making methods small and removing duplicate code means that when the player gets a new item or sees a new enemy with new game logic (which .Net has to analyze on first use), the JIT pause is shorter. But the even better way is simply to NGen your application.
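
On the interning point, a minimal intern-pool sketch (the StringPool name is made up; the BCL also offers string.Intern, but its pool lives for the whole process, while a pool like this can be dropped per level):

    using System.Collections.Generic;

    // Merges identical strings so the heap holds one copy of each,
    // leaving the GC fewer objects to trace.
    public sealed class StringPool
    {
        private readonly Dictionary<string, string> _pool =
            new Dictionary<string, string>();

        public string Intern(string value)
        {
            string existing;
            if (_pool.TryGetValue(value, out existing))
                return existing;
            _pool[value] = value;
            return value;
        }
    }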
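
And on the serializer point, a minimal protobuf-net save/load sketch (the PlayerState type and file name are made up for the example):

    using System.IO;
    using ProtoBuf;

    [ProtoContract]
    public class PlayerState
    {
        [ProtoMember(1)] public string Name;
        [ProtoMember(2)] public int Life;
    }

    public static class SaveGame
    {
        public static void Save(PlayerState state)
        {
            using (var file = File.Create("save.bin"))
                Serializer.Serialize(file, state);
        }

        public static PlayerState Load()
        {
            using (var file = File.OpenRead("save.bin"))
                return Serializer.Deserialize<PlayerState>(file);
        }
    }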

What is weird to me is that the biggest factor in responsive games is not the compiler's code generation (which .Net has had right since 2009 I would say, with .Net 3.5 SP1), but the hidden overhead(s) of the GC. You can get screwed many times, and the ugly part of the GC is that you don't know when it will hit you; even worse, you may not know which code creates classes (like System.Tuple or Linq's ToArray/ToList).

To wrap up, it looks to me that the GC is the biggest factor in users seeing freezes, and as .Net has improved its generated code (with initiatives like RyuJIT or CoreCLR), what remains is mostly to work with structs and to use an efficient serializer. Things can very often be improved by other means, typically by forcing a full GC at steps where the user is already waiting (see the sketch below). After a game loads a full level into memory, a developer can force a full GC; after a round is finished and "Victory" is written on screen, another full GC can be forced. This style of coding is fine, but of course, if the game was expected to have a full round end in 10 minutes but it finishes in 40 minutes, and the user hits, let's say, a 3-second full GC in the middle of minute 35, this will ruin the experience.
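
A minimal sketch of that idea (the level-loading call is hypothetical; only the GC calls are BCL):

    using System;

    public static class GcHelper
    {
        // Call this at a point where the user is already waiting
        // (after a level load, on a "Victory" screen, etc.).
        public static void ForceFullCollection()
        {
            GC.Collect();                   // full, blocking collection
            GC.WaitForPendingFinalizers();  // let pending finalizers run
            GC.Collect();                   // reclaim objects the finalizers freed
        }
    }

    // e.g. in the (hypothetical) loading code:
    // LoadLevel("level3.dat");
    // GcHelper.ForceFullCollection();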

Monday, July 6, 2015

Resharper 9 - a blast


Disclaimer: I've received an open source license of Resharper from JetBrains for the second time. Thank you, JetBrains!

I've sometimes been fairly critical of R# (Resharper), as it is somewhat inaccessible for some users, while at the same time I've been using it. But I want to say why code analysis in general, and coding in particular, is so much better today with a Resharper-like tool.

So first of all, I want to make some criticism of Resharper, and especially of R# 9 as I received it:
- I had an outdated R# 8 (it expired somewhere around October), and upgrading to 9.0 (which happened to be out of date as well, because I hadn't used R# for some time) made R# report a lot of errors in code that were not there. Clearing the caches fixed all the known errors I had, but it was really strange (Google pointed me directly to the right place)
- Resharper doesn't enable Solution-Wide Analysis by default. Maybe that is desirable for low-end machines or for very big projects, but as it is, at least for medium projects, the analysis is a boon. I am sure that for big solutions (I'm thinking here of programs like SharpDevelop or bigger) Resharper may be slow to update the analysis (which in itself is a fair point), but missing by default the information that R# provides (like compilation errors you may have) I found to be a big miss

Ok, so small bugs and not-so-great defaults. But in the context of the CodeRefractor project it was a great feature set, because it made big rewrites possible, and right now the project is undergoing its third rewrite. Every rewrite was justifiable for various reasons:
- the first and (for me) very important one was that the internal representation was shaped very close to SSA form (or at least to the LinearIL from the Mono project). A subsequent, almost full rewrite made the project use an index of these instructions, so optimizations not only do their job well, they do it fast
- the second rewrite allowed a much more refined way to find all methods (like virtual methods), so many more programs run now (try it, it will do wonders)
- the third rewrite (which is currently in progress) whose details I will not write about now

Features I found to work great:
- creating a property is automatic and fast, with good defaults:
myValue.Width = 30;
//R# will suggest creating Width as an automatic property of int type
- creating an automatic empty class takes constraints into account:
BaseClass a = new MyNotDefinedClass();
//R# will suggest creating MyNotDefinedClass derived from BaseClass and will also stub out any required members
- the Solution-Wide Analysis, which tracks whether your code compiles. This feature is so awesome because you can combine it with two other features: "Code Cleanup" (which, for example, removes a lot of redundancies and nicely reformats the whole code base) and "Find Code Issues".
- an R# 9.0 feature: code completion filters with various criteria (like "properties only" or "extension methods only").
- flagging unused parameters, plus the refactoring to remove them globally, is a really huge saver of developer time

So in short, I have to say that if you are starting with Resharper from scratch, or you want to use C# productively, I warmly recommend it to you. Also, don't forget, as the first thing after you open your solution, to enable Solution-Wide Analysis (you have a "gray circle" at the bottom-right: double-click on it and click "OK" in the dialog that appears).

Also, please note that I tried to be as unbiased as I can, so I didn't mention things that I'm sure are invaluable for other projects, like the MVC3 or Xaml features (CR's usage of Xaml is very limited); here is only what I used (and enjoyed!), but other features may be closer to your heart.

Improve performance for your selects in Linq

A thing I learned inside CodeRefractor is how loops work inside .Net. One thing I learned fairly quickly is that the fastest loop is by far the one over arrays. This is documented by Microsoft as well.

In short, especially using .Net on 64 bit, you will see high-performance code over arrays, so I strongly recommend that if you have data you read out of often (for example when using Linq), you use the ToArray() function.

So let's say you need the ids out of your "tradeData" variable.
The code may look like this:
return tradeData.Select(it => it.Id).ToArray();
What's wrong with this code? Let's say the "tradeData" variable can have 1,000,000 items, and tradeData itself can be an array or a List<T>. When you profile, you can see that the iteration takes little time, but you will see something like 16-18 allocations inside ToArray(), the reason being that ToArray itself keeps an internal array which is resized multiple times.


So it should be possible to write a "SelectToArray" method that will have much lower overhead:
    using System;
    using System.Collections.Generic;

    public static class UtilsLinq
    {
        public static TResult[] SelectToArray<TValue, TResult>(this IList<TValue> items, Func<TValue, TResult> func)
        {
            var count = items.Count;
            var result = new TResult[count];
            for (var i = 0; i < result.Length; i++)
            {
                result[i] = func(items[i]);
            }
            return result;
        }
    }

As T[] implements IList<T>, this code works for both arrays and List<T>. It runs as fast as possible and there are no hidden allocations.

And your code becomes:
return tradeData.SelectToArray(it => it.Id);

A strong recommendation for fast(er) code: when you use Select or SelectToArray, NEVER allocate "class" objects inside it, only struct objects. If you want to keep a result with multiple data fields, create "struct" types which encapsulate them, for example:
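
A sketch of what that could look like (the TradeView name and the Price field are made up for the example):

    // A struct result: the only heap allocation is the resulting array.
    public struct TradeView
    {
        public int Id;
        public double Price;
    }

    // var views = tradeData.SelectToArray(
    //     it => new TradeView { Id = it.Id, Price = it.Price });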

How fast is it? It is fairly fast.

For this code:
    var sz = 10000000;
    var randData = new int[sz];
    var random = new Random();
    for (var i = 0; i < sz; i++)
    {
        randData[i] = random.Next(1, 10);
    }
    var sw = Stopwatch.StartNew();
    for (int t = 0; t < 5; t++)
    {
        var arr = randData.SelectToArray(i => (double)i);
    }
    var time1 = sw.ElapsedMilliseconds;
    sw.Restart();
    for (int t = 0; t < 5; t++)
    {
        var arr = randData.Select(i => (double)i).ToArray();
    }
    var time2 = sw.ElapsedMilliseconds;
You have:
time1 = 798 ms vs time2 = 1357 ms (Debug configuration)
time1 = 574 ms vs time2 = 1003 ms (Release configuration)

Not sure about you, but to me this is significant, and it is also crucial if you have multiple Linq/Select statements and you want the resulting items to be fast to iterate. Similarly, you will get a bigger speedup if you don't do the cast to double, but I wanted to show more realistic code where the Linq does something light (as typically happens when there is an indexer involved, or a field access).

NB. This test is artificial; use these results at your own risk.
Later, I found there is a method, Array.ConvertAll, which has very similar internals to this extension method (the limitation is that it doesn't work with non-array implementations, but if this is not a big inconvenience for you, it is better to use the BCL classes).

    public static TResult[] SelectToArray<TValue, TResult>(this TValue[] items, Func<TValue, TResult> func)
    {
        return Array.ConvertAll(items, it => func(it));
    }

With the method changed to this, it is even a bit faster, because the iteration of the items variable is a bit faster this time.