Code Refractor - Virtual Machines/Compiler performance musings: December 2013

Tuesday, December 24, 2013

Target 0.0.2: Status 4

The previous post was that are many lurking errors in optimizations. I'm happy to say that these errors were fixed (there are some optimization failures sometimes, but the "canonical" NBody is working again with the tip revision in GitHub).

The optimizations are also profiled, so it should compile in 10-15% faster than previously (the CR compiled part, but most of time you will spend inside GCC compilation).

CodeRefractor.Compiler.exe is renamed to cr.exe so it is easier to use it from command line.

As future development I will look into two areas:
- fix some known last remaining bugs (the inliner gives for some properties invalid code, also it doesn't optimize out the empty methods )
- I would love to extend some parts of backing generated code: it would be great to give a "resolver" to help the linker with custom code.
The idea is that delegating implementing methods can be done by some solver module:
- when a method is found by compiler, before reflecting it, will ask the solver module if it has a specific implementation that can be either a C++ code or a CIL method that CR will reflect against
- if it is a C++ code, the code will be inserted as-is as being a body of code.
- if is CIL code, it will be scanned and it will be used later.
- similarly a similar logic will work for types.

Why this bootstrapping, and who will implement it?
Initially, OpenRuntime (the default run-time) but it can be attached for all assemblies you code against. So every time when an assembly is loaded, the CR will scan for this bootstrap mapper.

Advantages? Right now the way the bootstrapping to C++ happen is fully static. This means that if you want to implement a big subsystem (let's say a replacement for System.Console or Windows.Forms) you cannot point to an existing implementation, but you have to take one source and fully annotate it with [MapType] (the CR's attribute of choice to annotate custom implementations) attribute. With this new implementation, and some short reflection code (that can be isolated, and will be a helper class given with these APIs) will theoretically be possible to use an implementation from an implementation that support the implementation without annotations.

Also, it could make possible to parametrize some C++ code generator options. Right now the code is static, so you cannot write a Max function independent on parameters (like a generics Max), and I hope with these changes the code will get better formatted and generated.

The idea is not my invention, but is based on the brief descriptions of InvokeDynamic instruction in Java world. Anyway, I did not read the specification (and/or implementation details) for the reason of having a clean room implementation.

Friday, December 20, 2013

Target 0.0.2: Status 3

As I've said previously there are some optimizations that are hard to find when they interact badly. If you face problems generating code, you should disable optimizations or try to reduce the errors and make a bug report.

Given this, I was fixed one that I could reduce the case. But at the end, I will recommend users to review get small cases as bug reports.

The MSIL and C++ code can be seen side by side and when the compile fails.

A snippet of the output code of an Prime number program:

// IL_0049: ldc.i4.1

// IL_004a: stloc.0

// IL_004b: nop

// IL_004c: ldloc.1
vreg_32 = local_1;

// IL_004d: ldc.i4.1

// IL_004e: add
vreg_34 = vreg_32+1;

// IL_004f: stloc.1
local_1 = vreg_34;
label_80:

// IL_0050: ldloc.1
vreg_35 = local_1;

// IL_0051: conv.i8
vreg_36 = (System_Int64)vreg_35;

// IL_0052: ldarg.1

// IL_0053: cgt
vreg_38 = (vreg_36 > vreg_37)?1:0;

In this way you can see if any optimization interact badly with your CIL. Also, reducing the optimizations you will see the C++ code reflecting more the CIL code, and the more optimizations are added, the CIL and C++ code do go appart.

Int64 (aka long) type is also supported right now both as instructions (as conv.i8 or ldc.i8) and as optimizations (constant folding).

Tuesday, December 17, 2013

Opinion: Talking ARM

AnandTech (a technology site) invited people to put questions regarding ARM's core A53, the 64 bit successor of A7 core (which is 32 bit and widely used in mobile phones today).

The lead designer was kind enough to answer it as of yesterday.

Which is the best part as for me?
They talked about the fact that ARM is scaled both down and up. This means that many software will need to extend to many places. I see it it will work on microservers, set-top boxes.

The high end ARM CPUs can have L3 cache (AMD 8-16 core CPUs were high end, and Intel goes with similar core counts) for designs to up-to 32 cores. It also looks that ARM pushes optimized GCC libraries and they are a contributor to it.

What means for CodeRefractor? Or for C#?
It doesn't on a short run. The good part of CR is in my view is that C++ runs optimized on most platforms correctly: from low end ARM to Asm.JS. This will work with fast everywhere.

This will mean in long term that most things that work on C#, will work on more hardware than CLR runs, or even it runs everywhere, you will have to work with many VMs. With C++ target of CodeRefractor output, you can make a simple build that will work everywhere.

Saturday, December 7, 2013

Target 0.0.2: Status 2

This is a small update (and the following ones will be also somewhat small as the close to New Year holidays will come) but important:
- after the latest changes I've noticed that some optimizations interact badly but I did not have time to debug all. So right now is enabled a smaller subset of optimizations and I would be really glad if someone has time to look into them (why they are failing). Optimizations are really important to make sure that programs perform fast, but the hardest part is to make all simplifications or analysis correctly
- after the bad piece of news, there is also a good piece of news: derived classes in which the data is stored along many classes (like you have class CoordinateXyz and a class Vector: CoordinateXyz, the class), the code will be stored and displayed in a compatible binary layout. In the past, the fields of the base class were put after the fields of the principal class. The logic of fields analysis is extracted into a class named TypeDescription

Code Refractor - Virtual Machines/Compiler performance musings