Code Refractor - Virtual Machines/Compiler performance musings: July 2014

Saturday, July 26, 2014

Why you cannot beat the compiler...

In the last weeks there was a big rewrite inside CodeRefractor. The main change is basically that we do in a multiple pass algorithm the building of code tree that can be built from the Main method. The changes we propagated from the frontend to the deep of the virtual methods backend (where the code is generated) and simplified.

using System;
class A
{
    public void F() { Console.WriteLine("A.F"); }
    public virtual void G() { Console.WriteLine("A.G"); }
}
class B : A
{
    public void F() { Console.WriteLine("B.F"); }
    public override void G() { Console.WriteLine("B.G"); }
}
class Test
{
    static void Main()
    {
        B b = new B();
        A a = b;
        a.F();
        a.G();
        b.G();
    }
}

Let's start with this code. Which is CR's output for Main method (C++ code)?

System_Void _Test_Main()
{

_B_F();
_B_G();
_B_G();
return;
}

CR doesn't have full inlining (yet!) but it has analysis steps that check if variables are used over the method boundaries and the most important part: it does it safely!

Many people may say that this code is contrived and unrealistic and I have to agree, also it has to be a "lucky case" that many things to happen:
- methods to devirtualize correctly
- the members of the method calls are not used
- the instance of B has no members that are used, so the compiler can remove it safely

But on the other hand, this is also the strength of the compilers today: one month ago CR would not be able to write this code, but today it cans, so all you have to do to get a bit better performance is just update the compiler (or wait for the next release) and voila, you're good to go. Even more, every time when in future the compiler is able to devirtualize, it will devirtualize, over all program, and the most important bit: with no work from your side!

So, every time when you write code that targets a compiler (any compiler, not only CR), if the compiler project is improving, all you have to do is to wait, or to report bugs if the compiler doesn't compile your code, or update minimally your code to avoid the compiler issues and the code will improve itself.

Friday, July 18, 2014

Some String support

A simple program as this one:

static void Main()
{
        var s1 = "Hello";
        var s2 = " world";
        var s3 = s1 + s2;
        Console.Write(s3);
}

would not work and even seems to be a simple program, in fact is a bit harder because CR didn't have a nice way to map types and to implement a mix of the CR's internal string and the real string, so there was the need to duplicate the entire code of string and make it work.

Right now it works as the following:

[ExtensionsImplementation(typeof (string))]
    public static class StringImpl
    {
        [MapMethod]
        public static string Substring(string _this, int startIndex)
        {(...)}
        [MapMethod(IsStatic = true)]
        public static string Concat(string s1, string s2)
        {(...)}
        [MapMethod]
        public static char[] ToCharArray(CrString _this)
        {(...)}
}
By adding implementation of mapped CrString for the System.String type makes possible to make implementations natural and as the library will grow (please spend very little time to support a simplified version of any class you think you can implement in .Net) many programs which were missing just the part of displaying some combined strings, right now is all possible.

Thursday, July 10, 2014

Improving debugging experience by improving internal structures

CodeRefractor has some clean advantages over other static compilers that target LLVM by targetting C++ but also some disadvantages.

One great advantage is that it is written in .Net and it bootstraps everything needed for output, and the output is somewhat human readable result. This also allows simple optimizations to be done as CR rewrites everything into an intermediate representation (IR) and this representation is simplified with many high level optimization steps.

So this instruction was in the past setup as a list of pairs of values: instruction kind and the instruction data and many casts were made based on this instruction. This is good design, but right now the code is represented as a specific instruction class. So if in the past for storing a label, you would have something like: pair of (Label, 23) where 23 was the label's name, right now is a class Label where it stores: JumpTo property which is set to value 23.

Similarly, more changes are to debug methods, in the past there was a similar pair of the method kind and a MethodInterpreter, and sadly the MethodInterpreter type will store based on the kind of method various fields which some of them were redundant, some of them unused, and changes of this kind. As the code right now is split, the experience in debugging CR is better also.

At last, but not at least, even is not yet fully implemented, it is important to say that the types right now are resolved using a construction named ClosureEntities, and there are steps (similar with optimization steps) that they find the unused types. This also will likely improve in future the definition of types and make possible to define easier the runtime methods. This new algorithm has one problem as for now: it doesn't solve the abstract/virtual methods and the fixing of them will be necessary in future.

Tuesday, July 1, 2014

Let's get Visual

About a month ago, Ciprian asked for developers to join CR, and I thought since I love the .Net ecosystem, I should be able to help a little here and there.After I looked at the code, having never really understood compilers or transpilers in this case, I needed an easier way to visualize what was going on behind the hood, so I could get up to speed with CR.

I then decided to mock up a "Visual Compiler" to both test CR and to get a simple call graph.
The Visual Compiler (VCR), once setup enables seamless development and testing of CR, alot of improvements can be made, such as full debugging support or clicking errors to take you to the line of code, but for now its a pretty useful tool. With this I have been able to catch up quite a bit and add/fix a few runtime features such as virtual methods and interfaces to CR.

VCR has the following features;
- opening files
- enabling / disabling specific optimizations
- running the C# / CPP code and viewing output
- and testing the output of C# vs that of CPP.
- multiple compiler support (although there is no gui to set that, yet)

With the knowledge gained, I am hoping to complete a Pet Project (C# Editor, Compiler and Interpreter for iOS), that has been in the Attic for about 2 years now.

I am currently writing a few more tests and looking into delegates to have CR almost feature complete with the C# language.

Ofcourse even with this tool, I needed alot of guidance, which Ciprian has been kind enough to provide. I am pretty optimistic of where this project is heading :)

Oops, Almost forgot, this is what VCR looks like;

VCR Running on Windows Server 2012 R2

My next blog post will be on the architecture of CR itself, stay tuned.

Enjoy CR,
Ronald Adonyo.

Code Refractor - Virtual Machines/Compiler performance musings