Sunday, June 16, 2013

CR: Status updates - bimonthly updates and a roadmap

CodeRefractor is an enablement technology which allows developers to use in some cases C# (or any .Net languages) applications using C++ runtime (and no other .Net or Mono dependencies). This blog entry is to show a Roadmap and future plans. Also, for people that are interested in it's progress, I will try to give a bimonthly update.

As a mini legal note, I want to say that this Roadmap is subject to change and the info presented is directed to show parts that I found them interesting.

So what was done in the last 10 days?
- the optimizations setup is program wide: it is put in a part of logic that will handle the command line parameters. In the past, the coder had to set at every function compilation step which optimizations were wanted. So in future, the command line will set all optimizations (based on optimization level)
- the methods do not reside in class header anymore: this makes possible to have a saner looking code, as classes will store only their state. There is a clearer name-mangling scheme.
- types (System.Type, and System.Assembly) are mapped to classes that will be in future serializable: this is critical for future plan to to not compile everything at every compiler instantiation
- added CodeRefactor.OpenRuntime assembly: this part is critical to define runtime functions in C#.
This sample class maps System.Math class logic. Not all methods are mapped, but is a clear way to write a code that doesn't imply that users have to write and tweak a C++ library (as it was done before).

    [MapType(typeof(Math))]
    public class CrMath
    {
        [CppMethodBody(Header="math.h", Code="return sin(d);")]
        public static double Sin(double d)
        {
            return 0;
        }

        [CppMethodBody(Header = "math.h", Code = "return cos(d);")]
        public static double Cos(double d)
        {
            return 0;
        }
        [CppMethodBody(Header = "math.h", Code = "return sqrt(d);")]
        public static double Sqrt(double d)
        {
            return 0;
        }
    }
 
This generator code is cleaned a lot. The mapping of types made things a bit more complex, but in future I expect that the code will be more simplified.
    [MapType(typeof (Console))]
    public class CrConsole
    {
        [CppMethodBody(Header = "stdio.h", Code = "printf(\"%lf\\n\", value);")]
        public static void WriteLine(double value)
        {
        }
        [CppMethodBody(Header = "stdio.h", Code = "printf(\"%d\\n\", value);")]
        public static void WriteLine(int value)
        {
        }
        [CilMethod]
        public static void WriteLine(float value)
        {
            WriteLine((double)value);
        }
    }
The other improvement is that some methods can be mapped as C# methods. So the case of: Console.WriteLine(float value) is mapped in fact to CrConsole.WriteLine(float value); which in turn calls CrConsole.WriteLine(double float); which itself is a C++ method. If the code calls both methods - WriteLine (double) and WriteLine(float) - the code is linked once

Roadmap
So the code as it is, it looks to support some .Net programs (1.1 ones), so what seems the next part that is important to be done?
- make possible to save/restore a compilation result. The plan at least is that if you compile a huge project (even right now it doesn't work, as the CIL support is not so extensive as a part of CodeRefractor side) is to reuse the assemblies are already compiled. This saves time and make easier to debug cases. As CR represents the data in an intermediary format, this is somewhat easier to be debugged and tracked
make PInvoke to work: PInvoke is the next logical step of making "real programs" to work. The next item happen often with PInvoke so this will be also a high priority
- make Unsafe to work: this item makes possible to do under-the-cover operations, like Array.Copy to work with an optimized way. Also, it goes really often with PInvoke as many times PInvoke uses pointers to various structures
- make ldtoken instruction to work: CIL support is not complete, and one instruction that happen fairly often is: ldtoken. This instruction is executed to initialize arrays as raw data. This requires marshalling support, unsafe support (previous item) and better type representation as part of CR (this part that was started and partially done in the previous days)

What is not yet planned (where you can help):
- make a sane command line parsing: right now the input assembly, output .exe are hardcoded. The same is about optimizations, output folder, picking the compiler (is hardcoded to GCC that has to be inside the folder: C:\Oss\DevCpp).
- generics: they are very useful (and they they are going to be implemented, in an undefined time in future), but Generics are almost useless without a generic collections library. And this will take time.
- Linux/OS X/support: it should work with Linux (in fact the changes are so minimal, that I ask even myself when I will add support for it), but MonoDevelop/XamarinStudio does load the solution but when a crash happens, the debugger did not work so gracefully. So setting all parts take a lot of time of configuring and tracking down issues. If you have time and you want to try it, I am ready to support you, but this task should work without any advanced C# experience
- String and Steam support: there is a minimal support for strings, and PInvoke support will simplify and make possible some string handling, but String is an extensive class, used in very many places
- Reflection: this is like Generics, it'll certainly be done (once), but the System.Type reflection requires many systems to be in place, like a Type's table, and many runtime functions to make it happen
- an installer: this depends on Command line parsing (at least), but is fairly straight forward to be done
- many more...

Friday, June 7, 2013

Introduction to Code Refractor CIL to Native AOT

Welcome to CodeRefractor, an experimental project which compiles MSIL/CIL opcodes to C++ and after this it automatically generates a native (binary) version of the code.

The idea of using this convertor is to make possible that the MSIL code to run on machines that do not have Mono or .Net codes and is using as much as possible C++ paradigms.

Some questions I think that may occur for an interested reader and their answers

Is CodeRefractor a code-translator?
By a code-translator meaning that it translates the operation opcodes to C++ directly. The exact answer is no, it is a full compiler which uses an intermediary representation.

In a very brief steps the code, CodeRefractor does the following:
- reads an .Net assembly using reflection
- starts from the entry point (Main method) and starts reading CIL bytecodes/operations
- these operations are rewritten in an intermediary representation
- the intermediary representation is (optionally) optimized which simplify simple expressions and redundancies
- the intermediary representation is written into C++
- a C++ compiler is invoked and is linked with a static library that have some simple operations made into it and generates the executable form

This native executable should behave as much as possible as the original .Net application.

Most similar projects that do similar design are mostly in Java world:
- Excelsior JET
- GCJ
- RoboVM
- Mono --full-aot mode

This project will not be able to generate a C++ version of any C# (or Boo, F#, etc.) language and some specifics will make it to behave different (like many times slower), but a program written with the consideration of the design of the CodeRefractor should run similarly with a C++ code (even is written in C#).

How will CR handle licenses to not break them?
CR will not scan any GAC assemblies (and system ones too) for instructions, but it will let the user to add their own replacement implementations. We recommend for users to not read code that the license does not allow them to scan it. We plan for this project to use as a "backup" the Mono's BCL implementation (the CIL bytecodes) when something is missing.

C++ has no GC. so how memory management is done?
The code is using smart pointers and it requires a C++ 11 compiler (to make sure that the compiler supports smartpointers). This also means that users have to care about memory cycles. So, as an easy way to break cycles for now is to set pointers to null. In future plans, weak pointers will be supported.

Does it support Generics, Linq, Reflection, method Math.Cos,  ... (put your feature here)?
No, it doesn't and some of the limitations will be fixed as the software evolves. We hope that feedback and help from community will address as many of these features as early as possible.

What does it work?
Hello world kind of samples, the most complex version is NBody benchmark. This means in short: classes, static and not static (non-virtual) methods, many primitive types operations, while/for loops, array types, fields (but not properties yet), constructors

Where is the project?
 You can study, read, contribute to it here. The license file is not yet written, but it is: GPL2+ for compiler and MIT/X11 for libraries. Or rephrased (for companies) anyone which want to use it is: you can use it commercially, but if you make changes in the compiler, you have to contribute them back. If you let the compiler as-is, you can make your changes in the libraries for all your needs. We are using basically the same license as Mono project.

Legal note: we are not endorsed nor we have any relation with Mono project or Microsoft.