I have installed Windows 10 IoT Core on a Raspberry Pi 2 and created a simple C# application for it. Now I am curious about the difference in performance between Windows 10 IoT Core and Raspbian. To test that, I will run the same simple C# code on both operating systems.

Code

I will do a simple calculation in a loop and run this code in multiple threads. The number of threads is 4, because the Raspberry Pi 2 has 4 cores. This will load the CPU up to 100%.

I know that I am using different CLRs and different compilers, but I am doing this just for fun :)

Because you cannot run a C# application on Windows 10 IoT Core in console mode, I will use slightly different code for each OS.

Raspbian

Mono JIT compiler version 4.0.1

using System;
using System.Diagnostics;
using System.Threading.Tasks;

public class Program
{
    private static int iterations;
    private static void Main(string[] args)
    {
        iterations = 100000000;
        var cpu = Environment.ProcessorCount;
        Console.WriteLine("Iterations: " + iterations);
        Console.WriteLine("Threads: " + cpu);
        Profile(cpu);
    }

    private static void Profile(int threads)
    {
        // Warm up JIT
        DoStuf();
        Iterate();

        var watch = new Stopwatch();
        Task[] tasks = new Task[threads];

        for (int i = 0; i < threads; i++)
        {
            tasks[i] = new Task(Iterate);
        }

        // clean up
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        watch.Start();

        for (int i = 0; i < threads; i++)
        {
            tasks[i].Start();
        }

        Task.WaitAll(tasks);

        watch.Stop();
        Console.WriteLine("Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
    }

    private static void Iterate()
    {
        for (int i = 0; i < iterations; i++)
        {
            var c = DoStuf();
            a = c/2*a;
        }
    }

    public static int a;

    private static int DoStuf()
    {
        var y = 2 + a;
        return y*a/9;
    }
}

Compile:

mcs /debug- /optimize+ /platform:arm bench.cs
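
To run the resulting binary on Raspbian (assuming the default output name bench.exe, which mcs produces for bench.cs):

mono bench.exe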

Windows 10

Microsoft .NET 4.5

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;

public sealed partial class MainPage : Page
{
    public MainPage()
    {
        this.InitializeComponent();
    }

    private static int iterations = 100000000;

    private void ButtonBase_OnClick(object sender, RoutedEventArgs e)
    {
        ResultLabel.Text = "Calculation:";
        var cpu = Environment.ProcessorCount;
        ResultLabel.Text += "\nIterations: " + iterations;
        ResultLabel.Text += "\nThreads: " + cpu;
        ResultLabel.Text += "\n" + Profile(cpu);
    }

    private static string Profile(int threads)
    {
        // warm up 
        DoStuf();
        Iterate();

        var watch = new Stopwatch();
        Task[] tasks = new Task[threads];

        for (int i = 0; i < threads; i++)
        {
            tasks[i] = new Task(Iterate);
        }

        // clean up
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        watch.Start();

        for (int i = 0; i < threads; i++)
        {
            tasks[i].Start();
        }

        Task.WaitAll(tasks);

        watch.Stop();
        return $"Time Elapsed {watch.Elapsed.TotalMilliseconds} ms";
    }

    private static void Iterate()
    {
        for (int i = 0; i < iterations; i++)
        {
            var c = DoStuf();
            a = c/2*a;
        }
    }

    public static int a;

    private static int DoStuf()
    {
        var y = 2 + a;
        return y*a/9;
    }
}

and UI:

<Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    <StackPanel Orientation="Vertical">
        <Button x:Name="StartButton" Content="Start" FontSize="100" Click="ButtonBase_OnClick"/>
        <TextBlock x:Name="ResultLabel"
                FontSize="150"/>
    </StackPanel>
</Grid>

I have used Visual Studio 2015 to compile and deploy this app in Release mode.

As you can see, I have used a static variable in the calculations (the DoStuf and Iterate methods). This prevents the compiler from optimizing the work away.
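
For comparison, a purely local variant like the sketch below (my illustrative example, not part of the benchmark) could in principle be treated as dead code by an optimizing JIT, because nothing outside the method observes the result:

private static void IterateLocalOnly()
{
    // Local-only variant: 'a' never escapes the method and is never read
    // after the loop, so a sufficiently aggressive JIT may eliminate the
    // whole loop. Writing to the static field in the real benchmark makes
    // every iteration observable and keeps the work in place.
    int a = 1;
    for (int i = 0; i < iterations; i++)
    {
        var y = 2 + a;
        var c = y * a / 9;
        a = c / 2 * a;
    }
}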

Results

I have executed each program 10 times and then calculated the average execution time. I have also tested the console application on my desktop (Intel Core i5-4590). I was surprised (less is better)...

(Chart: average execution time in milliseconds on Raspbian/Mono, Windows 10 IoT Core, and the desktop i5-4590; less is better.)

As you can see, the Mono runtime on Raspbian is 5 times slower than Microsoft .NET on Windows 10 IoT Core. Mono is 5 times slower! Why? Have I made an incorrect benchmark? Or is it just a question of compiler optimization?

I have checked the IL code. The Iterate and DoStuf methods compile to the same IL on both platforms:

  .method private hidebysig static void  Iterate() cil managed
  {
    // Code size       43 (0x2b)
    .maxstack  2
    .locals init (int32 V_0,
             int32 V_1)
    IL_0000:  ldc.i4.0
    IL_0001:  stloc.0
    IL_0002:  br         IL_001f

    IL_0007:  call       int32 CpuBenchmark.Program::DoStuf()
    IL_000c:  stloc.1
    IL_000d:  ldloc.1
    IL_000e:  ldc.i4.2
    IL_000f:  div
    IL_0010:  ldsfld     int32 CpuBenchmark.Program::a
    IL_0015:  mul
    IL_0016:  stsfld     int32 CpuBenchmark.Program::a
    IL_001b:  ldloc.0
    IL_001c:  ldc.i4.1
    IL_001d:  add
    IL_001e:  stloc.0
    IL_001f:  ldloc.0
    IL_0020:  ldsfld     int32 CpuBenchmark.Program::iterations
    IL_0025:  blt        IL_0007

    IL_002a:  ret
  } // end of method Program::Iterate

  .method private hidebysig static int32 
          DoStuf() cil managed
  {
    // Code size       19 (0x13)
    .maxstack  2
    .locals init (int32 V_0)
    IL_0000:  ldc.i4.2
    IL_0001:  ldsfld     int32 CpuBenchmark.Program::a
    IL_0006:  add
    IL_0007:  stloc.0
    IL_0008:  ldloc.0
    IL_0009:  ldsfld     int32 CpuBenchmark.Program::a
    IL_000e:  mul
    IL_000f:  ldc.i4.s   9
    IL_0011:  div
    IL_0012:  ret
  } // end of method Program::DoStuf
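
If you want to reproduce these listings, the standard disassemblers will do it (a quick sketch, assuming the console assembly is named bench.exe; the exact output formatting may differ slightly between tools):

ildasm /text bench.exe
monodis bench.exe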

The calling code also looks similar in both cases. I have also tried similar code with a single thread; the results were the same.
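
The single-threaded variant was essentially just the timed loop without tasks, along the lines of this sketch (my reconstruction, not the exact code):

private static void ProfileSingleThread()
{
    // Warm up the JIT before timing.
    DoStuf();
    Iterate();

    var watch = Stopwatch.StartNew();
    Iterate();   // one pass of the loop on a single thread
    watch.Stop();
    Console.WriteLine("Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}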

Now I will wait for CoreCLR on Linux and run this benchmark again. I hope we will see better results.