tabs ↹ over ␣ ␣ ␣ spaces

by Jiří {x2} Činčura

Where do the differences in execution speed of various method types come from (.NET)?

27 Nov 2017 .NET, .NET Core, JIT, RyuJIT

In the previous post I measured the execution speed of static and instance methods. Here I’ll dig deeper and try to find where the difference comes from. Bear in mind, I don’t have deep knowledge of processors, the JIT or assembly. I’m just thinking out loud, poking and observing.

Thinking

What’s the main difference between a static and an instance method? Yes. The this parameter. And also whether the call is direct or indirect (aka virtual). Could these two pieces be the reason for the speed difference?

Passing the this argument itself doesn’t cost much time, as one can easily show by comparing the times for Static0 and Static2 in the previous post. These times are virtually the same (the RyuJIT times on 64-bit .NET I consider an exception rather than the rule). But the this argument also needs to be checked for null. And in the case of a virtual call, the address of the method to be called needs to be looked up first.
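For reference, the kinds of methods being compared might look roughly like this in C#. This is only a sketch, not the benchmark code from the previous post: the names `Calls`, `Worker` and `Holder` are hypothetical, and `NoInlining` is an assumption made so that the JIT keeps a real call instruction to look at.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical types for illustration; not the code measured in the previous post.
public static class Calls
{
    // Static method: no 'this' argument, call target known up front.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Static(int x) => x + 1;
}

public class Worker
{
    // Direct instance call: 'this' is passed as a hidden first argument.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int Instance(int x) => x + 1;

    // Virtual call: the actual target depends on the run-time type.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public virtual int Virtual(int x) => x + 1;
}

public struct Holder
{
    // Instance method on a value type: 'this' refers to the struct itself.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int Instance(int x) => x + 1;
}
```

All four do the same trivial work, so any difference seen at the call site comes from how the call itself is made.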

What the processor is instructed to do

Static method call.

00007FF86D6304C4  call        00007FF86D6300A8

Compare this to a direct instance method call.

00007FF86D6204C4  cmp         dword ptr [rcx],ecx
00007FF86D6204C6  call        00007FF86D6200B0

Awesome. The cmp [rcx], ecx is clearly the validation of this. Looking at the generated assembly for various methods, I’m pretty sure both CLR and CoreCLR use a fastcall-like calling convention (the snippet above is from the 64-bit version). In this convention the RCX register holds the first integer argument, and for instance methods that’s where this goes. The result of the cmp is not used, but that’s fine: merely dereferencing RCX when it’s null/0 triggers an access violation (segfault), which the runtime turns into a NullReferenceException. Moving on to the virtual call.

00007FF86D6004C4  mov         r11,7FF86D500020h
00007FF86D6004CE  cmp         dword ptr [rcx],ecx
00007FF86D6004D0  call        qword ptr [r11]

Exactly as expected. The null check is still there, plus the call is now indirect: it goes to the address read from the memory location R11 points to, not to an address encoded directly in the instruction. (By the way, there’s an effort in CoreCLR to devirtualize (not only) such calls.)
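Why the target has to be looked up at run time can be seen from the C# side as well. A minimal sketch, assuming hypothetical `Shape` and `Circle` types (they are not from the measured code): the same call site ends up in different methods depending on the object it receives, so the JIT cannot hard-code a single address the way it does for a static or direct instance call.

```csharp
// Hypothetical types for illustration only.
public class Shape
{
    public virtual string Name() => "shape";
}

public class Circle : Shape
{
    public override string Name() => "circle";
}

public static class VirtualDemo
{
    // One call site, but the method that runs depends on the run-time type of 's'.
    public static string Describe(Shape s) => s.Name();
}
```

Here `VirtualDemo.Describe(new Shape())` returns "shape", while `VirtualDemo.Describe(new Circle())` returns "circle".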

Just in case you’d like to see what it looks like for structures, I looked at that as well.

00007FF86D6104C4  mov         qword ptr [rsp+30h],rcx
00007FF86D6104C9  lea         rcx,[rsp+30h]
00007FF86D6104CE  call        00007FF86D610098

Awesome again. Structures are value types and thus cannot be null, so the null check is not there.
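The null check on reference types, and its absence for structures, can also be observed from C# without reading any assembly. The following is a sketch with hypothetical names (`RefGreeter`, `ValGreeter`, `NullCheckDemo`); `NoInlining` is again an assumption to keep real calls in place. Neither method touches this, so on the class only the call-site check can fail.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical types for illustration only.
public class RefGreeter
{
    // Never uses 'this'; a failure can only come from the null check at the call site.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int Answer() => 42;
}

public struct ValGreeter
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int Answer() => 42;
}

public static class NullCheckDemo
{
    // Returns true when calling an instance method on a null reference throws.
    public static bool ClassCallThrowsOnNull()
    {
        RefGreeter g = null;
        try
        {
            g.Answer();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    // A struct local always holds a value, so there is nothing to check.
    public static int StructCallOnDefault()
    {
        ValGreeter g = default;
        return g.Answer();
    }
}
```

`ClassCallThrowsOnNull()` returns true even though `Answer` never reads a field, and `StructCallOnDefault()` simply returns 42.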

Conclusion

Putting all this together, it’s clear why the static method call is the fastest and the virtual call the slowest. There’s simply more work to be done.

As I said at the beginning, I’m not an expert in processors, the JIT or assembly. I was connecting the dots, trying out how changing this changes that, and checking whether I could pair it with other stuff I already knew. In the end this isn’t useful for day-to-day .NET programming, but, hey, the road to the knowledge was an absolute blast.