In the previous post I measured the execution speed of static and instance methods. Here I'll dig deeper and try to find where the difference comes from. Bear in mind, I don't have deep knowledge of processors, the JIT or assembly. I'm just thinking out loud, poking and observing.
What's the main difference between a static and an instance method? Yes, the `this` parameter. And also whether the call is direct or indirect (aka virtual). Could these two pieces be the reason for the speed difference?
Passing the `this` argument itself doesn't eat much time, as one can easily prove by comparing the times for `Static2` in the previous post. These times are virtually the same (I consider the RyuJIT times on 64-bit .NET to be an exception rather than the rule). But the `this` argument also needs to be checked for `null`. And in the case of a virtual call, the address of the method to be called needs to be found first.
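To make the `this` parameter concrete, here is a rough C analogy. It is my own sketch, not what the JIT produces: a static method receives only its explicit arguments, while an instance method behaves like a plain function whose hidden first parameter is the receiver. `Calculator`, `add_static` and `calculator_add` are hypothetical names made up for the illustration.

```c
#include <assert.h>

/* Hypothetical "class" with one field, standing in for a C# class instance. */
typedef struct Calculator {
    int offset;
} Calculator;

/* Static method analogue: only the explicit arguments are passed. */
int add_static(int a, int b) {
    return a + b;
}

/* Instance method analogue: the receiver ("this") is just an ordinary
   first parameter, which on x64 is the value that ends up in RCX. */
int calculator_add(Calculator *self, int b) {
    return self->offset + b;
}
```

With `Calculator c = { 2 };`, both `add_static(2, 3)` and `calculator_add(&c, 3)` return 5. The only difference at the call site is the extra pointer argument.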
What is the processor instructed to do
Static method call.
```asm
00007FF86D6304C4 call 00007FF86D6300A8
```
Compare this to a direct instance method call.

```asm
00007FF86D6204C4 cmp  dword ptr [rcx],ecx
00007FF86D6204C6 call 00007FF86D6200B0
```
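The extra `cmp` reads the memory `RCX` points to before the call happens. Here is a rough C model of that, under one simplifying assumption. C can't portably recover from the fault such a read would cause on a null pointer, so the sketch checks explicitly and sets a flag that stands in for `NullReferenceException`. The check sits at the call site, so it runs even though the hypothetical `widget_answer` never touches its receiver. All names here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class instance. */
typedef struct Widget {
    int id;
} Widget;

/* Hypothetical instance method body: it never uses its receiver. */
int widget_answer(Widget *self) {
    (void)self;
    return 42;
}

/* Set to 1 when the modelled call site rejects a null receiver; a stand-in
   for the fault the runtime turns into NullReferenceException. */
int null_reference_raised = 0;

/* Models the call site: like "cmp dword ptr [rcx],ecx", the receiver is
   validated before the call, regardless of what the callee does with it. */
int call_widget_answer(Widget *self) {
    if (self == NULL) {
        null_reference_raised = 1;
        return -1;
    }
    return widget_answer(self);
}
```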
`cmp [rcx], ecx` is clearly the validation of `this`. Looking at the generated assembly for various methods, I'm pretty sure both CLR and CoreCLR use a fastcall-like calling convention (at least in the 64-bit version shown in the snippet above). In this convention the `RCX` register holds the first integer argument, and for an instance method that's where `this` is passed. The result of the `cmp` is not used, but that's fine. Merely trying to dereference `RCX` when it's `0` will raise a segfault, which subsequently becomes a `NullReferenceException`. Moving on to the virtual call.
```asm
00007FF86D6004C4 mov  r11,7FF86D500020h
00007FF86D6004CE cmp  dword ptr [rcx],ecx
00007FF86D6004D0 call qword ptr [r11]
```
Exactly as expected. The `null` check is still there. In addition, the call is now indirect: the target address is read from the memory location `R11` points to, instead of being encoded directly in the instruction. (By the way, there's an effort in CoreCLR to devirtualize such calls, and not only these.)
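An indirect call can be sketched in C with function pointers. This is an analogy, not the CLR's actual dispatch mechanism. The sketch assumes a vtable-like table reached through the object, whereas the snippet above reads the target from a fixed cell whose address is placed in `R11`. What both have in common is that the call target is loaded from memory before the call. `Shape`, `ShapeMethods` and the area functions are hypothetical.

```c
#include <assert.h>

typedef struct Shape Shape;

/* Hypothetical per-type table of method addresses, loosely like a vtable. */
typedef struct ShapeMethods {
    int (*area)(const Shape *self);
} ShapeMethods;

struct Shape {
    const ShapeMethods *methods; /* each object points at its type's table */
    int width;
    int height;
};

int rectangle_area(const Shape *self) {
    return self->width * self->height;
}

int triangle_area(const Shape *self) {
    return self->width * self->height / 2;
}

const ShapeMethods rectangle_methods = { rectangle_area };
const ShapeMethods triangle_methods = { triangle_area };

/* Direct call: the target is fixed at compile time. */
int area_direct(const Shape *s) {
    return rectangle_area(s);
}

/* Indirect call: the target address is loaded from memory first and only
   then called, similar in spirit to "call qword ptr [r11]". */
int area_virtual(const Shape *s) {
    int (*target)(const Shape *) = s->methods->area; /* extra memory loads */
    return target(s);
}
```

For a 4×3 rectangle both `area_direct` and `area_virtual` return 12. For a 4×3 triangle, `area_virtual` picks `triangle_area` and returns 6. The extra loads before the call are the "more work" the virtual path has to do.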
In case you'd like to see what it looks like for structures, I looked at that as well.
```asm
00007FF86D6104C4 mov  qword ptr [rsp+30h],rcx
00007FF86D6104C9 lea  rcx,[rsp+30h]
00007FF86D6104CE call 00007FF86D610098
```
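Awesome again. Structures are value types and thus cannot be `null`, so the `null` check is not there. Instead, the value held in `RCX` is stored to the stack, and the address of that stack slot is passed in `RCX` as `this`.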
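Finally, the structure case as a C analogy. Again, this is a sketch and not JIT output, and it assumes an 8-byte struct, which matches the `qword` store above. The value that arrived by copy is stored to a stack slot, and the slot's address becomes `this`. Because that address points at a local, it can never be null, so the sketch needs no check. `Point`, `point_sum` and `call_point_sum` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-byte value type: small enough to travel in one register. */
typedef struct Point {
    int32_t x;
    int32_t y;
} Point;

/* Instance method on a value type: "this" is a pointer to the struct. */
int32_t point_sum(const Point *self) {
    return self->x + self->y;
}

/* Models the call site: the value is stored to a stack slot (compare
   "mov qword ptr [rsp+30h],rcx") and the slot's address is passed
   (compare "lea rcx,[rsp+30h]"). The address of a local is never NULL,
   so there is nothing to check before the call. */
int32_t call_point_sum(Point value) {
    Point spilled = value;      /* copy into a stack slot */
    return point_sum(&spilled); /* pass its address as "this" */
}
```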
Putting it all together, it's clear why the static method call is the fastest and the virtual call the slowest: there's simply more work to be done.
As I said at the beginning, I'm not an expert in processors, the JIT or assembly. I was connecting the dots, trying how changing one thing changes another, and checking whether I could pair it with other stuff I already knew. In the end this isn't useful for day-to-day .NET programming, but, hey, the road to the knowledge was an absolute blast.