### Dumb assembly test program to run 10000N or 20000N levels of ### calls. On this Celeron N4120, with N = 10000, it takes 255-265ms ### with only 10000N (with recursedest = recurse), but usually ### 490-500ms with 20000N (with recursedest = curse). Occasionally I ### see a spike to 612ms or 783ms or 852ms. I don’t know what to ### think of that. But in the normal case it’s about 235ms for 100 ### million extra call/return pairs. ### ### This is 2.35 nanoseconds per call/return. I thought possibly ### call/return was usually faster than that because possibly 10000 ### levels of call/return was overflowing some internal processor ### return address prediction buffer, but even dropping the recursion ### depth to 10 and bumping the iteration count up to compensate ### doesn't make it faster. Note that by contrast the test/jz/dec sequence ### evidently adds only about 0.25 nanoseconds. ### ### Even more puzzling behavior on a 3.667GHz AMD Ryzen 5 3500U: ### 754–785ms with the mutual recursion, but only 114–147ms without ### it. So just the bare call/ret pair of `curse` costs 6 ### nanoseconds, but the test/jz/dec/call/ret sequence of `recurse` ### only costs 1.5 nanoseconds?! Maybe I should align my jumps, ### hmm... no, adding .align 16 inserts some padding but makes no ### difference in the performance. Inserting nops into `curse` makes ### it slower, not faster. ### ### I thought it might be that the extra levels of calls were ### overflowing the Ryzen’s buffer, but no, reducing N (the recursion ### depth) to 5000 or 1000 reduces the run time only proportionally; ### it still takes like 6 or 8 nanoseconds per level, not 1.5. .intel_syntax noprefix .globl main .equ recursedest, curse # .equ recursedest, recurse main: sub rsp, 8 push rbx push rbp mov rdi, [rsi + 8] # argv[1]; segfaults if not provided call atoi mov rbx, rax mov rbp, 10*1000 # multiply call count to avoid stack overflow 1: mov rdi, rbx # recurse rbx levels deep call recurse sub rbp, 1 jnz 1b pop rbp pop rbx add rsp, 8 xor eax, eax ret recurse: test rdi, rdi jz 1f dec rdi call recursedest 1: ret curse: call recurse ret