## RISC-V minimum function without using conditional jumps, as
## a test bench for candidate conditional-move instruction
## sequences.  See
## and
## .

        .globl minop
minop:  j minopmin4             # Choose your poison.

minopadd:
        slt t0, a0, a1          # set t0 to "is a0 < a1?" (0 or 1)
        addi t0, t0, -1         # convert 0 to -1 (a0 ≥ a1) and 1 to 0 (a0 < a1)
        sub t1, a1, a0          # set t1 := a1 - a0
        and t1, t1, t0          # t1 is now either 0 or, if a0 ≥ a1, a1 - a0
        add a0, a0, t1          # transform a0 into a1 if a0 ≥ a1
        ret

## Alternative using xor:
minopbin:
        slt t0, a0, a1
        addi t0, t0, -1
        xor t1, a1, a0          # set t1 := a1 ^ a0
        and t1, t1, t0
        xor a0, a0, t1          # transform a0 into a1 using xor
        ret

## An interesting question is to what extent we can use
## compressed instructions here.  The `addi`, `add`, and `ret`
## in the first version can all be compressed, but the `slt`,
## `sub`, and `and` can’t.  One problem with the `sub` is that
## it has three distinct operands, but it doesn’t need to,
## because it’s the last use of a1, so we could just stop
## using t1:
minopmin:
        slt t0, a0, a1
        addi t0, t0, -1
        sub a1, a1, a0
        and a1, a1, t0
        add a0, a0, a1
        ret

## And that does work, and the `sub` does compress, but we’re
## still left with two uncompressed instructions: `slt` and
## `and`.  The `slt` necessarily uses three different
## registers, so the best we could do is compress the `and`.
## Why isn’t it being encoded as c.and?  Can I encode
## compressed instructions by hand with .short?
minopmin2:
        slt t0, a0, a1
        addi t0, t0, -1
        ## sub a1, a1, a0
        ## Somehow objdump doesn’t treat this as an insn, but it does run:
        .short 0x8d89
        and a1, a1, t0
        add a0, a0, a1
        ret

        ## c.and is the same format as c.sub, but with different
        ## opcode fields.
        sub a1, a1, a0          # this gets compressed
        sub a1, a1, t0          # aha, this does not!

## Hmm, the RVC registers accessible by the 3-bit fields in
## c.and and c.sub are x8 to x15: s0, s1, a0, a1, a2, a3, a4,
## a5 (RISC-V unprivileged spec, V20191213, table 16.1, p. 100
## (p.118/238), §16.2).  Not t0!  t0 is x5 (table 25.1, §25,
## RISC-V Assembly Programmer’s Handbook), which RVC CA-format
## instructions like c.and, c.sub, and c.xor can’t refer to.
## (c.add is CR-format, with full 5-bit register fields, which
## is why the `add` compressed even with t1 as an operand.)
## So we should use a2–a5 in preference to t0.
minopmin3:
        slt a2, a0, a1
        addi a2, a2, -1
        sub a1, a1, a0
        and a1, a1, a2
        add a0, a0, a1
        ret

## Fantastic, that nets us a fully compressed subroutine except for
## the initial `slt`:

## 000000000000004e <minopmin3>:
##   4e:   00b52633    slt a2,a0,a1
##   52:   167d        add a2,a2,-1    (bug in objdump)
##   54:   8d89        sub a1,a1,a0
##   56:   8df1        and a1,a1,a2
##   58:   952e        add a0,a0,a1
##   5a:   8082        ret

## The xor version can be compressed equally well:
minopmin4:
        slt a2, a0, a1
        addi a2, a2, -1
        xor a1, a1, a0
        and a1, a1, a2
        xor a0, a0, a1
        ret
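
## A minimal reference sketch, not one of the candidate
## sequences: the same signed minimum computed with an ordinary
## conditional branch, which may be handy for cross-checking the
## branch-free versions above on the same inputs.  The label
## `minopref` is a name made up for this sketch, and nothing
## above jumps to it.
##
## Worked example for the mask trick, with a0 = 7, a1 = 3:
##   slt  gives 0; addi gives -1 (all ones); sub gives 3 - 7 = -4;
##   and  gives -4; add gives 7 + -4 = 3.
## With a0 = 3, a1 = 7:
##   slt  gives 1; addi gives 0; sub gives 4; and gives 0;
##   add  gives 3 + 0 = 3.
minopref:
        blt a0, a1, 1f          # a0 < a1 (signed): a0 is already the minimum
        mv a0, a1               # otherwise the minimum is a1
1:      ret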