Crazy Z80 optimization trick!

Posted: Sat 24 Nov, 2007 7:08 pm
by Dwedit
Bregalad on the Nesdev forums just informed me of a trick for optimizing if-then-else type blocks, where the "else" area consists of a 2-byte instruction.

So you normally have an if-else-endif block like this:

Code:

jr nz,else    ;the IF
;some code
jr endif
else:
;some code
endif:
But here's a crazy trick for when the else code is a single 2-byte instruction:
Instead of the "jr endif" line, you use the first byte of a 3-byte instruction that has no side effects!
So if you had code like this:

Code:

cp 7
jr nz,else
ld a,3
jr endif
else:
ld a,4
endif:
You could replace it with this:

Code:

cp 7
jr nz,else
ld a,3
.db $C2  ;jp nz,xxxx
else:
ld a,4
endif:
Instead of branching over the ld a,4 instruction, it now executes a jp nz,XXXX instruction where the XXXX is the two bytes of the next instruction. You already know what the flags will be here, so you can make the jump never taken. You can use this to skip the next two bytes of execution! Who needs to branch over it?
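To see why this works, it may help to lay the bytes out. This is only a rough sketch: it assumes the code is assembled at $8000 (an address picked purely for illustration) and uses _else/_endif as label names in case the assembler reserves else and endif.

```asm
        .org $8000          ; hypothetical load address, for illustration only

        cp  7               ; $8000: FE 07
        jr  nz,_else        ; $8002: 20 03   offset 3 from $8004 lands on $8007
        ld  a,3             ; $8004: 3E 03   ld does not touch the flags, so Z is still set
        .db $C2             ; $8006: C2      if path decodes C2 3E 04 as jp nz,$043E (never taken)
_else:
        ld  a,4             ; $8007: 3E 04   else path decodes these two bytes as a real instruction
_endif:                     ; $8009: both paths continue here
```

On the if path the CPU reads C2 3E 04 as jp nz,$043E, finds Z set, and carries on at $8009. On the else path the jr lands on $8007 and the same 3E 04 bytes run as ld a,4.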

Posted: Sat 24 Nov, 2007 8:03 pm
by King Harold
omg that is cool!
what would that do to a disassembler?

Posted: Sat 24 Nov, 2007 9:50 pm
by CoBB
Nice idea. :) This could also be done for a one-byte else block using jr. And theoretically for a 3-byte block too (as long as the side effects are acceptable), but that could in no way be faster than branching directly.

Posted: Sun 25 Nov, 2007 3:46 am
by Liazon
o.O wow i'm speechless...

Posted: Sun 25 Nov, 2007 4:07 am
by blueskies
what, you guys didn't know about this? ;)

j/k, I don't even understand.

Posted: Sun 25 Nov, 2007 11:09 am
by King Harold
The instruction you branch to is the address part of the other jump, which is never taken because its condition is never true (so the instructions in the first part must leave the flags in a predictable state). That means those 2 bytes are skipped without a jump: they are loaded as an address that is never used.

right?

Posted: Sun 25 Nov, 2007 1:56 pm
by driesguldolf
That is one cool trick!

@King Harold: That wouldn't harm a disassembler at all; you just won't be able to see the else-block.

Posted: Sun 25 Nov, 2007 2:50 pm
by King Harold
Unless it takes the first jump and reads those instructions, and then reads the instructions without taking the jump, ending up with two overlapping instructions at some addresses? (Would that happen?)

Posted: Sun 25 Nov, 2007 3:02 pm
by Dwedit
I think the disassembler I made would interpret it as a 3 byte instruction, and set the else label to be relative to an instruction boundary.

Posted: Sun 25 Nov, 2007 6:01 pm
by driesguldolf
Image
PTI is always correct :mrgreen:

j/k, I guess it's disassembler specific :P

Edit:
Image
:? I guess some emulators just don't have it... :P

Posted: Sun 25 Nov, 2007 6:58 pm
by CoBB
driesguldolf wrote:PTI is always correct :mrgreen:
But that’s only possible because the runtime value of PC is available to the emulator, while an offline disassembler won’t be able to analyse the code at such depth. I added that feature to make disassembly more robust (e.g. legitimate instructions can be disguised in a similar way if there are some data bytes before them). The fact that it works for this trick is just a direct consequence of that.

Posted: Tue 27 Nov, 2007 7:56 pm
by qarnos
Dwedit wrote:Instead of branching over the ld a,4 instruction, it now executes a jp nz,XXXX instruction where the XXXX is the two bytes of the next instruction. You already know what the flags will be here, so you can make the jump never taken. You can use this to skip the next two bytes of execution! Who needs to branch over it?
Cool idea, but from all sources I can find (here's one) the JP cc instructions take 10 T-states regardless of whether or not the jump is actually taken, so this trick would be no different, timing-wise, from changing JR endif (which takes 12 T-states) to JP endif.

It does, however, save you one byte and 2 clocks over JR endif, and two bytes over JP endif, but for the sake of code readability I probably wouldn't bother! :P
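Putting numbers on the three options for Dwedit's example, using the standard Z80 timings (cp n and ld r,n: 7; jr: 12 taken, 7 not taken; jp and jp cc: 10 either way). Treat this as a sketch: the label names are made up so the three variants can sit in one file, and the counts are raw Z80 T-states with no allowance for any wait states a particular machine adds.

```asm
; Variant A, jr _endifA: 10 bytes, if path 7+7+7+12 = 33 T
        cp  7               ; [7]
        jr  nz,_elseA       ; [12 taken / 7 not taken]
        ld  a,3             ; [7]
        jr  _endifA         ; [12] 2 bytes
_elseA: ld  a,4             ; [7]
_endifA:

; Variant B, jp _endifB: 11 bytes, if path 7+7+7+10 = 31 T
        cp  7               ; [7]
        jr  nz,_elseB       ; [12 taken / 7 not taken]
        ld  a,3             ; [7]
        jp  _endifB         ; [10] 3 bytes
_elseB: ld  a,4             ; [7]
_endifB:

; Variant C, .db $C2: 9 bytes, if path 7+7+7+10 = 31 T
        cp  7               ; [7]
        jr  nz,_elseC       ; [12 taken / 7 not taken]
        ld  a,3             ; [7]
        .db $C2             ; [10] 1 byte, decodes as jp nz,nn and is never taken
_elseC: ld  a,4             ; [7]
_endifC:
```

The else path costs 7+12+7 = 26 T-states in all three, so the only differences are on the if path and in the byte count.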

Posted: Wed 28 Nov, 2007 6:57 am
by tr1p1ea
Pretty clever trick, though I would probably only use it in size-critical routines.

Posted: Fri 30 Nov, 2007 8:49 am
by qarnos
Now that I think about it, this idea does offer a time benefit if you are talking about a 1-byte instruction instead of a 2-byte one.

The JR instruction takes only 7 T-States if the branch isn't taken (presumably because the Z80 doesn't have to add the relative offset to PC).

Compare this code:

Code:

        jp  z, _else    ; [10]
        add hl, bc      ; [11]
        jp  _endif      ; [10]
_else:  add hl, de      ; [11]
_endif:
That takes 31 T-states for if and 21 T-states for else.

Now try this:

Code:

        jp  z, _else    ; [10]
        add hl, bc      ; [11] assume this can't ever carry
        .db $38         ; [7] opcode for JR C,e (never taken)
_else:  add hl, de      ; [11]

This takes only 28 T-states for the if path (the else path is still 21). A small saving, but it could be useful in tight loops, and it saves 2 bytes!

The only reason not to use this for 1-byte instructions would be code readability and bug safety. Watch those flags!
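One way to take some of the risk out of the example above, offered as a sketch rather than a drop-in replacement: add hl,bc only changes the C, H and N flags, so Z still holds whatever jp z,_else just tested. On the if path Z is clear, which means jr z can stand in for jr c without assuming anything about the carry.

```asm
        jp  z,_else         ; [10] falls through only when Z is clear
        add hl,bc           ; [11] changes C, H and N but leaves Z alone
        .db $28             ; [7]  opcode for jr z,e: Z is still clear, so never taken
_else:  add hl,de           ; [11] on the if path its opcode ($19) is swallowed as the jr offset
_endif:
```

Same 28 T-states and same size as the jr c version; the condition just comes from a flag the if body cannot touch.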