Crazy Z80 optimization trick!

Dwedit
Maxcoderz Staff
Posts: 579
Joined: Wed 15 Dec, 2004 6:06 am
Location: Chicago!

Post by Dwedit »

Bregalad on the Nesdev forums just informed me of a trick for optimizing if-then-else type blocks, where the "else" area consists of a 2-byte instruction.

So you normally have an if-else-endif block like this:

Code:

jr nz,else    ;the IF
;some code
jr endif
else:
;some code
endif:

But here's a crazy trick for when the else code is a single 2-byte instruction:
You use the first byte of a 3-byte instruction with no side effects instead of the "jr endif" line!
So if you had code like this:

Code:

cp 7
jr nz,else
ld a,3
jr endif
else:
ld a,4
endif:

You could replace it with this:

Code:

cp 7
jr nz,else
ld a,3
.db $C2  ;first byte of jp nz,xxxx
else:
ld a,4
endif:

Instead of branching over the ld a,4 instruction, the CPU now executes a jp nz,XXXX instruction, where XXXX is the two bytes of the next instruction (the ld a,4). You already know what the flags will be at this point (Z is still set from the cp 7, because ld a,3 doesn't touch the flags), so you can pick a condition that makes the jump never taken. That skips the next two bytes of execution! Who needs to branch over it?
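
A byte-level view may make the overlap easier to see. This is only an annotated sketch of the snippet above: the bytes in the comments are the standard Z80 encodings, TASM-style .db syntax is assumed, and the labels are renamed to _else/_endif purely for illustration (to stay clear of assembler directives).

```asm
        cp  7           ; FE 07   Z is set when A = 7
        jr  nz,_else    ; 20 03   taken only when A is not 7
        ld  a,3         ; 3E 03   ld does not touch the flags, so Z is still set
        .db $C2         ; C2      opcode of jp nz,nn; the next 2 bytes become nn
_else:  ld  a,4         ; 3E 04   if path: read as the operand of jp nz,$043E
_endif:                 ;         else path: executed as ld a,4
```

When A is 7, the CPU reads C2 3E 04 as jp nz,$043E, does not take it, and carries on at _endif. When A is anything else, the jr nz lands on the 3E byte and the same two bytes run as ld a,4.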
You know your hexadecimal output routine is broken when it displays the character 'G'.
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

omg that is cool!
what would that do to a disassembler?
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan

Post by CoBB »

Nice idea. :) This could also be done for a one-byte else block using jr. And theoretically for a 3-byte block too (as long as the side effects are acceptable), but that could in no way be faster than branching directly.
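
A sketch of what the one-byte case could look like. The instructions here are made up for illustration and were picked so that the flags stay known: 16-bit inc does not change any flags, so Z is still clear when the .db byte is reached. Labels and TASM-style .db syntax are assumptions.

```asm
        or  a           ; B7      Z is set when A = 0
        jr  z,_else     ; 28 02   taken only when A = 0
        inc hl          ; 23      16-bit inc leaves the flags alone, Z is still clear
        .db $28         ; 28      opcode of jr z,e; Z is clear here, so never taken
_else:  inc de          ; 13      if path: read as the displacement byte of the jr
_endif:                 ;         else path: executed as inc de
```

The untaken jr z costs 7 T-states and 1 byte here, against 12 T-states and 2 bytes for a plain jr _endif.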
Liazon
Calc Guru
Posts: 962
Joined: Thu 27 Oct, 2005 8:28 pm

Post by Liazon »

o.O wow i'm speechless...
blueskies
Calc Wizard
Posts: 553
Joined: Tue 25 Apr, 2006 2:24 pm

Post by blueskies »

what, you guys didn't know about this? ;)

j/k, I don't even understand.
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

So the instruction you branch to is the address part of the other jump, which should never be taken (its condition is never true at that point, as long as the instructions in the first part leave the flags in a predictable state). Those 2 bytes are skipped without a jump: they are just loaded as an address that is never used.

right?
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080

Post by driesguldolf »

That is one cool trick!

@King Harold: that wouldn't harm a disassembler at all, you just won't be able to see the else-block.
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

Unless it follows the first jump and reads those instructions, then reads the instructions again without taking the jump, and ends up with two overlapping instructions at some addresses? (Would that happen?)
Dwedit
Maxcoderz Staff
Posts: 579
Joined: Wed 15 Dec, 2004 6:06 am
Location: Chicago!

Post by Dwedit »

I think the disassembler I made would interpret it as a 3 byte instruction, and set the else label to be relative to an instruction boundary.
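
As a rough illustration of the question (not the output of any particular tool), this is what a simple disassembler that decodes byte after byte from the top might print for the C2 example from the first post. The $+5 notation for the branch target is an assumption about the listing format, with $ standing for the address of the jr itself.

```asm
; linear sweep from the first byte
        cp  7           ; FE 07
        jr  nz,$+5      ; 20 03     target is 1 byte into the jp below
        ld  a,3         ; 3E 03
        jp  nz,$043E    ; C2 3E 04  the else block is hidden in the operand
; decoding again from the jr target would give: ld a,4 (3E 04)
```

Assembling this listing gives back the same 9 bytes as the original trick, which is why a tool working only from the bytes cannot tell the two apart without following the branch.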
driesguldolf
Extreme Poster
Posts: 395
Joined: Thu 17 May, 2007 4:49 pm
Location: $4080

Post by driesguldolf »

Image
PTI is always correct :mrgreen:

j/k, I guess it's disassembler specific :P

Edit:
Image
:? I guess some emulators just don't have it... :P
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan

Post by CoBB »

driesguldolf wrote: PTI is always correct :mrgreen:

But that’s only possible because the runtime value of PC is available to the emulator, while an offline disassembler won’t be able to analyse the code in such depth. I added that feature to make disassembly more robust (e.g. legitimate instructions can be masked in a similar way if there are some data bytes before them). The fact that it works for this trick is just a direct consequence of that.
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Post by qarnos »

Dwedit wrote: Instead of branching over the ld a,4 instruction, it now executes a jp nz,XXXX instruction where the XXXX is the two bytes of the next instruction. You already know what the flags will be here, so you can make the jump never taken. You can use this to skip the next two bytes of execution! Who needs to branch over it?

Cool idea, but from all sources I can find (here's one), the JP cc instructions take 10 T-states regardless of whether or not the jump is actually taken, so this trick would be no different, timing-wise, than changing JR endif (which takes 12 T-states) to JP endif.

It does, however, save you one byte and 2 clocks over JR endif, and two bytes over JP endif but for the sake of code readability I probably wouldn't bother! :P
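
The comparison can be laid out on the original ld a,3 / ld a,4 example. This is a sketch with the three alternatives side by side (keep exactly one of the lines marked a, b, c). The T-state figures are the usual documented ones (7 for cp n, 7 for ld a,n, 7 for an untaken jr cc, 12 for a taken one), and the label names are assumptions.

```asm
        cp  7           ;                          [7]
        jr  nz,_else    ;                          [7] not taken / [12] taken
        ld  a,3         ;                          [7]
        jr  _endif      ; (a) 2 bytes              [12]
;       jp  _endif      ; (b) 3 bytes              [10]
;       .db $C2         ; (c) 1 byte, jp nz,nn     [10] never taken here
_else:  ld  a,4         ;                          [7]
_endif:
```

That makes the if path 33 T-states with (a) and 31 with (b) or (c), while the else path is 26 in every case. The whole block is 10 bytes with (a), 11 with (b) and 9 with (c).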
"I don't know why a refrigerator is now involved, but put that aside for now". - Jim e on unitedti.org

avatar courtesy of driesguldolf.
tr1p1ea
Maxcoderz Staff
Posts: 4135
Joined: Thu 16 Dec, 2004 10:06 pm
Location: I cant seem to get out of this cryogenic chamber!

Post by tr1p1ea »

Pretty clever trick, would probably only use it in size critical routines however.
"My world is Black & White. But if I blink fast enough, I see it in Grayscale."
qarnos
Maxcoderz Staff
Posts: 227
Joined: Thu 01 Dec, 2005 9:04 am
Location: Melbourne, Australia

Post by qarnos »

Now that I think about it, this idea does offer a time benefit if you are talking about a 1 byte instruction, instead of two.

The JR instruction takes only 7 T-States if the branch isn't taken (presumably because the Z80 doesn't have to add the relative offset to PC).

Compare this code:

Code:

        jp  z, _else    ; [10]
        add hl, bc      ; [11]
        jp  _endif      ; [10]
_else:  add hl, de      ; [11]
_endif:

That takes 31 T-states for the if path and 21 T-states for the else path.

Now try this:

Code:

        jp  z, _else    ; [10]
        add hl, bc      ; [11] assume this can't ever carry
        .db $38         ; [7] opcode of jr c,e: never taken while carry is clear
_else:  add hl, de      ; [11]

This only takes 28 T-states for the if path. A small saving, but it could be useful in tight loops, and it saves 2 bytes!

The only reason not to use this for 1-byte instructions would be code readability and bug safety. Watch those flags!
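
qarnos's last snippet relies on add hl,bc never producing a carry. One possible variation, offered as a sketch and not something proposed in the thread: test Z instead of carry. ADD HL,ss changes H, N and C but leaves Z alone, and Z is known to be clear on the fall-through path of jp z, so jr z (opcode $28) should never be taken whatever the addition does. Label names are assumptions.

```asm
        jp  z,_else     ; [10] falls through only when Z is clear
        add hl,bc       ; [11] changes H, N and C, but not Z
        .db $28         ; [7]  opcode of jr z,e; Z is still clear, never taken
_else:  add hl,de       ; [11] if path: read as the displacement byte of the jr
_endif:
```

The timing is the same as with $38: 28 T-states for the if path and 21 for the else path.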