It seems Z80 coderz are divided into 2 main groups on this matter: the LDI group and the LD group.
So let's compare the routines (including initialization):
The LDI way:
ld de,x ;10 (3)
ld hl,y ;10 (3)
ld bc,z ;10 (3)
-: ld a,(de) ;7 (1)
ldi ;16 (2)
dec hl ;6 (1)
ld (hl),a ;7 (1)
inc hl ;6 (1)
jp pe,{-} ;10 (3) it's PE right?
; = 82 (18)
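For anyone unsure why the dec hl / ld (hl),a / inc hl step is needed after the LDI, here is a rough Python model of the loop above. The function name `swap_ldi` and the flat `bytearray` standing in for Z80 memory are made up for illustration; this is a sketch of the idea, not an emulator.

```python
def swap_ldi(mem, de, hl, bc):
    """Rough model of the LDI-based exchange loop; mem stands in for Z80 RAM."""
    while True:
        a = mem[de]              # ld a,(de)   save the byte LDI is about to overwrite
        mem[de] = mem[hl]        # ldi         (de) <- (hl) ...
        hl = (hl + 1) & 0xFFFF   #             ... then inc hl,
        de = (de + 1) & 0xFFFF   #             inc de,
        bc = (bc - 1) & 0xFFFF   #             dec bc (P/V set while BC != 0)
        hl = (hl - 1) & 0xFFFF   # dec hl      step back to the byte just read
        mem[hl] = a              # ld (hl),a   put the saved byte there
        hl = (hl + 1) & 0xFFFF   # inc hl      move on again
        if bc == 0:              # jp pe,-     16-bit inc/dec and ld leave flags alone
            break
    return mem

mem = bytearray(b"ABCDwxyz")
swap_ldi(mem, de=0, hl=4, bc=4)
print(mem)  # bytearray(b'wxyzABCD')
```

The model also suggests that PE is the right condition: the loop has to continue while BC is non-zero, which is exactly when LDI leaves P/V set.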
The LD way:
ld de,x ;10 (3)
ld hl,y ;10 (3)
ld b,z ;7 (2)
-: ld a,(de) ;7 (1)
ld c,(hl) ;7 (1)
ld (hl),a ;7 (1)
ld a,c ;4 (1)
ld (de),a ;7 (1)
inc hl ;6 (1)
inc de ;6 (1)
djnz {-} ;13 (2)
; = 84 (17)
The LDI way seems to be faster, and what's more, it counts in the 16-bit BC, so it can exchange up to the full addressable memory of the Z80 (although doing so would be pointless), whereas DJNZ limits the LD way to 256 bytes.
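As you can see, the "LDI way" spends 1 extra byte to save 5 t-states on every pass through the loop (52 vs 57). As you may also have seen, including the initialization loads favours the "LD way" by 3 t-states (27 vs 30), so for a single full-cost pass the totals are 82 vs 84.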
Note that the "LD way" gains 5 t-states on the last loop since DJNZ takes 8 t-states when B becomes zero (when the jump is not taken). This doesn't compensate for the overall loss of speed though.
Or is this just because I made a mistake somewhere?
IMO 1 extra byte is worth the 5 t-states per pass that you can get rid of.
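To see where the break-even point lies, here is a small sketch that adds up the per-instruction timings from the two listings for n bytes, setup included. The function names `t_ldi` and `t_ld` are made up for this post, and the figures assume the standard documented Z80 timings with no extra wait states from the hardware.

```python
def t_ldi(n):
    """Total t-states for the LDI way, setup included, exchanging n bytes."""
    setup = 10 + 10 + 10                  # ld de / ld hl / ld bc
    per_pass = 7 + 16 + 6 + 7 + 6 + 10    # = 52; jp cc costs 10 taken or not
    return setup + per_pass * n

def t_ld(n):
    """Total t-states for the LD way, setup included, exchanging n bytes (1..256)."""
    setup = 10 + 10 + 7                          # ld de / ld hl / ld b
    per_pass = 7 + 7 + 7 + 4 + 7 + 6 + 6 + 13    # = 57 with djnz taken
    return setup + per_pass * n - 5              # last djnz falls through: 8, not 13

for n in (1, 2, 16, 256):
    print(n, t_ldi(n), t_ld(n))
# 1 82 79
# 2 134 136
# 16 862 934
# 256 13342 14614
```

So under these assumptions the LD way only wins when a single byte is exchanged; from 2 bytes upwards the LDI way is ahead, and the gap grows by 5 t-states for each additional byte.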