Posted: Sat 24 Mar, 2007 9:22 am
by Halifax
Hmm, OK, maybe I was wrong.
Posted: Tue 07 Aug, 2007 11:25 am
by Halifax
Change
Code:
ld a,KEY_GROUP
out (1),a
nop
nop
in a,(1)
to
Code:
ld a,KEY_GROUP
out (1),a
ld a,(de)
in a,(1)
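This saves 1 byte and 1 T-state. The two nops take 2 bytes and 8 T-states, while ld a,(de) takes 1 byte and 7 T-states. The value it loads doesn't matter, because in a,(1) overwrites A right away, and ld a,(de) doesn't affect the flags.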
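The savings can be double-checked with a few lines of arithmetic. The Python sketch below serves only as a calculator. It totals bytes and T-states from the standard Z80 figures for `nop` (1 byte, 4 T-states) and `ld a,(de)` (1 byte, 7 T-states). The `COST` table and `total` helper are illustrative names.

```python
# Standard Z80 size and timing per instruction: (bytes, T-states)
COST = {
    "nop": (1, 4),
    "ld a,(de)": (1, 7),  # reads memory at DE; A is overwritten by `in a,(1)` anyway
}

def total(instructions):
    """Sum (bytes, T-states) over a sequence of instructions."""
    size = sum(COST[op][0] for op in instructions)
    tstates = sum(COST[op][1] for op in instructions)
    return size, tstates

before = total(["nop", "nop"])
after = total(["ld a,(de)"])
saved = (before[0] - after[0], before[1] - after[1])
print(before, after, saved)  # (2, 8) (1, 7) (1, 1)
```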
Posted: Wed 08 Aug, 2007 1:42 pm
by Timendus
CalcKing wrote: Expanding on Dwedit's trick, I created a macro system for Bot Attack.
Code:
#define curVPutS(curPos) call cur_vputs \ .dw curPos
...
curVPutS(1+(256*9)) ; Draw author's name... ;-)
.db "by Peter Wakefield",0
;-----> Optimized vputs routine
; inputs: bytes following call: CoordLSB,CoordMSB,Null-terminated string
; output: String displayed, smaller than using redundant code
cur_vputs:
pop hl
ld e,(hl)
inc hl
ld d,(hl)
inc hl
ld (pencol),de
bcall(_vputs)
jp (hl)
Why didn't you do it like this?
Code:
#define print(xcoord,ycoord,string) call cur_vputs \ .dw xcoord+(256*ycoord) \ .db string,0
...
print(1,9,"by Peter Wakefield")
I think it should work just as well (I believe the API uses something like this somewhere, but I can't check right now since my server is dead). It shouldn't make a difference in speed or size, since it assembles to exactly the same bytes, but it greatly improves the readability of your code.
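Here is a small Python model (not Z80 code) that makes the inline-data trick concrete. It shows the bytes the `print` macro emits and how `cur_vputs` walks them. It assumes two things, both of which the routine itself relies on. First, `penRow` is the byte right after `penCol`, so `ld (pencol),de` puts E in the column and D in the row. Second, `_vputs` leaves HL just past the terminating zero, so `jp (hl)` resumes after the string. The function names `print_macro` and `cur_vputs` here are hypothetical Python stand-ins.

```python
def print_macro(xcoord, ycoord, string):
    """Bytes that follow `call cur_vputs`: .dw xcoord+(256*ycoord), then .db string,0"""
    word = xcoord + 256 * ycoord
    return bytes([word & 0xFF, word >> 8]) + string.encode("ascii") + b"\x00"

def cur_vputs(memory, return_addr):
    """Model of cur_vputs; returns (pencol, penrow, text, resume_addr)."""
    hl = return_addr              # pop hl: the return address points at the data
    e = memory[hl]                # ld e,(hl)
    hl += 1                       # inc hl
    d = memory[hl]                # ld d,(hl)
    hl += 1                       # inc hl
    pencol, penrow = e, d         # ld (pencol),de: E -> penCol, D -> penRow (assumed pencol+1)
    end = memory.index(0, hl)     # _vputs draws up to the terminating zero...
    text = memory[hl:end].decode("ascii")
    return pencol, penrow, text, end + 1  # ...and jp (hl) resumes right after it (assumed)

# call cur_vputs (CD xx xx), the macro data, then a `ret` (C9) as the next instruction
program = b"\xcd\x00\x00" + print_macro(1, 9, "by Peter Wakefield") + b"\xc9"
col, row, text, resume = cur_vputs(program, 3)
print(col, row, text, hex(program[resume]))  # 1 9 by Peter Wakefield 0xc9
```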
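Oh, wait, if you do it like this, it'll also save you a few instructions. It's faster, but the program may end up bigger depending on how many times you use the macro. Each call site gets one byte longer (ld de,nn is 3 bytes where .dw was 2), while the routine only gets 4 bytes shorter:
Code: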
#define print(xcoord,ycoord,string) ld de,xcoord+(256*ycoord) \ call cur_vputs \ .db string,0
...
print(1,9,"by Peter Wakefield")
;-----> Optimized vputs routine
; inputs: DE = coordinates (E = column, D = row), bytes following call: null-terminated string
; output: String displayed, smaller than using redundant code
cur_vputs:
ld (pencol),de
pop hl
bcall(_vputs)
jp (hl)
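The "faster, but possibly bigger" trade-off follows from standard Z80 encodings and timings. `ld de,nn` and `call nn` are 3 bytes each, and `.dw` is 2. The four instructions dropped from the routine (`ld e,(hl)`, `inc hl`, `ld d,(hl)`, `inc hl`) are 1 byte each and take 7, 6, 7 and 6 T-states. `ld de,nn` takes 10. The Python sketch below is just that arithmetic. The string bytes and the shared `ld (pencol),de` / `pop hl` / `bcall(_vputs)` / `jp (hl)` cancel out, so they are left out. The constant and function names are illustrative.

```python
# Standard Z80 sizes (bytes) and timings (T-states)
LD_DE_NN_BYTES, CALL_NN_BYTES, DW_BYTES = 3, 3, 2
DROPPED_BYTES = 1 + 1 + 1 + 1        # ld e,(hl) / inc hl / ld d,(hl) / inc hl
DROPPED_TSTATES = 7 + 6 + 7 + 6      # the same four instructions
LD_DE_NN_TSTATES = 10

def extra_bytes(uses):
    """Size of the `ld de` version minus the `.dw` version, for `uses` call sites."""
    per_use = (LD_DE_NN_BYTES + CALL_NN_BYTES) - (CALL_NN_BYTES + DW_BYTES)  # +1 per call site
    return uses * per_use - DROPPED_BYTES  # the shared routine is 4 bytes shorter

tstates_saved_per_call = DROPPED_TSTATES - LD_DE_NN_TSTATES

for uses in (1, 4, 10):
    print(uses, extra_bytes(uses))  # 1 -3, then 4 0, then 10 6
print(tstates_saved_per_call)       # 16
```

Under these numbers, the `ld de` version is smaller with fewer than four call sites and breaks even at exactly four. After that it grows by one byte per call site. Every call is 16 T-states faster.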
Posted: Thu 08 May, 2008 8:11 am
by junki
sigma wrote: If you want a 16-bit loop counter, never, ever do this:
Code:
- ; Loop body
; .
; .
; .
dec de ; 16-bit dec doesn't set the flags, hence the or below
ld a, d
or e
jp nz, -
That would be useful if each iteration of the loop must take constant time.
Just nitpicking about the "never, ever"
Juha
Posted: Fri 09 May, 2008 11:53 pm
by driesguldolf
The timing variation added by doing it the correct way shouldn't bother you in most (if not all) cases.
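For comparison, here is a Python model of both loop styles. It counts only the loop-control overhead in T-states for each pass. The thread doesn't spell out "the correct way", so this sketch assumes the usual split counter. That means `ld b,e` / `dec de` / `inc d` before the loop and `djnz` / `dec d` / `jr nz` at the end. The setup instructions are left out of the count. The timings are the standard Z80 ones:
- `dec de`: 6
- `ld a,d`: 4
- `or e`: 4
- `jp nz`: 10 either way
- `djnz`: 13 taken, 8 not taken
- `dec d`: 4
- `jr nz`: 12 taken, 7 not taken

The function names are hypothetical.

```python
def naive_loop(de):
    """Body / dec de / ld a,d / or e / jp nz,-: loop-control T-states per pass."""
    overhead = []
    while True:
        de = (de - 1) & 0xFFFF           # dec de: 6 T-states, does not touch the flags
        overhead.append(6 + 4 + 4 + 10)  # + ld a,d + or e + jp nz (10 taken or not)
        if de == 0:
            return overhead

def split_loop(de):
    """Assumed split counter: ld b,e / dec de / inc d, then body / djnz - / dec d / jr nz,-."""
    b = de & 0xFF                    # ld b,e
    de = (de - 1) & 0xFFFF           # dec de (handles E = 0, where djnz runs 256 times)
    d = ((de >> 8) + 1) & 0xFF       # inc d
    overhead = []
    while True:
        b = (b - 1) & 0xFF           # djnz decrements B first
        if b != 0:
            overhead.append(13)      # djnz taken
            continue
        d = (d - 1) & 0xFF           # dec d
        if d != 0:
            overhead.append(8 + 4 + 12)  # djnz falls through, dec d, jr nz taken
        else:
            overhead.append(8 + 4 + 7)   # last pass: jr nz not taken
            return overhead

for n in (1, 0x0105, 0x1234):
    naive, split = naive_loop(n), split_loop(n)
    print(n, len(naive), len(split), sum(naive), sum(split), sorted(set(split)))
```

In this model the `ld a,d` / `or e` loop costs a constant 24 T-states per pass, which is junki's point. The split counter costs 13 on almost every pass and 24 or 19 only when B runs out. That is presumably the timing variation driesguldolf means.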
Re: [TI ASM] Optimizations
Posted: Tue 07 Jul, 2009 5:24 pm
by King Harold
Amazing that I never thought of this before, but in a multiplication you can actually stop after the operand you are shifting out to test the bits becomes zero. Note: after, not when. That's a very important difference, because the bit shifted out in the step that makes it zero still has to be added. From that point onwards you will never add anything to the result. It makes the loop slightly slower, but you get an early exit in many cases. That early exit often saves more cycles than it adds to the loop itself. You also won't need a loop counter, which is very handy when you're multiplying bigger things and need all the registers you can get.
Proof of concept:
Code:
DE_times_A:
ld hl,0
or a ;have to reset carry
_loop:
rra
jr nc,_skip
add hl,de
_skip:
sla e
rl d
or a ;slightly slower than the usual djnz
;as a bonus it will reset the carry, needed for the rra
jr nz,_loop ;4+12 vs 13
ret
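Below is a Python model of the loop above (not Z80 code). It is useful for checking the "after, not when" point. The bit that RRA shifts out in the step that makes A zero still has to be added, so the zero test must come after the add. `exit_too_early` is a hypothetical broken variant that tests A right after the shift.

```python
def de_times_a(de, a):
    """Model of the early-exit loop; returns (HL, number of iterations)."""
    hl, iterations = 0, 0
    while True:
        iterations += 1
        carry = a & 1                # rra (carry was reset, so a 0 goes into bit 7)
        a >>= 1
        if carry:                    # jr nc,_skip
            hl = (hl + de) & 0xFFFF  # add hl,de (16-bit wraparound)
        de = (de << 1) & 0xFFFF      # sla e / rl d
        if a == 0:                   # or a / jr nz,_loop
            return hl, iterations

def exit_too_early(de, a):
    """Broken variant: tests A right after the shift, before the add."""
    hl = 0
    while True:
        carry = a & 1
        a >>= 1
        if a == 0:
            return hl                # the last set bit is never added
        if carry:
            hl = (hl + de) & 0xFFFF
        de = (de << 1) & 0xFFFF

print(de_times_a(1000, 25), exit_too_early(1000, 25))  # (25000, 5) 9000
```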
Pro:
* early exit saves a lot of time for small values of A
* doesn't use BC
* intro is 3 cc's faster (well, that's nothing...)
Con:
* slightly slower loop (3 cc's per iteration more)
Neither:
* exactly the same code size as the usual algorithm
Disclaimer: I haven't slept much for a while due to the high temperature, so it could be that I'm completely out of my mind. Please let me know if that's the case.
Posted: Wed 08 Jul, 2009 12:11 pm
by King Harold
So, now for a little cc analysis.
* one iteration of the old version takes 45 or 51 cc's (unless it is the last, then it's 5 less)
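* (3*X)-3 cc's are added in the new version, where X is the number of iterations (3 extra per iteration, minus the 3 saved in the intro)
* 45*(8-X) cc's are saved, because the 8-X iterations the new version skips would all have found a 0 bit and taken the 45 cc path (the 5 cc last-iteration discount applies to both versions, so it cancels)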
* in the worst case, X=8 and 21 cycles are added.
* in the best case, X=1 (note: it can't be 0) it's 315-0=315 cc's faster
* the second-worst case is X=7: 45-18=27 cc's faster
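* all timing differences (new minus old, for X = 1 through 8): -315, -267, -219, -171, -123, -75, -27, 21
* the average over all 256 values of A (that is, 0.5 * 21 + 0.25 * -27 + ... + 2/256 * -315, where X=1 covers both A=0 and A=1) is -26.625 cc's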
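These figures can be re-derived by brute force. The Python sketch below totals T-states for both versions for every value of A, using standard Z80 timings:
- RRA/RRCA: 4
- JR: 12 taken, 7 not taken
- ADD HL,DE: 11
- SLA/RL: 8
- OR A: 4
- DJNZ: 13 taken, 8 not taken
- LD HL,nn: 10
- LD B,n: 7
- RET: 10

It reproduces the per-X differences. Averaging over all 256 values of A, with A=0 also taking one iteration, gives -26.625. The names `old_cycles` and `new_cycles` are illustrative.

```python
# Standard Z80 T-state costs; conditional jumps are (taken, not taken)
ROTATE = 4            # rra or rrca
OR_A = 4
JR = (12, 7)          # jr nc / jr nz
DJNZ = (13, 8)
ADD_HL_DE = 11
SLA_E = RL_D = 8
LD_HL_NN, LD_B_N, RET = 10, 7, 10

def bit_cost(bit):
    """rotate + jr nc,_skip (+ add hl,de when the bit is 1) + sla e + rl d."""
    jump = (JR[1] + ADD_HL_DE) if bit else JR[0]
    return ROTATE + jump + SLA_E + RL_D

def old_cycles(a):
    """Fixed 8 iterations: ld hl,0 / ld b,8 ... djnz _loop / ret."""
    t = LD_HL_NN + LD_B_N
    for i in range(8):
        t += bit_cost((a >> i) & 1) + (DJNZ[0] if i < 7 else DJNZ[1])
    return t + RET

def new_cycles(a):
    """Early exit: ld hl,0 / or a ... or a / jr nz,_loop / ret."""
    t = LD_HL_NN + OR_A
    while True:
        bit, a = a & 1, a >> 1
        t += bit_cost(bit) + OR_A
        if a == 0:
            return t + JR[1] + RET
        t += JR[0]

diffs = [new_cycles(a) - old_cycles(a) for a in range(256)]
print(sorted(set(diffs)))  # [-315, -267, -219, -171, -123, -75, -27, 21]
print(sum(diffs) / 256)    # -26.625
```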
I apologize in advance for any errors I've made.
Why weren't we all doing it this way before? Or were we, and it was just me who wasn't?
Posted: Wed 08 Jul, 2009 1:23 pm
by tr1p1ea
Posted: Wed 08 Jul, 2009 1:27 pm
by King Harold
Hm ok, I compared it to:
Code:
DE_Times_A: ; HL = DE × A
LD HL, 0 ; Use HL to store the product
LD B, 8 ; Eight bits to check
_loop:
RRCA ; Check least-significant bit of accumulator
JR NC, _skip ; If zero, skip addition
ADD HL, DE
_skip:
SLA E ; Shift DE one bit left
RL D
DJNZ _loop
RET
(from Learn TI-83 Plus Assembly in 28 Days, Day 15)
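The two versions differ in more than BC. After eight RRCAs, A ends up back at its original value, while the early-exit RRA loop leaves A at 0. A Python model of this fixed-count version (again a model, not Z80 code; the function name is illustrative) shows that:

```python
def de_times_a_fixed(de, a):
    """Model of the ld b,8 / rrca / djnz loop; returns (HL, A on exit)."""
    hl = 0
    for _ in range(8):                   # ld b,8 ... djnz _loop
        carry = a & 1                    # rrca: bit 0 goes to carry...
        a = (a >> 1) | (carry << 7)      # ...and also back into bit 7
        if carry:                        # jr nc,_skip
            hl = (hl + de) & 0xFFFF      # add hl,de
        de = (de << 1) & 0xFFFF          # sla e / rl d
    return hl, a

print(de_times_a_fixed(1000, 25))  # (25000, 25)
```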
Posted: Wed 08 Jul, 2009 1:44 pm
by benryves
* http://baze.au.com/misc/z80bits.html#1.1
I don't know why Google ranks that outdated .nl one higher than the original.