[TI-ASM] fast RLE routine

driesguldolf · Post by **driesguldolf** » Tue 28 Aug, 2007 12:31 pm

A while ago I needed a rle decomp routine, I made some simple stuff, but it quickly turned into a skill test for myself

Here's what I got:

; A fast horizontal RLE decompression routine, note: don't make your data end with a run
; Inputs are
;   a -> runflag,
;   bc -> size of compressed data,
;   de -> ptr to place to put decompressed data,
;   hl -> ptr to compressed data
; Run data is runflag, byte, size
; Destroyes flags, bc, de, hl (bc=0, de and hl point after their structures)
-- push bc       ; {1} [11]
   inc hl        ; {1} [6]
   ld c, (hl)    ; {1} [7]
   inc hl        ; {1} [6]
   ld b, (hl)    ; {1} [7]
   inc hl        ; {1} [6]
   ex de, hl     ; {1} [4]
-  ld (hl), c    ; {1} [7]
   inc hl        ; {1} [6]
   djnz {-}      ; {2} [8/13]
   ex de, hl     ; {1} [4]
   pop bc        ; {1} [10]
   dec bc        ; {1} [6]
   dec bc        ; {1} [6]
   dec bc        ; {1} [6]
decompress_hrle:
   cp (hl)       ; {1} [7]
   jp z, {--}    ; {3} [10]
   ldi           ; {2} [16]
   jp pe, decompress_hrle ; {3} [10]
   ret           ; {1} [10]

Total of 26 bytes
43 tstates for every (non-run) byte
91 tstates for every run
26 tstates for every (run) byte
10 tstates to return

Feel free to optimize

, tough I don't think there's much you can optimize

Post by **Dwedit** » Tue 28 Aug, 2007 5:38 pm

The one I've been using for a while: (run flag is fixed to $91, some SMC should take care of that)

Code: Select all

DispRLE:
    ld a, (hl)
    cp $91
    jr z, DispRLERun
    ldi
DispRLEC:
    ret po
    jr DispRLE
DispRLERun:
    inc hl
    inc hl
    ld a, (hl)
DispRLERunL:
    dec hl
    dec a
    ldi
    ret po
    jr nz, DispRLERunL
    inc hl
    jr DispRLEC

driesguldolf · Post by **driesguldolf** » Tue 28 Aug, 2007 7:32 pm

On a first look, it looks faster (The run decode part) tough I would change this:

Code: Select all

DispRLEC:
   ret po        ; [5/11]
   jr DispRLE     ; [12] (or [10] with jp)

in

Code: Select all

DispRLEC:
   jp pe, DispRLE    ; [10]
   ret     ; [10]

Saves 5/7 clocks per non-run byte (if you use jp)

Anyway, I don't think mine will ever be as fast as yours. You have A free to use and I don't. Guess it's a trade-of.

EDIT 1: on a second look: my routine will be faster with not so many (but large) runs. My routine cannot handle data ending with a run, but the workaround will only increases the data with one byte.

EDIT 2: on a third look: I've no idea wich one is the fastest, probabely you, 'cuz you are a far better programmer then me (or you're just a tad more famous then me

)

Timendus · Post by **Timendus** » Sun 09 Sep, 2007 2:07 pm

Just run them a few thousand times, then...

driesguldolf · Post by **driesguldolf** » Tue 11 Sep, 2007 7:57 pm

Well... A rle routine is supposed to be optimzed for size

I think they both have pluspoints and downsides: Mine's better when you really want to execute it from flash memory and you want to be able to change the runflag, but yours is smaller [21~22 bytes] if I dont mind having to use the same runflag all the time or if you execute it from RAM

EDIT: another downside of your routine is that you must know how large the uncompressed data is, though it shouldn't be hard to adopt it (I think)

But on the optimize for size part I'd get away from the classic model where the routine has to start on the first line:

Code: Select all

DispRLERun:
    inc hl
    inc hl
    ld a, (hl)
DispRLERunL:
    dec hl
    dec a
    ldi
    ret po
    jr nz, DispRLERunL
    inc hl
    ret po ; I'm not sure why you really want this instruction here
DispRLE:
    ld a, (hl)
    cp $91
    jr z, DispRLERun
    ldi
    ret po
    jr DispRLE

Saved 1 byte, if my assumptions are true then I saved another one

wow... I actually improved a staff member's code! I feel good, tadadadada

King Harold · Post by **King Harold** » Tue 11 Sep, 2007 8:10 pm

I think you can safely take out that RET PO that you aren't sure about, PO can not be true because you can only arrive at that point by passing through the RET PO above it, and there is nothing in between that can alter the P/V flag.

driesguldolf · Post by **driesguldolf** » Tue 11 Sep, 2007 8:21 pm

Yeah, I tought so, but I kept it becuase it is in the original code (and whatever a staff member says is said to be true...

)

kalan_vod · Post by **kalan_vod** » Tue 11 Sep, 2007 8:55 pm

driesguldolf wrote:Yeah, I tought so, but I kept it becuase it is in the original code (and whatever a staff member says is said to be true... )

Well, you have to realize that it may be out of practice (him not working with calcs much these days).

Post by **tr1p1ea** » Tue 11 Sep, 2007 10:43 pm

Dwedit didnt write that routine, its a very old one thats been getting around (note that he said he had been 'using' it). Its a routine that worked and no-one ever cared about optimising it. So dont question him again.

King Harold · Post by **King Harold** » Wed 12 Sep, 2007 6:12 pm

I've never seen dwedit make a mistake..

anyway,

driesgudolf wrote:EDIT: another downside of your routine is that you must know how large the uncompressed data is, though it shouldn't be hard to adopt it (I think)

You won't want to do that. That would make buffer-overflow far too easy to occur. Which is, IMO, more important than the 2 bytes you save by not having to store the uncompressed length.
I probably do not have to explain in how many ways buffer-overflow can be harmful.

edit: "length of the uncompressed data" is really "buffer length", if the data doesn't fit into it completely it's better to stop than to overflow the buffer.