[TI ASM] port $11, weirdness on wikiTI

Post by **King Harold** » Sat 12 May, 2007 3:39 pm

the way I did it, it becomes quite crazy

the dark-gray part is much lighter in reality than on the screeny..

edit: but it's a bit dark, so this time with another mask that has less dark and more light in it:

edit again: that was crap, 128 cc's between each write omg, now it's 85, looks a bit better:

Post by **tr1p1ea** » Sat 12 May, 2007 4:24 pm

You are only using 1 mask (and rotating that for each byte?) -- that's usually what causes the rainy effect. You could try using a combination of masks each loop.

Post by **King Harold** » Sat 12 May, 2007 4:31 pm

a combination of masks..?
more than %11011011 and %00100100 and their rotated versions?
Now I'm rotating those masks every byte and an additional time every row; otherwise the rotation produces the same masks on every row, which doesn't make any gray, just masked-out lines.

What other masks should I use then? and where?

Post by **tr1p1ea** » Sat 12 May, 2007 4:46 pm

Well, that diagonal rainy effect is caused by only using 1 mask. To reduce flicker you would, say, use %11011011 for one out to the lcd, then %01101101 for the next out, then %10110110, and repeat that. Then on the next frame you would switch these masks around ... that usually takes care of the uniform rainy effect.
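A minimal sketch of that mask schedule, written as a Python model rather than Z80 code. The names `MASKS`, `mask_for` and `set_count_per_bit` are hypothetical, and "switch these masks around" is read here as shifting the starting mask by one each frame, which is an assumption. It shows that every bit position ends up set in 2 of every 3 frames, so no column of pixels is favoured.

```python
# Hypothetical model, not code from the thread: which mask each LCD write uses.
MASKS = [0b11011011, 0b01101101, 0b10110110]  # the three masks named above

def mask_for(write_index, frame):
    """Cycle the masks per write; shift the starting mask by one each frame (assumed)."""
    return MASKS[(write_index + frame) % len(MASKS)]

def set_count_per_bit(write_index, frames=3):
    """For one screen byte, count over `frames` frames how often each mask bit is 1."""
    counts = [0] * 8
    for frame in range(frames):
        mask = mask_for(write_index, frame)
        for bit in range(8):
            counts[bit] += (mask >> bit) & 1
    return counts

print(set_count_per_bit(0))  # [2, 2, 2, 2, 2, 2, 2, 2]
```

Because neighbouring writes use different masks within the same frame, the bits that are masked out no longer line up into diagonal streaks.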

Post by **King Harold** » Sat 12 May, 2007 7:29 pm

I can't do it yet

Maybe tomorrow, I feel a bit braindead at the moment..

Post by **Jim e** » Sun 13 May, 2007 5:36 am

Well, I thought I'd find the guide I wrote in my sentbox, but apparently it doesn't save too many messages. So here's a quick rundown of interlacing.

Let's say this is the image you want.

It would be composed of 2 layers commonly referred to as a dark layer and a light layer. The dark layer is generally stored before the light layer. So that image would look like this.

Dark:

Light:

For 4 level grayscale, the dark layer is displayed 2 times longer than the light. So if the light layer is flashed once, then the dark layer is flashed twice.
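To see why a 2:1 ratio gives four evenly spaced levels, here is a small sketch in Python. The function name `on_fraction` is hypothetical; it assumes a pixel is lit whenever the layer currently on screen has that bit set, with the dark layer shown 2 frames out of 3 and the light layer 1 frame out of 3.

```python
from fractions import Fraction

def on_fraction(dark_bit, light_bit):
    """Fraction of time a pixel is lit when dark is shown 2/3 and light 1/3 of the time."""
    return Fraction(2 * dark_bit + light_bit, 3)

# (dark_bit, light_bit) -> share of time the pixel is on
levels = {(d, l): on_fraction(d, l) for d in (0, 1) for l in (0, 1)}
for bits, share in sorted(levels.items(), key=lambda item: item[1]):
    print(bits, share)
```

The four combinations come out as 0, 1/3, 2/3 and 1, which is white, light gray, dark gray and black.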

This method is generally faster, mainly because you don't have to run fastcopy again when the dark layer was already shown on the previous update. In other words, you can skip 1 lcd update, which saves a tremendous amount of time. Updating the entire screen with one layer at a time is also simpler to code and generally lighter.

However, if timing is inaccurate, the screen will look flickery and be quite unpleasant. If timing is accurate but not tuned properly, you'll end up seeing a line moving across the screen, which is horribly noticeable.

Rigview came along with accurate controllable timing and a new method of displaying the screen. This was done by interlacing bytes from both layers.

Unlike the last screen, you can actually tell, even at this low frame rate, what's supposed to be dark and what's supposed to be light. Even if the timing is slightly off, the noise that produces is spread out more evenly, so the effect is less painful. This, however, gets more bloated, because you have to carry pointers to both layers and have a method to decide which layer gets used on which byte. That's done either by unrolling code or by giving up some more registers.

I think it was Duck who decided to take it to the bit level, not sure on that. But there was a good deal of benefit from it. No matter how bad the timing was, dithering the bits together allowed for a very even image. The screen, no matter how noisy, was not unpleasant.

Of course with every advantage there is a disadvantage: the code is MUCH more complicated. Duck's code actually used shadow registers, so you can imagine the nightmare of having to work with that code. The big issue with it is the annoyance of having to work with masks. So let's count the registers.

A 16bit pointer for the dark layer. How bout HL?
An 8bit mask for the dark layer. Ummm C?
An 8bit temporary storage place for the masked dark layer. B!
A 16bit pointer for the light layer. DE then
An 8bit mask for the light layer. ....uh
A 16bit value to add the offset to the next byte. oh SP!!!?
An 8bit loop counter. crap.

So in other words, this code would be register starved. You want each write to the lcd port to occur within ~70 tstates of the last write. Self Modifying Code works to an extent but is still not very fast.

Then some bright boy(I forget his name, had letters in it I swear) came up with a method that relied on reversible operations. What this did was get rid of the need of temporary storage and the need to hold both masks. (Kinda plays back into your xor swapping thing).

So this would look like:

((Dark_layer ^ Light_layer) & Dark_mask) ^ Light_layer = result

As opposed to:
(Dark_layer & Dark_mask) | ( Light_layer & Light_mask ) = result

Just comparing the code shows a noticeable improvement.

Code: Select all

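;Old broken code
	ld a,(ix)	;dark byte
	and d		;keep the bits the dark mask selects
	ld c,a		;park the masked dark byte
	ld a,(hl)	;light byte
	and e		;keep the bits the light mask selects
	or c		;combine both halves

;New hotness
	ld a,(de)	;dark byte
	xor (hl)	;dark ^ light
	and c		;keep the difference only where the dark mask is set
	xor (hl)	;dark where mask = 1, light where mask = 0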

This can really save you from requiring use of slower registers or shadow registers. Typically you can get WAY below the required lcd delay. So it's extremely helpful.

So this is what I ended up using for RGP. It runs at fastcopy speed, so I decided that's enough optimization. It requires that the layers be stored next to each other, but that isn't too much of an issue.
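As a sanity check on the reversible-operation trick, here is a sketch in Python that compares the two formulas. The function names `blend_old` and `blend_xor` are hypothetical, and it assumes the light mask is simply the complement of the dark mask, with the XOR form grouped as ((dark ^ light) & mask) ^ light. Since every operation is bitwise, checking all 1-bit inputs covers every possible byte.

```python
def blend_old(dark, light, dark_mask):
    """Two-mask form: (dark & dark_mask) | (light & light_mask)."""
    light_mask = ~dark_mask & 0xFF  # assumed: light mask is the complement
    return (dark & dark_mask) | (light & light_mask)

def blend_xor(dark, light, dark_mask):
    """One-mask form: ((dark ^ light) & dark_mask) ^ light."""
    return ((dark ^ light) & dark_mask) ^ light

# All operations are bitwise, so the 8 single-bit cases prove the identity.
for d in (0, 1):
    for l in (0, 1):
        for m in (0, 1):
            assert blend_old(d, l, m) & 1 == blend_xor(d, l, m)

print(f"{blend_xor(0b11110000, 0b10101010, 0b11011011):08b}")  # 11110000
```

Where the mask bit is 1 the dark bit survives, and where it is 0 the second XOR cancels the first and leaves the light bit, so no temporary register or second mask is needed.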

Code: Select all

;-------------------------------------------------
;4 level Grey interlace routine
;by James Montelongo 
lcd:					;52744
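	in a,($20)		;read the current CPU speed setting
	push af			;and keep it for the end
	ld a,0
	out ($20),a		;0 selects 6 MHz on 83+SE/84+ hardware
	ld (stacksave),sp	;sp is reused below, so interrupts must be off
	ld a,$80
	out ($10),a		;lcd command: row 0
	ld a,(gsmasknum)	;step the mask index 0,1,2,0,...
	inc a
	cp 3
	jr c,skipmaskswap
	xor a
skipmaskswap:
	ld (gsmasknum),a
	ld e,a
	ld d,0
	ld hl,gsmasks
	add hl,de
	ld d,(hl)		;d = mask for the first write of each pair
	inc hl		;step to the next mask (left out of the first posting by accident)
	ld a,(hl)
	cpl
	ld e,a			;e = complement of the next mask, for the second write
	ld hl,gsActivebuf1-12
	ld sp,12		;sp = bytes per buffer row, added to hl before each write
	ld a,$20		;lcd command: column 0
	ld c,a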
colloop:

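	out ($10),a		;select the column
	ld b,32			;32 pairs of writes = 64 rows
rowloop:
	add hl,sp		;next row
	ld a,(hl)		;byte from the first layer
	inc h
	inc h
	inc h			;+768: same offset in the second layer
	xor (hl)
	and d
	xor (hl)		;first layer where d = 1, second layer elsewhere
	out ($11),a
	add hl,sp		;next row, still in the second layer
	nop		;I actually need to delay.
	ld a,(hl)		;byte from the second layer
	dec h
	dec h
	dec h			;-768: back to the first layer
	xor (hl)
	and e
	xor (hl)		;second layer where e = 1, first layer elsewhere
	out ($11),a
	djnz rowloop
	inc c			;next column command
	dec h
	dec h
	dec h			;undo the 64 row steps (64*12 = 768)
	inc hl			;next byte column
	ld a,c
	cp $2c			;stop after 12 columns ($20-$2B)
	jr nz,colloop
	ld sp,(stacksave)
	pop af
	out ($20),a		;restore the saved CPU speed
	ret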


gsmasks:
 .db %11011011
 .db %10110110
 .db %01101101
 .db %11011011

For the masks, the dark layer should mask out 1/3 of its bits, and the light layer should mask out 2/3 of its bits.
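The sketch below models how the routine above picks its two masks per update, in Python rather than Z80. It assumes the `inc hl` between the two mask loads is present, and that the first buffer is the dark layer; the names `registers_for_update` and `dark_share` are hypothetical. It counts, over the three values of `gsmasknum`, how often each bit position shows the first layer.

```python
GSMASKS = [0b11011011, 0b10110110, 0b01101101, 0b11011011]  # table from the post

def registers_for_update(n):
    """Model the d and e registers the routine loads for mask index n (0..2)."""
    d = GSMASKS[n]
    e = ~GSMASKS[n + 1] & 0xFF  # the routine complements the second mask with cpl
    return d, e

def dark_share():
    """Per bit, count updates in which the first layer is shown, for both writes."""
    first_write = [0] * 8
    second_write = [0] * 8
    for n in range(3):
        d, e = registers_for_update(n)
        for bit in range(8):
            first_write[bit] += (d >> bit) & 1          # first layer where d = 1
            second_write[bit] += 1 - ((e >> bit) & 1)   # first layer where e = 0
    return first_write, second_write

print(dark_share())
```

Both rows of the pair give the first layer 2 updates out of 3 on every bit, which matches the 2:1 ratio described for 4 level grayscale, and the fourth table entry is what lets index 2 read a "next" mask.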

Post by **King Harold** » Sun 13 May, 2007 8:58 am

ok I get it now, thanks

Not to criticize you, but I would change

Code: Select all

ld d,(hl)
ld a,(hl)

into

Code: Select all

ld a,(hl)
ld d,a

or similar to save 3 clocks
not that it matters, I'm just saying it so you all know I'm awake

same goes for the ld a,0

edit: it actually looks sortof OK now:

the dark area looks a bit odd on the screeny, it's not that bad on HW

Post by **tr1p1ea** » Sun 13 May, 2007 10:21 am

It was Tijl Coosemans (Kalimero) who first came up with bit level interlacing, Duck continued his work. This was pretty good and a typical routine (without unrolling) could get down to say 70cc's per write.

Then it was Johan Forslöf (doyanx) who came up with ((A ^ B) & C) ^ B = (A & C) | (B & ~C). Although his implementation was all over the place and it needed a non-standard buffer layout, this paved the way for an ordinary routine to be optimised to run faster than fastcopy, which both myself and Jim e ended up doing. My routine is a little different, but the core is essentially the same -- Jim, it appears you did some things to avoid using smc (only 1 mask per update?)

Oh and is that only 63cc's between writes ... is that safe enough?
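The 63 figure can be checked by adding up the documented Z80 T-state costs of the instructions between the two `out ($11),a` writes in the posted inner loop. The sketch below does that in Python; the table name `T_STATES` and the helper `gap` are hypothetical, and each gap is counted from just after one write up to and including the next.

```python
# Z80 T-state costs for the instructions used in the inner loop.
T_STATES = {
    "add hl,sp": 11, "nop": 4, "ld a,(hl)": 7, "inc h": 4, "dec h": 4,
    "xor (hl)": 7, "and r": 4, "out (n),a": 11, "djnz (taken)": 13,
}

def gap(instructions):
    """Total T-states for a run of instructions ending in the next LCD write."""
    return sum(T_STATES[name] for name in instructions)

# From the first write of a pair to the second (includes the nop).
first_to_second = ["add hl,sp", "nop", "ld a,(hl)", "dec h", "dec h", "dec h",
                   "xor (hl)", "and r", "xor (hl)", "out (n),a"]
# From the second write back around the loop to the first.
second_to_first = ["djnz (taken)", "add hl,sp", "ld a,(hl)", "inc h", "inc h",
                   "inc h", "xor (hl)", "and r", "xor (hl)", "out (n),a"]

print(gap(first_to_second), gap(second_to_first))  # 63 72
```

So the shorter of the two gaps is 63 T-states even with the `nop`, and the loop-back path is 72. Whether 63 is safe depends on the LCD driver in the particular calculator, which this sketch does not model.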

Post by **Jim e** » Sun 13 May, 2007 10:43 am

King Harold wrote:or similar to save 3 clocks

Sweet, I can save half a microsecond. At 70fps, running for oh say 8 hours, I'll save bout 1 second.

Meh...I'd probably waste that second of my life anyway.
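For what it's worth, the arithmetic holds up. A quick check in Python, assuming a 6 MHz clock (which is what half a microsecond for 3 clocks implies) and that the instruction pair runs once per frame; the constant names are hypothetical.

```python
CLOCK_HZ = 6_000_000   # assumed CPU speed
CLOCKS_SAVED = 3       # per frame, from the suggested change
FPS = 70
HOURS = 8

seconds_per_frame = CLOCKS_SAVED / CLOCK_HZ   # 0.5 microseconds
frames = FPS * HOURS * 3600                   # frames drawn in 8 hours
total_saved = seconds_per_frame * frames

print(frames, round(total_saved, 3))  # 2016000 1.008
```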

Post by **King Harold** » Sun 13 May, 2007 12:08 pm

not that it matters, I'm just saying it so you all know I'm awake

let's not start a discussion about that, ok?
it doesn't hurt to save those 3 clocks and a byte anyway..

ok, let's continue with the grayscale:
is it normal that it gets weird like that on screeny's? can anything be done about it?

Post by **tr1p1ea** » Sun 13 May, 2007 6:54 pm

The uniform noise is due to the fact that the routine only uses 1 mask per lcd update. To reduce this you can use all 3 masks per lcd update -- this would require expansion of the routine. I think jim has done it that way to avoid the need for smc.

Post by **Jim e** » Sun 13 May, 2007 11:06 pm

I use 2 masks to break the uniform look. 3 masks would be best but 2 is enough. You could also rotate the mask circularly after each write.

I also use the 3 inc\dec to save de from pointer use. If I didn't I would have to add de with sp which would waste 15 clocks as opposed to 12.

Post by **tr1p1ea** » Mon 14 May, 2007 3:46 am

So ... that is not the routine you use in RGP then? I only see 1 mask being used (used twice).

Post by **Jim e** » Mon 14 May, 2007 5:09 am

whoops I accidentally deleted an inc hl

There were some ifdefs there that weren't relevant.

Post by **tr1p1ea** » Mon 14 May, 2007 7:45 am

Ahh ok, i thought that was the case, since your mask table was padded.