
Posted: Sat 12 May, 2007 3:39 pm
by King Harold
the way I did it, it becomes quite crazy
Image

the dark-gray part is much lighter in reality than on the screeny..

edit: but it's a bit dark, so this time with another mask that has less dark and more light in it:
Image

edit again: that was crap, 128 cc's between each write omg, now it's 85, looks a bit better:
Image

Posted: Sat 12 May, 2007 4:24 pm
by tr1p1ea
You are only using 1 mask (and rotating that for each byte?) -- that's usually what causes the rainy effect. You could try using a combination of masks each loop.

Posted: Sat 12 May, 2007 4:31 pm
by King Harold
a combination of masks..?
more than %11011011 and %00100100 and their rotated versions?
Now I'm rotating those masks every byte and an additional time every row, otherwise the rotating results in the same masks every row - which doesn't make any gray, but masked out lines.

What other masks should I use then? and where?

Posted: Sat 12 May, 2007 4:46 pm
by tr1p1ea
Well that diagonal rainy effect is caused by only using 1 mask. To reduce flicker you would, say, use %11011011 for one out to the LCD, then %01101101 for the next out, then %10110110, and repeat that. Then on the next frame you would switch these masks around ... that usually takes care of the uniform rainy effect.

Posted: Sat 12 May, 2007 7:29 pm
by King Harold
I can't do it yet :(
Maybe tomorrow, I feel a bit braindead at the moment..

Posted: Sun 13 May, 2007 5:36 am
by Jim e
Well I thought I'd find the guide I wrote in my sentbox, but apparently it doesn't save too many messages. So here's a quick rundown of interlacing.


Let's say this is the image you want.

Image


It would be composed of 2 layers commonly referred to as a dark layer and a light layer. The dark layer is generally stored before the light layer. So that image would look like this.

Dark:
Image

Light:
Image

For 4 level grayscale, the dark layer is displayed twice as long as the light layer. So if the light layer is flashed once, then the dark layer is flashed twice.

Image


This method is generally faster. The biggest reason is that you don't have to perform fastcopy again when the dark layer was already shown in the previous update, so in other words you can skip 1 LCD update. That saves a tremendous amount of time. This method of updating the entire screen with one layer at a time is also simpler to code and generally lighter.

However, if timing is inaccurate, the screen will look flickery and be quite unpleasant. If timing is accurate but not tuned properly, you'll end up seeing a line moving across the screen, which is horribly noticeable.


Rigview came along with accurate controllable timing and a new method of displaying the screen. This was done by interlacing bytes from both layers.
Image
Unlike the last screen, you can actually tell even at this low frame rate what's supposed to be dark and what's supposed to be light. Even if the timing is slightly off, the noise that produces is spread out more evenly, so the effect is less painful. This however gets more bloated, because you have to carry pointers to both layers and have a method to decide which layer gets used for which byte. That means either unrolling code or giving up some more registers.

I think it was Duck who decided to take it to the bit level, not sure on that. But there was a good deal of benefit from it. No matter how bad timing was dithering the bits together allowed for a very even image. The screen, no matter how noisy, was not unpleasant.

Image

Of course with every advantage there is a disadvantage: the code is MUCH more complicated. Duck's code actually used shadow registers, so you can imagine the nightmare of having to work with that code. The big issue with it is that you have the annoyance of having to work with masks. So let's count the registers.

A 16bit pointer for the dark layer. How bout HL?
An 8bit mask for the dark layer. Ummm C?
A 8bit temporary storage place for the masked dark layer. B!
A 16bit pointer for the light layer. DE then
An 8bit mask for the light layer. ....uh
A 16bit value to add the offset to the next byte. oh SP!!!?
An 8bit loop counter. crap.

So in other words, this code would be register starved. You want each write to the lcd port to occur within ~70 tstates of the last write. Self Modifying Code works to an extent but is still not very fast.

Then some bright boy(I forget his name, had letters in it I swear) came up with a method that relied on reversible operations. What this did was get rid of the need of temporary storage and the need to hold both masks. (Kinda plays back into your xor swapping thing).

So this would look like:

((Dark_layer ^ Light_layer) & Dark_mask) ^ Light_layer = result

As opposed to (where Light_mask is the complement of Dark_mask):
(Dark_layer & Dark_mask) | ( Light_layer & Light_mask ) = result

Just comparing the code shows a noticeable improvement.

Code:

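;Old broken code
	ld a,(ix)	;dark layer byte
	and d		;keep the bits selected by the dark mask
	ld c,a		;park the masked dark byte
	ld a,(hl)	;light layer byte
	and e		;keep the bits selected by the light mask
	or c		;merge the two halves

;New hotness
	ld a,(de)	;dark layer byte
	xor (hl)	;dark ^ light
	and c		;keep the difference only where the dark mask is set
	xor (hl)	;dark where mask=1, light where mask=0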
This can really save you from having to use slower registers or shadow registers. Typically you can get WAY below the required LCD delay, so it's extremely helpful.
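As a sanity check on the reversible trick, here is a minimal Python sketch (not part of anyone's routine in this thread) that brute-forces the equivalence of the two forms for every pair of layer bytes. The function names are made up for illustration, and it assumes the light mask is always the bitwise complement of the dark mask.

```python
def blend_masked(dark, light, mask):
    """Reference form: two ANDs and an OR, with the light mask taken as ~mask."""
    return (dark & mask) | (light & ~mask & 0xFF)


def blend_xor(dark, light, mask):
    """Reversible form: xor, and, xor, mirroring the four-instruction snippet."""
    return ((dark ^ light) & mask) ^ light


def identity_holds(masks=(0xDB, 0xB6, 0x6D, 0x00, 0xFF)):
    """Compare both forms for every dark/light byte pair and each listed mask."""
    return all(
        blend_masked(d, l, m) == blend_xor(d, l, m)
        for m in masks
        for d in range(256)
        for l in range(256)
    )
```

The reason it works: wherever a mask bit is 0, the `and` clears the xor result and the final `xor` puts the light bit back; wherever it is 1, the light bit is xored in twice and cancels, leaving the dark bit.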


So this is what I ended up using for RGP. It runs at fastcopy speed, so I decided that's enough optimization. It requires that the layers be stored next to each other, but that isn't too much of an issue.

Code:

;-------------------------------------------------
;4 level Grey interlace routine
;by James Montelongo 
lcd:					;52744
	in a,($20)
	push af
	ld a,0
	out ($20),a
	ld (stacksave),sp
	ld a,$80
	out ($10),a
	ld a,(gsmasknum)
	inc a
	cp 3
	jr c,skipmaskswap
	xor a
skipmaskswap:
	ld (gsmasknum),a
	ld e,a
	ld d,0
	ld hl,gsmasks
	add hl,de
	ld d,(hl)
	inc hl		;step to the next mask (this line was accidentally deleted at first)
	ld a,(hl)
	cpl
	ld e,a
	ld hl,gsActivebuf1-12
	ld sp,12
	ld a,$20
	ld c,a
colloop:

	out ($10),a
	ld b,32
rowloop:
	add hl,sp
	ld a,(hl)
	inc h
	inc h
	inc h
	xor (hl)
	and d
	xor (hl)
	out ($11),a
	add hl,sp
	nop		;I actually need to delay.
	ld a,(hl)
	dec h
	dec h
	dec h
	xor (hl)
	and e
	xor (hl)
	out ($11),a
	djnz rowloop
	inc c
	dec h
	dec h
	dec h
	inc hl
	ld a,c
	cp $2c
	jr nz,colloop
	ld sp,(stacksave)
	pop af
	out ($20),a
	ret


gsmasks:
 .db %11011011
 .db %10110110
 .db %01101101
 .db %11011011


For the masks, the dark layer should mask out 1/3 of its bits and the light layer should mask out 2/3 of its bits.
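To see why that 1/3 versus 2/3 split gives four even gray levels, here is a rough Python model. It assumes each pixel ends up seeing all three distinct masks from the gsmasks table over a full cycle (the fourth table entry just repeats the first), and the helper names are invented for this sketch.

```python
from fractions import Fraction

# First three entries of the gsmasks table; a set bit means "show the dark layer".
GSMASKS = [0b11011011, 0b10110110, 0b01101101]


def dark_frames_per_bit(masks=GSMASKS):
    """For each bit position 7..0, count the masks that select the dark layer."""
    return [sum((m >> bit) & 1 for m in masks) for bit in range(7, -1, -1)]


def gray_level(dark_bit, light_bit, bit=0, masks=GSMASKS):
    """Fraction of the mask cycle during which this pixel is switched on."""
    shown = [dark_bit if (m >> bit) & 1 else light_bit for m in masks]
    return Fraction(sum(shown), len(masks))
```

Every bit position is set in exactly two of the three masks, so a pixel shows its dark bit 2/3 of the time and its light bit 1/3 of the time. The four dark/light combinations then come out as 0, 1/3, 2/3 and 1.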

Posted: Sun 13 May, 2007 8:58 am
by King Harold
ok I get it now, thanks :)

Not to criticize you, but I would change

Code:

ld d,(hl)
ld a,(hl)
into

Code:

ld a,(hl)
ld d,a
or similar to save 3 clocks
not that it matters, I'm just saying it so you all know I'm awake :P
same goes for the ld a,0

edit: it actually looks sortof OK now:
Image
the dark area looks a bit odd on the screeny, it's not that bad on HW

Posted: Sun 13 May, 2007 10:21 am
by tr1p1ea
It was Tijl Coosemans (Kalimero) who first came up with bit level interlacing, Duck continued his work. This was pretty good and a typical routine (without unrolling) could get down to say 70cc's per write.

Then it was Johan Forslöf (doyanx) who came up with ((A ^ B) & C) ^ B = (A & C) | (B & ~C). Although his implementation was all over the place and it needed a non-standard buffer layout, this paved the way for an ordinary routine to be optimised to run faster than fastcopy, which both myself and Jim e ended up doing. My routine is a little different, but the core is essentially the same -- Jim, it appears you did some things to avoid using SMC (only 1 mask per update?)

Oh and is that only 63cc's between writes ... is that safe enough?

Posted: Sun 13 May, 2007 10:43 am
by Jim e
King Harold wrote: or similar to save 3 clocks
Sweet, I can save half a microsecond. At 70fps, running for oh say 8 hours, I'll save bout 1 second.

Meh...I'd probably waste that second of my life anyway.

Posted: Sun 13 May, 2007 12:08 pm
by King Harold
not that it matters, I'm just saying it so you all know I'm awake
let's not start a discussion about that, ok?
it doesn't hurt to save those 3 clocks and a byte anyway..

ok, lets continue with the grayscale:
is it normal that it gets weird like that on screeny's? can anything be done about it?

Posted: Sun 13 May, 2007 6:54 pm
by tr1p1ea
The uniform noise is due to the fact that the routine only uses 1 mask per LCD update. To reduce this you can use all 3 masks per LCD update -- this would require expanding the routine. I think Jim has done it that way to avoid the need for SMC.

Posted: Sun 13 May, 2007 11:06 pm
by Jim e
I use 2 masks to break the uniform look. 3 masks would be best but 2 is enough. You could also rotate the mask circularly after each write.

I also use the 3 inc h / dec h to keep de free from pointer use. If I didn't, I would have to add de with sp, which would waste 15 clocks as opposed to 12.
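For anyone following the pointer arithmetic, here is a small Python model of it. It is only a sketch: the base address is hypothetical, the helper names are invented, and it assumes the usual 96x64 monochrome layout of 12 bytes per row, so one layer is 768 = $300 bytes.

```python
ROW_BYTES = 12                 # 96 pixels / 8 bits per byte
ROWS = 64
LAYER_SIZE = ROW_BYTES * ROWS  # 768 bytes, which is exactly $0300


def inc_h(hl, times=3):
    """Model `inc h` repeated: each one adds $0100 to the 16-bit HL pair."""
    return (hl + 0x100 * times) & 0xFFFF


def dec_h(hl, times=3):
    """Model `dec h` repeated: each one subtracts $0100 from HL."""
    return (hl - 0x100 * times) & 0xFFFF


def column_walk(buf1, column):
    """First-layer addresses visited for one LCD column, top row to bottom."""
    hl = (buf1 - ROW_BYTES + column) & 0xFFFF
    visited = []
    for _ in range(ROWS):
        hl = (hl + ROW_BYTES) & 0xFFFF  # add hl,sp with sp holding 12
        visited.append(hl)
    return visited
```

Since three `inc h` steps move HL by exactly one layer, the routine can hop between the same byte in both layers without a second pointer, which is also why the two layers have to sit back to back in memory.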

Posted: Mon 14 May, 2007 3:46 am
by tr1p1ea
So ... that is not the routine you use in RGP then? I only see 1 mask being used (used twice).

Posted: Mon 14 May, 2007 5:09 am
by Jim e
whoops I accidentally deleted an inc hl

There were some ifdefs there that weren't relevant.

Posted: Mon 14 May, 2007 7:45 am
by tr1p1ea
Ahh ok, I thought that was the case, since your mask table was padded.