[TI ASM] port $11, weirdness on wikiTI
-
- Calc King
- Posts: 1513
- Joined: Sat 05 Aug, 2006 7:22 am
the way I did it, it becomes quite crazy
the dark-gray part is much lighter in reality than on the screeny..
edit: but it's a bit dark, so this time with another mask that has less dark and more light in it:
edit again: that was crap, 128 cc's between each write omg, now it's 85, looks a bit better:
-
- Calc King
- Posts: 1513
- Joined: Sat 05 Aug, 2006 7:22 am
a combination of masks..?
more than %11011011 and %00100100 and their rotated versions?
Now I'm rotating those masks every byte and an additional time every row, otherwise the rotating results in the same masks every row - which doesn't make any gray, but masked out lines.
What other masks should I use then? and where?
- tr1p1ea
- Maxcoderz Staff
- Posts: 4141
- Joined: Thu 16 Dec, 2004 10:06 pm
- Location: I cant seem to get out of this cryogenic chamber!
Well, that diagonal rainy effect is caused by only using 1 mask. To reduce flicker you would, say, use %11011011 for one out to the LCD, then %01101101 for the next out, then %10110110, and repeat that. Then on the next frame you would switch these masks around ... that usually takes care of the uniform rainy effect.
-
- Calc King
- Posts: 1513
- Joined: Sat 05 Aug, 2006 7:22 am
- Jim e
- Calc King
- Posts: 2457
- Joined: Sun 26 Dec, 2004 5:27 am
- Location: SXIOPO = Infinite lives for both players
Well, I thought I'd find the guide I wrote in my sentbox, but apparently it doesn't save too many messages. So here's a quick rundown of interlacing.
Let's say this is the image you want.
It would be composed of 2 layers, commonly referred to as a dark layer and a light layer. The dark layer is generally stored before the light layer. So that image would look like this.
Dark:
Light:
For 4 level grayscale, the dark layer is displayed 2 times longer than the light. So if the light layer is flashed once, then the dark is flashed twice.
This method is generally faster; the biggest reason is that you don't have to perform fastcopy again when the dark layer was shown prior. In other words, you can skip 1 LCD update out of every 3, which saves a tremendous amount of time. This method of updating the entire screen with a layer is also simpler to code and generally lighter.
However, if timing is inaccurate, the screen will look flickery and be quite unpleasant. If timing is accurate but not tuned properly, you'll end up seeing a line moving across the screen, which is horribly noticeable.
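A rough sketch of that dark-twice, light-once schedule, in case it helps. The names pickLayer, frameCount, darkLayer and lightLayer are made up for illustration, and the two buffer addresses are only an assumption (plotSScreen and appBackUpScreen on an 83+). The thread does not say how the original routines tracked frames.

```z80
; Sketch only, not anyone's actual routine.
; Call once per grayscale tick. Returns HL = layer to show on this tick,
; and A = tick number (0, 1 or 2).
darkLayer  .equ $9340   ; assumption: plotSScreen
lightLayer .equ $9872   ; assumption: appBackUpScreen

pickLayer:
    ld a,(frameCount)
    inc a
    cp 3
    jr c,storeCount
    xor a               ; wrap 3 -> 0
storeCount:
    ld (frameCount),a
    ld hl,darkLayer
    cp 2                ; carry set when a is 0 or 1
    ret c               ; ticks 0 and 1: dark layer
    ld hl,lightLayer    ; tick 2: light layer
    ret

frameCount:
    .db 0
```

Once it is running, the tick where A comes back as 1 finds the dark layer already on the LCD, so the caller can skip the copy entirely. That is the saved update described above.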
Rigview came along with accurate controllable timing and a new method of displaying the screen. This was done by interlacing bytes from both layers.
Unlike the last screen, you can actually tell even at this low frame rate what's supposed to be dark and what's supposed to be light. Even if the timing is slightly off, the noise it produces is spread out more evenly, so the effect is less painful. This, however, gets more bloated, because you have to carry pointers to both layers and have a method to decide which layer gets used on which byte. That's done either by unrolling code or by giving up some more registers.
I think it was Duck who decided to take it to the bit level, not sure on that. But there was a good deal of benefit from it. No matter how bad the timing was, dithering the bits together allowed for a very even image. The screen, no matter how noisy, was not unpleasant.
Of course with every advantage there is a disadvantage: the code is MUCH more complicated. Duck's code actually used shadow registers, so you can imagine the nightmare of having to work with that code. The big issue is the annoyance of having to work with masks. So let's count the registers.
A 16bit pointer for the dark layer. How bout HL?
An 8bit mask for the dark layer. Ummm C?
An 8bit temporary storage place for the masked dark layer. B!
A 16bit pointer for the light layer. DE then
An 8bit mask for the light layer. ....uh
A 16bit value to add the offset to the next byte. oh SP!!!?
An 8bit loop counter. crap.
So in other words, this code would be register starved. You want each write to the lcd port to occur within ~70 tstates of the last write. Self Modifying Code works to an extent but is still not very fast.
Then some bright boy (I forget his name, had letters in it I swear) came up with a method that relied on reversible operations. What this did was get rid of the need for temporary storage and the need to hold both masks. (Kinda plays back into your xor swapping thing.)
So this would look like:
((Dark_layer ^ Light_layer) & Dark_mask) ^ Light_layer = result
As opposed to:
(Dark_layer & Dark_mask) | (Light_layer & Light_mask) = result
The two give the same result as long as Light_mask is the complement of Dark_mask.
Just comparing the code shows a noticeable improvement.
This can really save you from needing slower registers or shadow registers. Typically you can get WAY below the required LCD delay. So it's extremely helpful.
Code: Select all
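;Old broken code
ld a,(ix) ;first layer byte (19 cc on its own)
and d ;first mask
ld c,a ;temp storage
ld a,(hl) ;second layer byte
and e ;second mask
or c ;combine, 42 cc in total
;New hotness
ld a,(de) ;first layer byte
xor (hl) ;first ^ second
and c ;keep only the bits where the mask is 1
xor (hl) ;second layer fills the rest, 25 cc in total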
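To see why the two versions agree, look at one bit at a time. Where the mask bit is 1, the and keeps dark ^ light, and the second xor cancels the light part, leaving the dark bit. Where the mask bit is 0, the and clears the bit and the second xor leaves the light bit. Here is a small worked example with made-up byte values (the registers and constants are just for illustration, not taken from any routine in this thread):

```z80
; Sketch: blend one dark byte and one light byte with the xor/and/xor trick.
    ld b,%11110000      ; made-up dark layer byte
    ld e,%00111100      ; made-up light layer byte
    ld c,%11011011      ; dark mask, 1 = take the dark bit
    ld a,b              ; a = %11110000
    xor e               ; a = %11001100  (dark ^ light)
    and c               ; a = %11001000  (kept only where mask is 1)
    xor e               ; a = %11110100  (light bits fill the mask holes)
; Same as (dark & mask) | (light & ~mask):
;   %11010000 | %00100100 = %11110100
```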
So this is what I ended up using for RGP. It runs at fastcopy speed, so I decided that's enough optimization. It requires that the layers be stored next to each other (the routine steps between them with three inc h, i.e. 768 bytes apart), but that isn't too much of an issue.
Code: Select all
;-------------------------------------------------
;4 level Grey interlace routine
;by James Montelongo
lcd: ;52744
in a,($20)
push af
ld a,0
out ($20),a
ld (stacksave),sp
ld a,$80
out ($10),a
ld a,(gsmasknum)
inc a
cp 3
jr c,skipmaskswap
xor a
skipmaskswap:
ld (gsmasknum),a
ld e,a
ld d,0
ld hl,gsmasks
add hl,de
ld d,(hl)
inc hl ;accidentally deleted.
ld a,(hl)
cpl
ld e,a
ld hl,gsActivebuf1-12
ld sp,12
ld a,$20
ld c,a
colloop:
out ($10),a
ld b,32
rowloop:
add hl,sp
ld a,(hl)
inc h
inc h
inc h
xor (hl)
and d
xor (hl)
out ($11),a
add hl,sp
nop ;I actually need to delay.
ld a,(hl)
dec h
dec h
dec h
xor (hl)
and e
xor (hl)
out ($11),a
djnz rowloop
inc c
dec h
dec h
dec h
inc hl
ld a,c
cp $2c
jr nz,colloop
ld sp,(stacksave)
pop af
out ($20),a
ret
gsmasks:
.db %11011011
.db %10110110
.db %01101101
.db %11011011
For the masks, the dark layer should mask out 1/3 of its bits, and the light layer should mask out 2/3 of its bits.
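One way to read the table, as far as I can tell from the routine: each gsmasks entry has roughly 2/3 of its bits set (5 or 6 of 8), and a 0 marks a pixel that is taken from the light layer on that pass. The three distinct entries put their holes in different places, so over three passes every bit position gets the light layer once and the dark layer twice. The listing below just spells that out; the cpl values are worked out by hand.

```z80
; gsmasks entries and their complements (the complement marks the light-layer bits)
    .db %11011011       ; cpl -> %00100100
    .db %10110110       ; cpl -> %01001001
    .db %01101101       ; cpl -> %10010010
; The three complements never overlap and together cover every bit:
;   %00100100 | %01001001 | %10010010 = %11111111
```

The fourth entry in the real table repeats the first, so the routine can always read entry n and entry n+1 without wrapping.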
Last edited by Jim e on Mon 14 May, 2007 5:10 am, edited 1 time in total.
-
- Calc King
- Posts: 1513
- Joined: Sat 05 Aug, 2006 7:22 am
ok I get it now, thanks
Not to criticize you, but I would change
Code: Select all
ld d,(hl)
ld a,(hl)
into
Code: Select all
ld a,(hl)
ld d,a
or similar to save 3 clocks
not that it matters, I'm just saying it so you all know I'm awake
same goes for the ld a,0
edit: it actually looks sort of OK now:
the dark area looks a bit odd on the screeny, it's not that bad on HW
- tr1p1ea
- Maxcoderz Staff
- Posts: 4141
- Joined: Thu 16 Dec, 2004 10:06 pm
- Location: I cant seem to get out of this cryogenic chamber!
It was Tijl Coosemans (Kalimero) who first came up with bit level interlacing; Duck continued his work. This was pretty good, and a typical routine (without unrolling) could get down to, say, 70cc's per write.
Then it was Johan Forslöf (doyanx) who came up with ((A ^ B) & C) ^ B = (A & C) | (B & ~C). Although his implementation was all over the place and it needed a non-standard buffer layout, this paved the way for an ordinary routine to be optimised to run faster than fastcopy, which both myself and Jim e ended up doing. My routine is a little different, but the core is essentially the same -- Jim, it appears you did some things to avoid using SMC (only 1 mask per update?)
Oh, and is that only 63cc's between writes ... is that safe enough?
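For what it's worth, the 63 can be reproduced by adding up standard Z80 T-states for the stretch between the two out ($11),a instructions in Jim e's rowloop. The counts in the comments are the documented Z80 timings. Whether 63 is enough for a given LCD driver is a separate question that this does not answer.

```z80
; Second half of rowloop, annotated with T-states
    add hl,sp           ; 11
    nop                 ;  4
    ld a,(hl)           ;  7
    dec h               ;  4
    dec h               ;  4
    dec h               ;  4
    xor (hl)            ;  7
    and e               ;  4
    xor (hl)            ;  7
    out ($11),a         ; 11
; total: 11+4+7+4+4+4+7+4+7+11 = 63
```

The other direction (second write back around to the first) goes through djnz and has no nop, which works out to 72 by the same counting.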
-
- Calc King
- Posts: 1513
- Joined: Sat 05 Aug, 2006 7:22 am