- Integer square root.
- Unsigned division.
- Hexidecimal coversion and flags display.
- Debugging with ASSERT.
- Proper fractions to fixed-point values (16-bit).
- Proper fractions to fixed-point values (8-bit).
- 8-bit unsigned x 8-bit unsigned multiplication.
- 8-bit signed x 8-bit signed multiplication.
- 8-bit signed x 8-bit unsigned multiplication.
- Fast line drawing.
- Multiplication by a proper fraction (HL = HL * DE / BC).
[TI ASM] Useful routines.
Moderator: MaxCoderz Staff
[TI ASM] Useful routines.
INDEX:
Last edited by qarnos on Fri 07 Dec, 2007 9:29 am, edited 4 times in total.
Square Root
This routine, as the title suggests, will calculate the integer square root of the value in BC. This has been ripped straight from my 3D engine, so the naming reflects the naming used in the engine (every routine is prefixed by "al").
The value passed in A will determine how many bits of the result are calculated. For normal purposes, this should be 8, but you can get extra precission by using higher values. If you want the result as an 8.4 fixed point number, then you would pass 12 as the precision parameter.
UPDATE: I mentioned a possible bug when specifying 16 bits of precission. I checked it out, and removing the bug would really hamper the performance of the routine, so I decided to let it be.
If you really, really need an 8.8 fixed point result, then specify 15 bits of precission and double the result. There will be a margin of error in this case - but it avoids the 16 bit problem.
Passing a value lower than 8 as the precission parameter will result in a truncated answer.
In C code, this routine performs:
DE = int(sqrt(BC) / 256 * (2 ^ A));
One final note, be aware this code uses the TASM .MODULE directive to define the scope of local labels.
To reset the scope, you will need to add a blank .MODULE after the routine. Otherwise, the TI-OS routine names, which all start with an underscore, may not be in scope for the rest of your program.
The value passed in A will determine how many bits of the result are calculated. For normal purposes, this should be 8, but you can get extra precission by using higher values. If you want the result as an 8.4 fixed point number, then you would pass 12 as the precision parameter.
UPDATE: I mentioned a possible bug when specifying 16 bits of precission. I checked it out, and removing the bug would really hamper the performance of the routine, so I decided to let it be.
If you really, really need an 8.8 fixed point result, then specify 15 bits of precission and double the result. There will be a margin of error in this case - but it avoids the 16 bit problem.
Passing a value lower than 8 as the precission parameter will result in a truncated answer.
In C code, this routine performs:
DE = int(sqrt(BC) / 256 * (2 ^ A));
Code: Select all
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alSquareRootBC:
;;
;; Calculates the integer square root of BC. This is the highest number, X,
;; which satisfies: X * X <= BC.
;;
;; INPUTS:
;; A - The number of bits of precision required in the result.
;; BC - The number to obtain the square root of.
;;
;; OUTPUTS:
;; DE - Integer square root of BC
;;
;; DESTROYED:
;; AF, BC, HL
;;
;; TIMINGS (not including CALL):
;; 44 + (133 * A) T-States
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alSquareRootBC: .MODULE alSquareRootBC
ld hl, $0000 ; [10]
ld d, h ; [4]
ld e, l ; [4]
_mainLoop: sla c ; [8]
rl b ; [8]
adc hl, hl ; [15]
sla c ; [8]
rl b ; [8]
adc hl, hl ; [15]
sll e ; [8] root = root * 2 + 1
rl d ; [8] should never carry
sbc hl, de ; [15] remainder -= root
jp c, _revertHL ; [10]
inc de ; [6]
jp _loopTail ; [10]
_revertHL: add hl, de ; [11]
dec e ; [4] root -= 1 (low bit always set).
_loopTail: dec a ; [4]
jp nz, _mainLoop ; [10]
srl d ; [8]
rr e ; [8]
ret ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alSquareRootBC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
To reset the scope, you will need to add a blank .MODULE after the routine. Otherwise, the TI-OS routine names, which all start with an underscore, may not be in scope for the rest of your program.
Last edited by qarnos on Fri 18 Aug, 2006 9:11 am, edited 1 time in total.
Unsigned division.
Here is a routine to perform unsigned 16-bit / 16-bit division.
It has been a long time since I wrote this routine, so it could quite possibly be optimised a bit further.
C code: HL = int(HL / BC);
It has been a long time since I wrote this routine, so it could quite possibly be optimised a bit further.
C code: HL = int(HL / BC);
Code: Select all
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alUDivHLBC
;;
;; Unsigned divide of HL by BC. Returns the integer (quotient) and remainder
;; of the division.
;;
;; INPUTS:
;; HL - Numerator
;; BC - Denominator
;;
;; OUTPUTS:
;; HL - Quotient of (HL/BC)
;; DE - Remainder of (HL/BC)
;; BC - Denominator (unchanged)
;;
;; DESTROYED:
;; AF
;;
;; TIMINGS (not including CALL):
;; Best-case (small numerators): 518 T-states
;; Worst-case (large numerators): 1684 T-states
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alUDivHLBC: .MODULE alUDivHLBC
ld de, 0 ; [10] Zero the initial remainder (DE) and
ld a, 17 ; [7] set up loop counter (A).
_shlHL: dec a ; [4] This initial loop shifts HL left until
add hl, hl ; [11] the high bit pops into the carry.
jp nc, _shlHL ; [10]
jp _chkQ ; [10]
_divLp: ex de, hl ; [4]
adc hl, hl ; [15]
_chkQ: ex de, hl ; [4]
adc hl, hl ; [15] Shift bit into remainder
dec a ; [4]
jp z, _endDv ; [10]
sbc hl, bc ; [15]
ccf ; [4]
jp c, _divLp ; [10]
_addBC: add hl, bc ; [11]
or a ; [4]
jp _divLp ; [10]
_endDv: sbc hl, bc ; [15]
jp nc, _endW1 ; [10]
add hl, bc ; [11]
_endW1: ex de, hl ; [4]
ccf ; [4]
adc hl, hl ; [15]
ret ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alUDivHLBC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Ok, I've already decided to break my one-routine per post rule
Here are a few routines which form part of some debugging code I am putting together.
These routines deal with displaying hexidecimal numbers and the flags register.
A quick overview:
* dbgNibbleToHex - Converts a 4-bit value in the accumulator to the ASCII code of the hexidecimal representation.
* dbgPrintFlags - Displays the contents of F in an easy to read format.
* dbgPrintHexByte - Displays a byte in hexidecimal format.
* dbgPrintHexWord - Displays a word in hexidecimal format.
Here are a few routines which form part of some debugging code I am putting together.
These routines deal with displaying hexidecimal numbers and the flags register.
A quick overview:
* dbgNibbleToHex - Converts a 4-bit value in the accumulator to the ASCII code of the hexidecimal representation.
* dbgPrintFlags - Displays the contents of F in an easy to read format.
* dbgPrintHexByte - Displays a byte in hexidecimal format.
* dbgPrintHexWord - Displays a word in hexidecimal format.
Code: Select all
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; dbgPrintHexByte
;; dbgPrintHexWord
;;
;; Uses _VPutMap to display a byte or word value in hexidecimal.
;;
;; INPUTS:
;; HL - Address of byte/word to display
;;
;; DESTROYED:
;; AF, DE
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
dbgPrintHexWord:
inc hl ; MSB first.
call dbgPrintHexByte
dec hl
dbgPrintHexByte:
xor a
rld
push af
call dbgNibbleToHex
BCALL _VPutMap
pop af
rld
push af
call dbgNibbleToHex
BCALL _VPutMap
pop af
rld
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of dbgPrintByte / dbgPrintWord
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; dbgPrintFlags:
;;
;; Uses _VPutMap to display the contents of the flags register.
;; The output is formatted SZ5H3PNC with a dash (-) in place of the flag
;; symbol if the corresponding flag is not set.
;;
;; INPUTS:
;; F - Well...
;;
;; DESTROYED:
;; AF, BC, DE
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
dbgPrintFlags:
push af
pop bc
ld b, 'S' - 45 ; 45 is the ASCII code for a dash.
call _dbgPutFlg
ld b, 'Z' - 45
call _dbgPutFlg
ld b, '5' - 45
call _dbgPutFlg
ld b, 'H' - 45
call _dbgPutFlg
ld b, '3' - 45
call _dbgPutFlg
ld b, 'P' - 45
call _dbgPutFlg
ld b, 'N' - 45
call _dbgPutFlg
ld b, 'C' - 45
_dbgPutFlg: sla c
sbc a, a ; A = 255 if [sla c] carried.
and b ; AND with (character - 45)
add a, 45 ; then add 45.
BCALL _VPutMap
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of dbgPrintFlags
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; dbgNibbleToHex:
;;
;; Converts a 4-bit value to the ASCII representation of that number in
;; hexadecimal format.
;;
;; INPUTS:
;; A - Number to transform (low 4 bits).
;;
;; OUTPUTS:
;; A - ASCII hex code.
;;
;; DESTROYED:
;; F
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
dbgNibbleToHex:
and $0f ; mask the low 4 bits to be sure.
add a, 48 ; ASCII: 48-57 = numerals [0..9]
cp 58 ; if A was > 9 then add 7 to get
ret c ; the ASCII codes for letters A-F.
add a, 7
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of dbgNibbleToHex
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Debugging code
Expanding on my last post, I have completed writing a simple ASSERT library.
For those unfarmiliar with C/C++, ASSERT is a macro which is used during debugging to test the validity of an assumption made by the programmer. If the ASSERT fails, then the program is aborted and the debugger will identify the exact place where things went awry.
This library implements a version of the ASSERT macro. This can be placed at any point in your program and takes just one argument - the condition to test. This can be any of the conditions used in a JP instruction (c, nc, z, nz, etc). If the test FAILS the program is terminated with a register dump.
The square root routine I provided a few posts ago is a good example of when to use ASSERT. Take a look at this fragment:
Here, I have made the assumption that RL D will never set the carry flag, which is crucial for the following SBC instruction. I could now "enforce" that assumption:
The macro effectively says, "I am assuming that the carry flag will be reset at this point. If it is not, then that is really bad and I would like to know about it, so end the program".
If the ASSERT does fail, I would get a screen looking something like this:
One important thing to note, is that the ASSERT macros only have an effect when DEBUG is defined. If it is undefined, the ASSERTs expand to nothing!
The two files you will need are debug.asm and debug.inc.
NOTE: There files were written for the TI-83+. They may need modification for other calcs.
Since the are no instructions, apart from the source code comments, I will give a crappy example here.
One final warning - ION.INC doesn't include all the TIOS routines. Some which I needed were missing so I put them at the top of debug.asm. If you are using a different header, they might already be define so you will need to remove them.
Also, the equates I put in were the 83+ versions only.
For those unfarmiliar with C/C++, ASSERT is a macro which is used during debugging to test the validity of an assumption made by the programmer. If the ASSERT fails, then the program is aborted and the debugger will identify the exact place where things went awry.
This library implements a version of the ASSERT macro. This can be placed at any point in your program and takes just one argument - the condition to test. This can be any of the conditions used in a JP instruction (c, nc, z, nz, etc). If the test FAILS the program is terminated with a register dump.
The square root routine I provided a few posts ago is a good example of when to use ASSERT. Take a look at this fragment:
Code: Select all
sll e ; [8] root = root * 2 + 1
rl d ; [8] should never carry
sbc hl, de ; [15] remainder -= root
Code: Select all
sll e ; [8] root = root * 2 + 1
rl d ; [8] should never carry
ASSERT(nc)
sbc hl, de ; [15] remainder -= root
If the ASSERT does fail, I would get a screen looking something like this:
One important thing to note, is that the ASSERT macros only have an effect when DEBUG is defined. If it is undefined, the ASSERTs expand to nothing!
The two files you will need are debug.asm and debug.inc.
NOTE: There files were written for the TI-83+. They may need modification for other calcs.
Since the are no instructions, apart from the source code comments, I will give a crappy example here.
Code: Select all
#define DEBUG ; do this *before* including debug.inc
#include "ion.inc"
#include "debug.inc"
; etc...
ionBegin:
DEBUG_BEGIN_PROGRAM ; DO THIS FIRST!!!!!!!!!!!!!!!
add a, c
ASSERT(nc) ; make sure a + c did not carry
; etc...
putPixel: ; co-ordinates in B and C
#ifdef DEBUG
ld a, 95
cp b
ASSERT(nc) ; make sure B <= 95
ld a, 63
cp c ; make sure C <= 63
ASSERT(nc)
#endif
; etc...
#include "debug.asm" ; include debug code
Also, the equates I put in were the 83+ versions only.
Last edited by qarnos on Wed 30 Apr, 2008 10:11 am, edited 1 time in total.
Fractions to fixed point values.
Here is a handy piece of code which can convert a proper fractional value (such as 3/7, 5/9, etc) into a fixed point representation of that value.
This can be used, for instance, to convert the remainder of a division into something more useful.
The number of bits calculated can be defined to suit whatever purpose you need it for.
The longest possible run time (when calculating a full 16 bit result) is around 1400 T-States.
NOTE: This code uses the undocument Z80 instruction SL1. Unfortunately, most assemblers get this wrong and use the mnemonic SLL instead (due to confusion between a lowercase L and the numeral 1). This code gives in and uses the SLL mnemonic. If using TASM, remember to specify the -x5 switch to enable all the undocumented instructions.
If you get any "unkown instruction" errors, try changing SLL to SL1
[quote][/quote]
This can be used, for instance, to convert the remainder of a division into something more useful.
The number of bits calculated can be defined to suit whatever purpose you need it for.
The longest possible run time (when calculating a full 16 bit result) is around 1400 T-States.
NOTE: This code uses the undocument Z80 instruction SL1. Unfortunately, most assemblers get this wrong and use the mnemonic SLL instead (due to confusion between a lowercase L and the numeral 1). This code gives in and uses the SLL mnemonic. If using TASM, remember to specify the -x5 switch to enable all the undocumented instructions.
If you get any "unkown instruction" errors, try changing SLL to SL1
Code: Select all
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; FractionToFix16:
;;
;; Converts a proper fraction in the form HL / DE into a 0.x fixed point
;; representation where x is defined by the caller.
;;
;; Note that this routine will only provide correct results for proper
;; fractions (denominator > numerator). Anything else will return with
;; each bit of the result set.
;;
;; The value passed in BC determines the number of bits of accuracy
;; that are calculated and should equal (1 << (16 - nBits)).
;;
;; For 16 bits: $0001
;; For 8 bits: $0100
;;
;; INPUTS:
;; HL - Numerator
;; DE - Denominator
;; BC - Accuracy (1 << (16 - nBits))
;;
;; OUTPUTS:
;; BC - HL / DE
;;
;; DESTROYED:
;; F, HL
;;
;; TIMINGS (not including CALL):
;; Best case: 10 + (68 * nBits) T-States
;; Worst case: 10 + (87 * nBits) T-States
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
FractionToFix16: .MODULE FractionToFix16
add hl, hl ; [11]
jr c, _overFlow ; [12/7]
sbc hl, de ; [15]
jp nc, _shift1 ; [12/7]
add hl, de ; [11]
sla c ; [8]
rl b ; [8]
jp nc, FractionToFix16 ; [10]
ret ; [10]
_overFlow: or a ; [4]
sbc hl, de ; [15]
_shift1: sll c ; [8]
rl b ; [8]
jp nc, FractionToFix16 ; [10]
ret ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of FractionToFix16
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
This is just an 8-bit version of the above routine.
The same warnings about SLL apply.
The same warnings about SLL apply.
Code: Select all
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; FractionToFix8:
;;
;; An 8-bit version of FractionToFix16
;;
;; INPUTS:
;; A - Numerator
;; B - Denominator
;; C - Accuracy
;;
;; OUTPUTS:
;; C - 0.x fixed point representation of A / B
;;
;; DESTROYED:
;; AF
;;
;; TIMINGS (not including CALL):
;; Best case: 10 + (38 * nBits) T-States
;; Worst case: 10 + (45 * nBits) T-States
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
FractionToFix8: .MODULE FractionToFix8
add a, a ; [4]
jr c, _overFlow ; [12/7]
sub b ; [4]
jr nc, _shift1 ; [12/7]
add a, b ; [4]
sla c ; [8]
jp nc, FractionToFix8 ; [10]
ret ; [10]
_overFlow: sub b ; [4]
_shift1: sll c ; [8]
jp nc, FractionToFix8 ; [10]
ret ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of FractionToFix8
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Fast 8-bit multiplication: unsigned x unsigned
I have been debating whether or not to post these routines, since I am unhappy with the way the signed variants work, but I decided to post them now and I can always update them if neccessary.
The next few posts will detail some very fast 8-bit x 8-bit multiplication routines. These routines all work using the table of squares method. For those unfamiliar with this approach, here is a quick overview:
We want to calculate A * B. First, we define a function, f, which is defined as f(x) = x * x / 4
Now behold! f(A + B) - f(A - B) = A * B
So, we can precalculate f(x) for the range we will required and store this data in a lookup table (LUT).
For two unsigned 8-bit numbers, the range of values we need to know the value of f(x) for is limited to -255 (0 - 255) to 510 (255 + 255). Since f(510) = 65025, we will need 2 bytes for each entry in the LUT, requiring 1024 bytes of LUT data. This may be excessive for some applications - it depends on whether you can afford the RAM.
Furthermore, this table will need to be aligned to a 256 byte boundary to achieve maximum speed.
So, first, I will give you the table:
EDIT: I also have some code to generate this table at runtime. If you would like this code, PM me. Note that the code doesn't handle the memory allocation - it just generates the table where you tell it to.
And now the code that uses this table to perform an unsigned x unsigned multiplication:
Large, but fast!
The next few posts will detail some very fast 8-bit x 8-bit multiplication routines. These routines all work using the table of squares method. For those unfamiliar with this approach, here is a quick overview:
We want to calculate A * B. First, we define a function, f, which is defined as f(x) = x * x / 4
Now behold! f(A + B) - f(A - B) = A * B
So, we can precalculate f(x) for the range we will required and store this data in a lookup table (LUT).
For two unsigned 8-bit numbers, the range of values we need to know the value of f(x) for is limited to -255 (0 - 255) to 510 (255 + 255). Since f(510) = 65025, we will need 2 bytes for each entry in the LUT, requiring 1024 bytes of LUT data. This may be excessive for some applications - it depends on whether you can afford the RAM.
Furthermore, this table will need to be aligned to a 256 byte boundary to achieve maximum speed.
So, first, I will give you the table:
EDIT: I also have some code to generate this table at runtime. If you would like this code, PM me. Note that the code doesn't handle the memory allocation - it just generates the table where you tell it to.
Code: Select all
; This macro allows you to align data to any byte boundary
#define ALIGN(x) .org (($ + (x - 1)) - (($ + (x - 1)) % x))
ALIGN(256)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; g_alUMulTable
;;
;; This table is used by various routines to perform multiplications using
;; the formula:
;;
;; f(a+b) - f(a-b) = a * b
;;
;; where:
;;
;; f(x) = x * x / 4
;;
;; This table contains f(x) for all numbers in the range [0..511] inclusive
;; in 4 256-byte chunks in the following (strange, but optimal) order:
;; 0 - 255 low byte of f(256..511)
;; 256 - 511 high byte of f(256..511)
;; 512 - 767 low byte of f(0..255)
;; 768 - 1023 high byte of f(0..255)
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
g_alUMulTable .equ $ >> 8
.db $00,$80,$01,$82,$04,$86,$09,$8c,$10,$94,$19,$9e,$24,$aa,$31,$b8
.db $40,$c8,$51,$da,$64,$ee,$79,$04,$90,$1c,$a9,$36,$c4,$52,$e1,$70
.db $00,$90,$21,$b2,$44,$d6,$69,$fc,$90,$24,$b9,$4e,$e4,$7a,$11,$a8
.db $40,$d8,$71,$0a,$a4,$3e,$d9,$74,$10,$ac,$49,$e6,$84,$22,$c1,$60
.db $00,$a0,$41,$e2,$84,$26,$c9,$6c,$10,$b4,$59,$fe,$a4,$4a,$f1,$98
.db $40,$e8,$91,$3a,$e4,$8e,$39,$e4,$90,$3c,$e9,$96,$44,$f2,$a1,$50
.db $00,$b0,$61,$12,$c4,$76,$29,$dc,$90,$44,$f9,$ae,$64,$1a,$d1,$88
.db $40,$f8,$b1,$6a,$24,$de,$99,$54,$10,$cc,$89,$46,$04,$c2,$81,$40
.db $00,$c0,$81,$42,$04,$c6,$89,$4c,$10,$d4,$99,$5e,$24,$ea,$b1,$78
.db $40,$08,$d1,$9a,$64,$2e,$f9,$c4,$90,$5c,$29,$f6,$c4,$92,$61,$30
.db $00,$d0,$a1,$72,$44,$16,$e9,$bc,$90,$64,$39,$0e,$e4,$ba,$91,$68
.db $40,$18,$f1,$ca,$a4,$7e,$59,$34,$10,$ec,$c9,$a6,$84,$62,$41,$20
.db $00,$e0,$c1,$a2,$84,$66,$49,$2c,$10,$f4,$d9,$be,$a4,$8a,$71,$58
.db $40,$28,$11,$fa,$e4,$ce,$b9,$a4,$90,$7c,$69,$56,$44,$32,$21,$10
.db $00,$f0,$e1,$d2,$c4,$b6,$a9,$9c,$90,$84,$79,$6e,$64,$5a,$51,$48
.db $40,$38,$31,$2a,$24,$1e,$19,$14,$10,$0c,$09,$06,$04,$02,$01,$00
.db $40,$40,$41,$41,$42,$42,$43,$43,$44,$44,$45,$45,$46,$46,$47,$47
.db $48,$48,$49,$49,$4a,$4a,$4b,$4c,$4c,$4d,$4d,$4e,$4e,$4f,$4f,$50
.db $51,$51,$52,$52,$53,$53,$54,$54,$55,$56,$56,$57,$57,$58,$59,$59
.db $5a,$5a,$5b,$5c,$5c,$5d,$5d,$5e,$5f,$5f,$60,$60,$61,$62,$62,$63
.db $64,$64,$65,$65,$66,$67,$67,$68,$69,$69,$6a,$6a,$6b,$6c,$6c,$6d
.db $6e,$6e,$6f,$70,$70,$71,$72,$72,$73,$74,$74,$75,$76,$76,$77,$78
.db $79,$79,$7a,$7b,$7b,$7c,$7d,$7d,$7e,$7f,$7f,$80,$81,$82,$82,$83
.db $84,$84,$85,$86,$87,$87,$88,$89,$8a,$8a,$8b,$8c,$8d,$8d,$8e,$8f
.db $90,$90,$91,$92,$93,$93,$94,$95,$96,$96,$97,$98,$99,$99,$9a,$9b
.db $9c,$9d,$9d,$9e,$9f,$a0,$a0,$a1,$a2,$a3,$a4,$a4,$a5,$a6,$a7,$a8
.db $a9,$a9,$aa,$ab,$ac,$ad,$ad,$ae,$af,$b0,$b1,$b2,$b2,$b3,$b4,$b5
.db $b6,$b7,$b7,$b8,$b9,$ba,$bb,$bc,$bd,$bd,$be,$bf,$c0,$c1,$c2,$c3
.db $c4,$c4,$c5,$c6,$c7,$c8,$c9,$ca,$cb,$cb,$cc,$cd,$ce,$cf,$d0,$d1
.db $d2,$d3,$d4,$d4,$d5,$d6,$d7,$d8,$d9,$da,$db,$dc,$dd,$de,$df,$e0
.db $e1,$e1,$e2,$e3,$e4,$e5,$e6,$e7,$e8,$e9,$ea,$eb,$ec,$ed,$ee,$ef
.db $f0,$f1,$f2,$f3,$f4,$f5,$f6,$f7,$f8,$f9,$fa,$fb,$fc,$fd,$fe,$ff
.db $00,$00,$01,$02,$04,$06,$09,$0c,$10,$14,$19,$1e,$24,$2a,$31,$38
.db $40,$48,$51,$5a,$64,$6e,$79,$84,$90,$9c,$a9,$b6,$c4,$d2,$e1,$f0
.db $00,$10,$21,$32,$44,$56,$69,$7c,$90,$a4,$b9,$ce,$e4,$fa,$11,$28
.db $40,$58,$71,$8a,$a4,$be,$d9,$f4,$10,$2c,$49,$66,$84,$a2,$c1,$e0
.db $00,$20,$41,$62,$84,$a6,$c9,$ec,$10,$34,$59,$7e,$a4,$ca,$f1,$18
.db $40,$68,$91,$ba,$e4,$0e,$39,$64,$90,$bc,$e9,$16,$44,$72,$a1,$d0
.db $00,$30,$61,$92,$c4,$f6,$29,$5c,$90,$c4,$f9,$2e,$64,$9a,$d1,$08
.db $40,$78,$b1,$ea,$24,$5e,$99,$d4,$10,$4c,$89,$c6,$04,$42,$81,$c0
.db $00,$40,$81,$c2,$04,$46,$89,$cc,$10,$54,$99,$de,$24,$6a,$b1,$f8
.db $40,$88,$d1,$1a,$64,$ae,$f9,$44,$90,$dc,$29,$76,$c4,$12,$61,$b0
.db $00,$50,$a1,$f2,$44,$96,$e9,$3c,$90,$e4,$39,$8e,$e4,$3a,$91,$e8
.db $40,$98,$f1,$4a,$a4,$fe,$59,$b4,$10,$6c,$c9,$26,$84,$e2,$41,$a0
.db $00,$60,$c1,$22,$84,$e6,$49,$ac,$10,$74,$d9,$3e,$a4,$0a,$71,$d8
.db $40,$a8,$11,$7a,$e4,$4e,$b9,$24,$90,$fc,$69,$d6,$44,$b2,$21,$90
.db $00,$70,$e1,$52,$c4,$36,$a9,$1c,$90,$04,$79,$ee,$64,$da,$51,$c8
.db $40,$b8,$31,$aa,$24,$9e,$19,$94,$10,$8c,$09,$86,$04,$82,$01,$80
.db $00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00
.db $00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$00
.db $01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$02,$02
.db $02,$02,$02,$02,$02,$02,$02,$02,$03,$03,$03,$03,$03,$03,$03,$03
.db $04,$04,$04,$04,$04,$04,$04,$04,$05,$05,$05,$05,$05,$05,$05,$06
.db $06,$06,$06,$06,$06,$07,$07,$07,$07,$07,$07,$08,$08,$08,$08,$08
.db $09,$09,$09,$09,$09,$09,$0a,$0a,$0a,$0a,$0a,$0b,$0b,$0b,$0b,$0c
.db $0c,$0c,$0c,$0c,$0d,$0d,$0d,$0d,$0e,$0e,$0e,$0e,$0f,$0f,$0f,$0f
.db $10,$10,$10,$10,$11,$11,$11,$11,$12,$12,$12,$12,$13,$13,$13,$13
.db $14,$14,$14,$15,$15,$15,$15,$16,$16,$16,$17,$17,$17,$18,$18,$18
.db $19,$19,$19,$19,$1a,$1a,$1a,$1b,$1b,$1b,$1c,$1c,$1c,$1d,$1d,$1d
.db $1e,$1e,$1e,$1f,$1f,$1f,$20,$20,$21,$21,$21,$22,$22,$22,$23,$23
.db $24,$24,$24,$25,$25,$25,$26,$26,$27,$27,$27,$28,$28,$29,$29,$29
.db $2a,$2a,$2b,$2b,$2b,$2c,$2c,$2d,$2d,$2d,$2e,$2e,$2f,$2f,$30,$30
.db $31,$31,$31,$32,$32,$33,$33,$34,$34,$35,$35,$35,$36,$36,$37,$37
.db $38,$38,$39,$39,$3a,$3a,$3b,$3b,$3c,$3c,$3d,$3d,$3e,$3e,$3f,$3f
Code: Select all
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alUMulBC:
;;
;; Unsigned multiply of B and C, storing the 16-bit result in BC
;;
;; INPUTS:
;; B - Multiplicand
;; C - Multiplier
;;
;; OUTPUTS:
;; BC - B * C
;;
;; DESTROYED:
;; AF, HL
;;
;; TIMINGS (not including CALL):
;; 126 T-States
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alUMulBC:
ld a, b ; [4] calculate B + C.
add a, c ; [4]
ld l, a ; [4] if carry flag is set we use
sbc a, a ; [4] g_alUMulTable otherwise we use
sbc a, 254 - g_alUMulTable ; [7] g_alUMulTable + 2.
ld h, a ; [4]
ld a, b ; [4] calculate B - C.
sub c ; [4]
jp nc, $ + $0005 ; [10] negate result if < 0
neg ; [8]
ld c, (hl) ; [7] retrieve g_alUMulTable[A+B].
inc h ; [4]
ld b, (hl) ; [7]
ld l, a ; [4] Select next table
ld h, g_alUMulTable + 2 ; [7]
ld a, c ; [4] subtract Table[B-C]
sub (hl) ; [7] from Table[B+C]
ld c, a ; [4]
ld a, b ; [4]
inc h ; [4]
sbc a, (hl) ; [7]
ld b, a ; [4]
ret ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alUMulBC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Last edited by qarnos on Sat 26 Aug, 2006 10:11 am, edited 2 times in total.
Fast 8-bit multiplication: signed x signed
I am not entirely happy with this code, but this will use the above lookup table to perform signed * signed multiplication.
Code: Select all
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alSMulBC:
;;
;; Signed multiply of B and C, storing the 16-bit result in BC
;;
;; INPUTS:
;; B - Multiplicand (-128..127)
;; C - Multiplier (-128..127)
;;
;; OUTPUTS:
;; BC - B * C
;;
;; DESTROYED:
;; AF, HL
;;
;; TIMINGS (not including CALL):
;; 173 T-states
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alSMulBC:
ld a, b ; [4] calculate B + C.
add a, c ; [4]
ld l, a ; [4]
sbc a, a ; [4] A = 255 if CA, 0 if NC
xor b ; [4]
xor c ; [4] if B and C have equal signs
add a, a ; [4] we negate if B + C carries.
sbc a, a ; [4] if B and C have opposing signs we
add a, l ; [4]
ld l, a ; [4]
sbc a, a ; [4]
xor l ; [4] negate if B + C did not carry.
ld l, a ; [4]
ld a, b ; [4] calculate B - C.
sub c ; [4]
ld h, a ; [4]
sbc a, a ; [4] A = 255 if CA, 0 if NC
xor b ; [4]
xor c ; [4] if B and C have equal signs
add a, a ; [4] we negate if B + C carries.
sbc a, a ; [4] if B and C have opposing signs we
add a, h ; [4]
ld h, a ; [4]
sbc a, a ; [4]
xor h ; [4] negate id B + C did not carry.
ld h, g_alUMulTable + 3 ; [7]
ld b, (hl) ; [7] retrieve g_alUMulTable[A+B].
dec h ; [4]
ld c, (hl) ; [7]
ld l, a ; [4] Select next table
ld a, c ; [4] subtract Table[B-C]
sub (hl) ; [7] from Table[B+C]
ld c, a ; [4]
ld a, b ; [4]
inc h ; [4]
sbc a, (hl) ; [7]
ld b, a ; [4]
ret ; [10]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alSMulBC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Fast 8-bit multiplication: signed x unsigned
Finally, signed * unsigned multiplication.
This routine, in my experience, is mainly useful for mutliplications dealing with fixed point multiplications. For instance, if you want to scale a signed number by a certain amount, you can use this routine and then shift the result as required.
This routine is also dependant on the unsigned rountine - it will JP to it if the signed argument is positive.
EDIT: I should also point out - with these 3 multiplication routines, it is possible to construct any kind of x-bit by x-bit multiplication by following the standard long-multiplication approach. For instance, multiplying two 16-bit values will required 4 multiplications and a few sign extensions and additions (this can be reduced to 3 multiplications if you don't mind losing a little bit of accuracy).
This routine, in my experience, is mainly useful for mutliplications dealing with fixed point multiplications. For instance, if you want to scale a signed number by a certain amount, you can use this routine and then shift the result as required.
This routine is also dependant on the unsigned rountine - it will JP to it if the signed argument is positive.
EDIT: I should also point out - with these 3 multiplication routines, it is possible to construct any kind of x-bit by x-bit multiplication by following the standard long-multiplication approach. For instance, multiplying two 16-bit values will required 4 multiplications and a few sign extensions and additions (this can be reduced to 3 multiplications if you don't mind losing a little bit of accuracy).
Code: Select all
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alFMulBC:
;;
;; Fixed-point multiply of B and C, storing the 16-bit result in BC
;;
;; INPUTS:
;; B - Multiplicand (-128..127)
;; C - Multiplier (0..255)
;;
;; OUTPUTS:
;; BC - B * C
;;
;; DESTROYED:
;; AF, HL
;;
;; TIMINGS (not including CALL):
;; 144 T-states (B >= 0)
;; 145 T-states (B < 0)
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alFMulBC:
ld a, b ; [4] compute A = abs(B).
add a, a ; [4]
jp nc, alUMulBC ; [132/10] jump if B is positive
ld a, b ; [4] calculate B + C.
add a, c ; [4]
jp c, $ + $0005 ; [10] jump if A + B is positive
neg ; [8]
ld h, g_alUMulTable + 2 ; [4]
ld l, a ; [4]
ld a, c ; [4] calculate C - B.
ld c, (hl) ; [7]
inc h ; [4]
sub b ; [4]
ld b, (hl) ; [7]
ccf ; [4]
ld l, a ; [4] if carry flag is set we use
sbc a, a ; [4] g_alUMulTable + 2 otherwise we
sbc a, 254 - g_alUMulTable ; [7] use g_alUMulTable.
ld h, a ; [4]
ld a, c ; [4]
sub (hl) ; [7]
ld c, a ; [4]
inc h ; [4]
ld a, b ; [4]
sbc a, (hl) ; [7]
ld b, a ; [4]
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alFMulBC
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Line drawing.
Here is a my line drawing routine.
It makes use of a couple of the undocumented load ixh/ixl instructions, so make sure your assembler supports them (some assemblers might name them XH and XL instead of IXH and IXL).
This routine will only draw black lines (turning pixels on). It does this by executing an unrolled loop of SET instructions, so the plotting is very fast. This code also requires the FractionToFix8 routine I posted earlier (and, optionally, the debugging library).
I have benchmarked this code against several other line drawers and found it is:
11 times faster than TI-OS ILine (no prizes there, though!)
5.5 times faster than MirageOS fastlineb
2 times faster than this one I found on ticalc.org.
Obviously, it lacks the flexibility of the TI-OS and MirageOS routines, although it does allow you to specify which buffer to draw on, but if you want a balls-to-the-wall line drawer and dont mind the size (around 250 bytes), I would think this routine is pretty close to optimal.
It makes use of a couple of the undocumented load ixh/ixl instructions, so make sure your assembler supports them (some assemblers might name them XH and XL instead of IXH and IXL).
This routine will only draw black lines (turning pixels on). It does this by executing an unrolled loop of SET instructions, so the plotting is very fast. This code also requires the FractionToFix8 routine I posted earlier (and, optionally, the debugging library).
I have benchmarked this code against several other line drawers and found it is:
11 times faster than TI-OS ILine (no prizes there, though!)
5.5 times faster than MirageOS fastlineb
2 times faster than this one I found on ticalc.org.
Obviously, it lacks the flexibility of the TI-OS and MirageOS routines, although it does allow you to specify which buffer to draw on, but if you want a balls-to-the-wall line drawer and dont mind the size (around 250 bytes), I would think this routine is pretty close to optimal.
Code: Select all
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; alDrawLineB:
;;
;; Draws a line on the specified screen buffer from (D, E) to (H, L).
;; All pixels along the line will be turned on.
;;
;; INPUTS:
;; BC - Address of screen buffer to draw to.
;; DE - x1, y1
;; HL - x2, y2
;;
;; DESTROYED:
;; AF, BC, DE, HL, IX
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
alDrawLineB:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; If DEBUG is defined, check to see that the co-ordinates are
;; valid and ASSERT if they are not.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#ifdef DEBUG
ld a, 95
cp d
ASSERT(nc)
cp h
ASSERT(nc)
ld a, 63
cp e
ASSERT(nc)
cp l
ASSERT(nc)
#endif
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Compare x1 and x2 and swap if neccessary so that we are always
;; drawing left to right.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
ld a, h ; [4] first do x2 - x1
sub d ; [4]
jr nc, $ + $0005 ; [12/7] if x1 > x2 then swap the co-ords
ex de, hl ; [4] and negate the accumulator so we
neg ; [8] end up with the absolute line width.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Calculate absolute height. The real height, which could have a
;; negative value, is pushed to the stack, mainly so we can test
;; the sign later on.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
ld h, a ; [4] replace x2 with line width.
ld a, e ; [4]
sub l ; [4] y1 - y2
push af ; [11] save raw line height.
jr nc, $ + $0004 ; [7/12]
neg ; [8]
ld l, a ; [4] replace y2 with line height.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Calculate the y1 offset (y1 * 12) and add to SCREENRAM address.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
push hl ; [11] save width and height.
ld a, e ; [4]
add a, a ; [4] y1 * 2
add a, e ; [4] y1 * 3
ld h, 0 ; [7]
ld l, a ; [4]
add hl, hl ; [11] y1 * 6
add hl, hl ; [11] y1 * 12
add hl, bc ; [11] y1 * 12 + SCREENRAM
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Now calulcate and add the x1 byte offset.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
ld a, d ; [4]
and $f8 ; [7] mask out low 3 bits so we can
rra ; [4] RRA 3 times with no chance of
rra ; [4] setting the carry flag.
rra ; [4]
ld b, 0 ; [7] and now set BC to the x pixel
ld c, a ; [4] offset.
add hl, bc ; [11]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Pop the width and height and determine which way we will draw
;; the line (ie: using _nhline or _nvline).
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
pop bc ; [10] restore width/height
ld a, b ; [4] cp width - height
cp c ; [4]
jr c, _setupNV ; [12/7]
ld a, c ; [4]
ld ix, _nhline ; [14]
jp _plotLine ; [10]
_setupNV: ld a, b ; [4]
ld b, c ; [4]
ld ix, _nvline ; [14]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Call FractionToFix8 to obtain the slope of the line.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
_plotLine: ld c, $01 ; [4]
call FractionToFix8 ; [290/498]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; The two plotting subroutines are each made out of 8 segments of
;; 9 bytes of code which set the bits of the line. To determine
;; the initial entry point we multiply the bit offset by 9 and add
;; that to the base address for the routine we will execute.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
ld a, d ; [4]
and $07 ; [7]
ld e, a ; [4]
add a, a ; [4] *= 2
add a, a ; [4] *= 4
add a, a ; [4] *= 8
add a, e ; [4] *= 9
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Now add the calculated offset to the address in IX
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
add a, ixl ; [8]
ld ixl, a ; [8]
ld a, ixh ; [8]
adc a, 0 ; [7]
ld ixh, a ; [8]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Now check the sign bit we pushed earlier. This will determine
;; if the line is being drawn top-to-bottom or vice-versa. From
;; here the code will branch to the entry point of either _nvline
;; or _nhline.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
pop af ; [10]]
ld a, $80 ; [7]
inc b ; [4]
jr nc, $ + $0007 ; [12/7]
ld de, 12 ; [10]
jp (ix) ; [8]
ld de, -12 ; [10]
jp (ix) ; [8]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; This is an unrolled loop which draws a line where (w <= h)
;; (ie: the line is more verticle than horizontal).
;;
;; INPUTS:
;; HL - Address of first byte of line in screen RAM.
;; A - Gradient accumulator.
;; B - Line counter
;; C - X gradient
;; DE - Line skip offset (+/-12)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
inc hl ; [6] (1)
_nvline: set 7, (hl) ; [15] (2) set pixel bit.
add hl, de ; [11] (1) advance to next line.
add a, c ; [4] (1) add gradient.
jr c, $ + $000b ; [7/12] (2) jump to next bit on carry.
djnz $ - $0006 ; [13/8] (2) loop if more bits to set.
ret ; [10] (1)
set 6, (hl) ; [15] (2)
add hl, de ; [11] (1)
add a, c ; [4] (1)
jr c, $ + $000b ; [7/12] (2)
djnz $ - $0006 ; [13/8] (2)
ret ; [10] (1)
set 5, (hl) ; [15] (2)
add hl, de ; [11] (1)
add a, c ; [4] (1)
jr c, $ + $000b ; [7/12] (2)
djnz $ - $0006 ; [13/8] (2)
ret ; [10] (1)
set 4, (hl) ; [15] (2)
add hl, de ; [11] (1)
add a, c ; [4] (1)
jr c, $ + $000b ; [7/12] (2)
djnz $ - $0006 ; [13/8] (2)
ret ; [10] (1)
set 3, (hl) ; [15] (2)
add hl, de ; [11] (1)
add a, c ; [4] (1)
jr c, $ + $000b ; [7/12] (2)
djnz $ - $0006 ; [13/8] (2)
ret ; [10] (1)
set 2, (hl) ; [15] (2)
add hl, de ; [11] (1)
add a, c ; [4] (1)
jr c, $ + $000b ; [7/12] (2)
djnz $ - $0006 ; [13/8] (2)
ret ; [10] (1)
set 1, (hl) ; [15] (2)
add hl, de ; [11] (1)
add a, c ; [4] (1)
jr c, $ + $000b ; [7/12] (2)
djnz $ - $0006 ; [13/8] (2)
ret ; [10] (1)
set 0, (hl) ; [15] (2)
add hl, de ; [11] (1)
add a, c ; [4] (1)
jr c, $ + $0005 ; [7/12] (2)
djnz $ - $0006 ; [13/8] (2)
ret ; [10] (1)
djnz _nvline - 1 ; [13/8] (2)
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; This is an unrolled loop which draws a line where (w > h)
;; (ie: the line is more horizontal than verticle).
;;
;; INPUTS:
;; HL - Address of first byte of line in screen RAM.
;; A - Gradient accumulator.
;; B - Column counter
;; C - Y gradient
;; DE - Line skip offset (+/-12)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
inc hl ; [6] (1)
_nhline: set 7, (hl) ; [15] (2) set pixel bit.
add a, c ; [4] (1) add delta.
jr nc, $ + $0003 ; [12/7] (2) jump on gradient overflow.
add hl, de ; [11] (1) advance to next line.
djnz $ + $0003 ; [13/8] (2) loop.
ret ; [10] (1)
set 6, (hl) ; [15] (2)
add a, c ; [4] (1)
jr nc, $ + $0003 ; [12/7] (2)
add hl, de ; [11] (1)
djnz $ + $0003 ; [13/8] (2)
ret ; [10] (1)
set 5, (hl) ; [15] (2)
add a, c ; [4] (1)
jr nc, $ + $0003 ; [12/7] (2)
add hl, de ; [11] (1)
djnz $ + $0003 ; [13/8] (2)
ret ; [10] (1)
set 4, (hl) ; [15] (2)
add a, c ; [4] (1)
jr nc, $ + $0003 ; [12/7] (2)
add hl, de ; [11] (1)
djnz $ + $0003 ; [13/8] (2)
ret ; [10] (1)
set 3, (hl) ; [15] (2)
add a, c ; [4] (1)
jr nc, $ + $0003 ; [12/7] (2)
add hl, de ; [11] (1)
djnz $ + $0003 ; [13/8] (2)
ret ; [10] (1)
set 2, (hl) ; [15] (2)
add a, c ; [4] (1)
jr nc, $ + $0003 ; [12/7] (2)
add hl, de ; [11] (1)
djnz $ + $0003 ; [13/8] (2)
ret ; [10] (1)
set 1, (hl) ; [15] (2)
add a, c ; [4] (1)
jr nc, $ + $0003 ; [12/7] (2)
add hl, de ; [11] (1)
djnz $ + $0003 ; [13/8] (2)
ret ; [10] (1)
set 0, (hl) ; [15] (2)
add a, c ; [4] (1)
jr nc, $ + $0003 ; [12/7] (2)
add hl, de ; [11] (1)
djnz _nhline - 1 ; [13/8] (2)
ret ; [10] (1)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; End of alDrawLineB
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
- MCF Legend
- Posts: 1601
- Joined: Mon 20 Dec, 2004 8:45 am
- Location: Budapest, Absurdistan
- Contact:
Re: Line drawing.
You should definitely compare its speed with Jim e's:qarnos wrote:I have benchmarked this code against several other line drawers and found it is:
11 times faster than TI-OS ILine (no prizes there, though!)
5.5 times faster than MirageOS fastlineb
2 times faster than this one I found on ticalc.org.
https://www.ticalc.org/archives/files/f ... 34642.html
And here's mine, which has a decent speed (about 1.5 times faster than Badja's) and an even more decent size:
http://www.ticalc.org/archives/files/fi ... 23429.html
Re: Line drawing.
Thanks for this. I had heard about Jim E's routine but had never been able to find it!CoBB wrote:You should definitely compare its speed with Jim e's:qarnos wrote:I have benchmarked this code against several other line drawers and found it is:
11 times faster than TI-OS ILine (no prizes there, though!)
5.5 times faster than MirageOS fastlineb
2 times faster than this one I found on ticalc.org.
https://www.ticalc.org/archives/files/f ... 34642.html
And here's mine, which has a decent speed (about 1.5 times faster than Badja's) and an even more decent size:
http://www.ticalc.org/archives/files/fi ... 23429.html
I will check it out.
UPDATE:
I benchmarked the interrupt-safe version of Jim E's routine and found, in my tests at least, that my code was 1.06 times faster - a negligable difference.
I was unable to test the interrupts disabled routine due to the fact that my benchmarking code needs the interrupts for timing. For the same reason, I was unable to test your code.
EDIT (AGAIN)
I thought I should publish my benchmarking method, just in case it is flawed or if someone wants try to reproduce my results.
Basically, I read the entire 64kb of the Z80-visible RAM and interpret each 4 byte group as a set of line end-points:
Code: Select all
_lineLoop: ld a, (bc)
inc bc
and 95
ld d, a
ld a, (bc)
inc bc
and 63
ld e, a
ld a, (bc)
inc bc
and 95
ld h, a
ld a, (bc)
inc bc
and 63
ld l, a
push bc
ld bc, PLOTSSCREEN
call alDrawLineB
pop bc
ld a, b
or c
jr nz, _lineLoop
It yeilds consistant results, so I am satisfied.
Last edited by qarnos on Sun 27 Aug, 2006 11:59 am, edited 2 times in total.
- Jim e
- Calc King
- Posts: 2457
- Joined: Sun 26 Dec, 2004 5:27 am
- Location: SXIOPO = Infinite lives for both players
- Contact:
I'd say it be faster than mine on long lines. Mine uses Bresenham line algorithm so its has to do some compares. Other than that the code looks strikingly similar. Actually mine started off using slope but I thought the over head from the div routine would take to long.
Edit: Couldn't that Fraction to fix 8 routine be optimised like this.
Edit: Couldn't that Fraction to fix 8 routine be optimised like this.
Code: Select all
FractionToFix8: .MODULE FractionToFix8
add a, a ; [4]
jr c, _overFlow ; [12/7]
cp b ; [4]
jr nc, _overFlow ; [12/7]
sla c ; [8]
jp nc, FractionToFix8 ; [10]
ret ; [10]
_overFlow: sub b ; [4]
sll c ; [8]
jp nc, FractionToFix8 ; [10]
ret ; [10]