[C] swapping vars

Feel like posting Off Topic? Do it here.

Moderator: MaxCoderz Staff

coelurus
Calc Wizard
Posts: 585
Joined: Sun 19 Dec, 2004 9:02 pm
Location: Sweden
Contact:

Post by coelurus »

Programming on PCs is not about low-level milking performance anymore, it hasn't been for over a decade.

A teacher who says a compiler probably uses the stack to create local vars, ouch. I never listen to teachers in programming classes because they only tell you what they read in a book ten years ago.

I assembled the two functions with gcc without inlining or any optimizations at all. As expected, the regular swap function didn't use the stack, and its code was half as long as the xor version's. Thus the xor version was 50% slower, using ten million runs for timing. Busted already, but let's continue.

I tried optimizing with -O1, and what do you know: at this level of optimization, gcc already figured out that both functions swapped two variables and therefore inserted some black magic, customized opcodes to do the swapping. In essence, gcc replaced both clever attempts at swapping with a set of instructions that you could only gain access to by using inline assembler and a 1000-page assembler opcode reference.

So, what does this mean? It means that any attempt at being clever with low-level source code these days is utter stupidity, because gcc already knows better tricks than the ones you've got. Use your brain where it's needed, and that's for high-level optimizations.

"Drop it, xor swapping is stupid." - me.

That concluded, I would never, ever touch Python or any similar scripting languages with a ten foot imaginative stick, I will keep to good old C. Trust me, I know what I'm doing, I've impressed a string theory professor in the Nobel committee with my graphics programming skills already :D
hop
Extreme Poster
Posts: 378
Joined: Sat 09 Dec, 2006 3:42 pm

Post by hop »

coelurus wrote:That concluded, I would never, ever touch Python or any similar scripting languages with a ten foot imaginative stick, I will keep to good old C. Trust me, I know what I'm doing, I've impressed a string theory professor in the Nobel committee with my graphics programming skills already :D
Scripting languages have their own uses that C simply does not apply to. If you know what you're doing, you use the tool that is most suited for the job.
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan
Contact:

Post by CoBB »

King Harold wrote:so, what are you saying now? what's your point?
That low-level optimisation (if it really is optimisation in the first place) does more harm than good, and it should come as the last resort, after making sure you have the best algorithm for the task and you have already profiled the code. There’s no point in trying to outwit the compiler.
King Harold wrote:as far as readability goes - if you can't read 3 xors then you shouldn't be programming C at all (or you're in the process of learning maybe)
Nice, but when you have to read through thousands of lines of obfuscated code, you suddenly realise the value of clarity. And frankly, I’ve been programming in C for many years already, but it would still take considerably more effort for me to figure out what the XOR trick is doing if I didn’t know it than the straightforward assignments. Now imagine the same with a previously unknown and considerably longer algorithm.
King Harold wrote:the guys at wikipedia obviously
Keep in mind that that’s a code example, not production code. In real life one would optimise a sorting routine. Examples should be clear and well structured.
hop wrote:Scripting languages have their own uses that C simply does not apply to. If you know what you're doing, you use the tool that is most suited for the job.
In other words, if you have a lot of data to process, use C. ;)
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

King Harold wrote:as soon as you're optimizing, both ways will turn out optimized - they might even become the same!
Since they actually do come out the same, this has become a matter of readability. Personally, I do not find 3 xors 'obfuscated' or 'unreadable', though swapping with an extra variable might look just a little more normal.
CoBB wrote:There’s no point in trying to outwit the compiler.
why not? compilers don't know everything, do they? if you fail to outwit it then there's no harm done since it will optimize it the same way anyway, and if you actually succeed then your program should become faster
coelurus wrote:Programming on PCs is not about low-level milking performance anymore, it hasn't been for over a decade.
oh so that's why so many programs use so many resources.. like, for example, Windows Vista, Azureus and Firefox 2.. I really wouldn't mind if the programmers would spend some extra time to 'low-level milk the performance' if it means their programs become just a tiny little bit faster

if you want to swap 2 variables which are both on the stack in z80 asm, you could do this:

Code: Select all

pop hl
ex (sp),hl
push hl
I'm not saying it's the best way, but it saves 2 cc's compared to 2 pops and 2 pushes, and a register pair.
Jim e
Calc King
Posts: 2457
Joined: Sun 26 Dec, 2004 5:27 am
Location: SXIOPO = Infinite lives for both players
Contact:

Post by Jim e »

King Harold wrote:
CoBB wrote:There’s no point in trying to outwit the compiler.
why not? compilers don't know everything, do they? if you fail to outwit it then there's no harm done since it will optimize it the same way anyway, and if you actually succeed then your program should become faster
No, if the compiler doesn't get what you want then it will do exactly what you say. In other words it won't take any advantages. Just like the asm I posted.
King Harold wrote:
coelurus wrote:Programming on PCs is not about low-level milking performance anymore, it hasn't been for over a decade.
oh so that's why so many programs use so many resources.. like, for example, Windows Vista, Azureus and Firefox 2.. I really wouldn't mind if the programmers would spend some extra time to 'low-level milk the performance' if it means their programs become just a tiny little bit faster
Ummm... NONE of those programs are written in C. I think we can all guess why Azureus is resource hungry. It's kind of pointless to try to optimize at a low level when you're writing in C# or Java.

At any rate, your low-level milking is hurting the generated code. Optimization needs to occur at the algorithmic level; the compiler will handle the ultra-low-level assembly optimization.
King Harold wrote:if you want to swap 2 variables which are both on the stack in z80 asm, you could do this:

Code: Select all

pop hl
ex (sp),hl
push hl
I'm not saying it's the best way, but it saves 2 cc's compared to 2 pops and 2 pushes, and a register pair.
very good... and birds go tweet.
hop
Extreme Poster
Posts: 378
Joined: Sat 09 Dec, 2006 3:42 pm

Post by hop »

Scripting languages have their own uses that C simply does not apply to. If you know what you're doing, you use the tool that is most suited for the job.
In other words, if you have a lot of data to process, use C. ;)
Almost everything in computing is "a lot of data to process", but in 9 out of 10 cases I'd rather use a scripting language for automation of batch processing. If it also needs a powerful user interface or low-level data/memory structure manipulation, or if it shouldn't rely on preinstalled interpreters, then I'll knock on C's door.
Since they actually do come out the same, this has become a matter of readability.
Just put a comment in front of the lines saying "//swapping .. with ..".

Low level optimisation is for embedded systems and consoles. I bought a dual core 2ghz ddr2 pc6400 gf7600gt pc so you don't have to do that sort of thing any more on pc applications.
tr1p1ea
Maxcoderz Staff
Posts: 4141
Joined: Thu 16 Dec, 2004 10:06 pm
Location: I cant seem to get out of this cryogenic chamber!
Contact:

Post by tr1p1ea »

I don't really understand the point of this thread ... is it to demonstrate that you know how xor works? The 'bad' code isn't bad at all, and any decent compiler will optimise it well.

Most people have their preferred language for certain tasks. I use C for lots of stuff, particularly tools to aid my calc projects :).
"My world is Black & White. But if I blink fast enough, I see it in Grayscale."
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

tr1p1ea wrote:I don't really understand the point of this thread
to start a discussion on how and what to optimize

most people seem to think you should optimize the algorithm only and leave the low-level optimizing to the compiler, which is not terrible, especially if you know how it is going to be optimized so you can make the algorithm in a way that will be optimized most.

But, (oh yes, here comes the but), you could probably squeeze just a little more performance out of the program if you could optimize the generated assembly by hand (after all, no compiler is ever perfect), you may think that it won't matter, but milliseconds do count! If any process is taking 100% CPU and it takes long then you are probably wishing it wouldn't take quite so long, if a program can save a millisecond every second then it saves 3.6 seconds every hour, doesn't seem like much?
If you ever used Google Web Accelerator you know how it can add up in the long run, I saved 4.7 days at the time of writing, for 494098 pages.

btw, I know Azureus is written in Java, but that doesn't mean it can't be optimized.

I don't know what's up with Windows Vista though, very poor performance, what language has it been written in anyway?
tr1p1ea
Maxcoderz Staff
Posts: 4141
Joined: Thu 16 Dec, 2004 10:06 pm
Location: I cant seem to get out of this cryogenic chamber!
Contact:

Post by tr1p1ea »

Of course optimising is important, but the example you showed, which was so horrible that you almost fell off your chair, doesn't require such a reaction since there isn't much wrong with it.

It isn't very nice to find someone else's code, attempt to improve it and then publicly go on about how crappy theirs is and how awesome yours is. A better idea would have been to offer some optimisation ideas to the author of the original piece (in this case perhaps amend the wiki).
"My world is Black & White. But if I blink fast enough, I see it in Grayscale."
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan
Contact:

Post by CoBB »

King Harold wrote:most people seem to think you should optimize the algorithm only and leave the low-level optimizing to the compiler, which is not terrible, especially if you know how it is going to be optimized so you can make the algorithm in a way that will be optimized most.
Optimise when you need it, that's the basic rule. And again, look at the algorithm first, and if performance is still inadequate, profile and optimise the bottlenecks.
King Harold wrote:But, (oh yes, here comes the but), you could probably squeeze just a little more performance out of the program if you could optimize the generated assembly by hand (after all, no compiler is ever perfect), you may think that it won't matter, but milliseconds do count!
With old CPUs this was indeed true. However, modern CPUs are extremely complex, and probably not even their designers can keep the full instruction set in their heads, let alone be aware of all the possible interactions between instructions (note: I only programmed Pentiums in assembly, where you could execute certain pairs of instructions in parallel, but I've given up on following newer developments). Compilers have all this knowledge wired in by many clever people, so they are very likely to produce better code than you could by hand. Oh sure, you should be able to surpass them in theory, but you'd need disproportionate effort compared to the gain. And then someone comes along and points out a high-level algorithmic weakness in your code, fixing it to run an order of magnitude faster without ever touching an assembler.
King Harold wrote:If any process is taking 100% CPU and it takes long then you are probably wishing it wouldn't take quite so long, if a program can save a millisecond every second then it saves 3.6 seconds every hour, doesn't seem like much?
And how much extra time did it take to achieve this 0.1% improvement? Say, if there's a month of programmer time in it, you need more than 80 years of runtime to get it back. And we haven't even considered how much more expensive programmer time is compared to machine time.
King Harold wrote:If you ever used Google Web Accelerator you know how it can add up in the long run, I saved 4.7 days at the time of writing, for 494098 pages.
I don't think this is of any significance. We're talking about highly fragmented thinking time here. For instance, you can also save time by having a higher resolution screen, because you need to scroll around less and keep more stuff visible at a time. Such things are impossible to quantify in practice. The GWA counter is only good for making marketing claims.
hop
Extreme Poster
Posts: 378
Joined: Sat 09 Dec, 2006 3:42 pm

Post by hop »

King Harold wrote:But, (oh yes, here comes the but), you could probably squeeze just a little more performance out of the program if you could optimize the generated assembly by hand (after all, no compiler is ever perfect), you may think that it won't matter, but milliseconds do count! If any process is taking 100% CPU and it takes long then you are probably wishing it wouldn't take quite so long, if a program can save a millisecond every second then it saves 3.6 seconds every hour, doesn't seem like much?
If I have a process running at 100% that long I'd rather use multiple threads and save 50% or 75% of the time due to omgmulticores rather than waste more of my time working the code than I'd ever get back in saved execution time.

Save the low level optimisations for scientific applications or something like that.
I don't know what's up with Windows Vista though, very poor performance, what language has it been written in anyway?
Heavy performance != poor performance. It has to do a lot, so it makes sense that you need a real PC to run it. And if you do, then it runs just as smoothly as XP or Ubuntu or Mac OS X. Performance is relative to the task that has to be performed, so if it really had poor performance you'd need the old Longhorn specs just to run it. It's only the drivers that perform poorly, but that's not much of a problem if you've got proper hardware with good drivers in the first place.

You don't say someone's 2 hour performance at a marathon sucks because you can run a mile in 6 minutes.
I don't think this is of any significance. We're talking about highly fragmented thinking time here. For instance, you can also save time by having a higher resolution screen, because you need to scroll around less and keep more stuff visible at a time. Such things are impossible to quantify in practice.
It depends. In the case of GWA you fill in the millisecond here and there with something else, so you never save any time at all. But things like assigning keyboard combinations to mouse buttons or gestures in design programs like Power Designer or Paintshop save seconds to minutes each time you use them.
Last edited by hop on Mon 07 May, 2007 1:52 pm, edited 1 time in total.
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

Cool, now you've said it in a way I can agree with :P
And yeah, that counter is only good for marketing claims, but it makes me feel like I spent 5 days less waiting for pages to load.
Say, if there's a month of programmer time in it, you need more than 80 years of runtime to get it back
Not if, for example, you are Microsoft and half the world is going to use your program. It might only take a month with all those people's time combined. Azureus is also widely used, and it could have been far more optimized. It generally uses about 5 times as much CPU time as uTorrent.

and yes, the high level algorithm should be optimized first, but when you've done that....
hop
Extreme Poster
Posts: 378
Joined: Sat 09 Dec, 2006 3:42 pm

Post by hop »

What the hell are you doing with Azureus if it generally takes even 5% CPU? Or are you saying 5x0%? Are you running a Pentium 1 or something?
CoBB
MCF Legend
Posts: 1601
Joined: Mon 20 Dec, 2004 8:45 am
Location: Budapest, Absurdistan
Contact:

Post by CoBB »

King Harold wrote:Not if, for example, you are Microsoft and half the world is going to use your program. It might only take a month with all those people's time combined.
Sure, but it's not those low-level bit grinding hacks that will make a difference in an OS. Those are two entirely different worlds. Abstractions cost much more in terms of performance than anything you can do at the bottom of the architecture, but there's really no other way to produce software at such a huge volume.
King Harold
Calc King
Posts: 1513
Joined: Sat 05 Aug, 2006 7:22 am

Post by King Harold »

hop wrote:If I have a process running at 100% that long I'd rather use multiple threads and save 50% or 75% of the time due to omgmulticores rather than waste more of my time working the code than I'd ever get back in saved execution time.
that does not save time.
hop wrote:What the hell are you doing with Azureus if it generally takes even 5% cpu? Or are you saying 5x0%? Are you running a Pentium 1 or something?
5% is 4% too much for a program that idles most of the time. It takes 10% here though, uTorrent about 1.5%; the difference is quite significant.

AMD Athlon 64 3400+ @ 2.3 GHz, not a Pentium 1.
hop wrote:Heavy performance != poor performance. It has to do a lot, so it makes sense that you need a real PC to run it. And if you do, then it runs just as smoothly as XP or Ubuntu or Mac OS X. Performance is relative to the task that has to be performed, so if it really had poor performance you'd need the old Longhorn specs just to run it. It's only the drivers that perform poorly, but that's not much of a problem if you've got proper hardware with good drivers in the first place.
Windows Vista doesn't actually do anything heavy except waste resources on purpose. When it isn't doing anything and is supposed to be idle, it keeps doing stuff and taking some CPU time that goes who-knows-where.

tr1p1ea wrote:It isn't very nice to find someone else's code, attempt to improve it and then publicly go on about how crappy theirs is and how awesome yours is.
No but I'm not nice. I didn't say mine was awesome though.
Locked