Building a performance static analyser

I was wondering if someone could let me know whether tools exist that check for performance-degrading coding practices. There is of course the well-known Valgrind, but the question is more whether this is possible at a fundamental level. Do generic test cases exist for statically checking for coding practices that hamper performance when the program is run?

Unless you mean you have a list of very specific bad practices for it to check for, it is difficult for an algorithm to deduce the intent of a program well enough to see whether it's a good or poor solution to the problem, except at very small scopes.

To a certain degree that's what an optimizing compiler does; taking a close look at what's been optimized where might be illuminating if your compiler can be convinced to be sufficiently spammy.

Tracing your program's system and library calls would also be a place to start. What does the program spend most of its time doing? Does it really need to be waiting when it is? Is it doing I/O efficiently? Is it polling when it could be using a blocking call instead? Is it doing inexplicable repetitive things? Is it calling malloc way more often than a sane program would need to? etc. etc.

Thank you for your answer. Some more guidelines to consider:
1- Static code analysers already check code for bad practices. That is of course not to say that resolving those issues would lead to a better performing program.
2- Recursion is usually a good place to start looking for performance issues.
3- The creation of very large arrays is usually a sign of poor performance, because most of the time only a small portion is needed.
4- Repeated input validation. You can never be truly sure the data the program operates on is sufficiently sane, so there is a risk trade-off between performance and data sanity.

Implementing performance measures always carries the risk of micro-optimisation, where the measures only help in one setup or on a specific data set.

Any more ideas are welcome.

Another approach is to use a profiler:
gcc example:

gcc -pg -o myprog myprog.c
./myprog [argument1..argument2]
gprof myprog gmon.out

Running the instrumented program writes a gmon.out file in the current directory, which gprof then reads.

gprof produces all kinds of performance analysis information. This is not static, as you wanted, for the reasons corona describes above - it is hard to tell what the intent of code is - but once the code runs, you can analyze its effectiveness.

The creation of large unused arrays is certainly a red flag for cargo cult programming (if I make this bigger, it stops crashing! All done.) but it doesn't seem to me much of a performance issue, unless it's significant enough to eat into swap.

I hadn't heard of gprof before, but it is certainly worthwhile to put on the list.
Also, clang (the C Language Family Frontend for LLVM) has a built-in profiler.
Another issue I have given some thought to: I expect a static performance analyser to produce many false positives, or at least to flag constructs that have no good alternative, or that actually follow good design practice from a functional standpoint. So such an analyser is unlikely to be very robust across the inputs it is given.

In my experience, static analysis tools aren't going to be able to get you much if you want to avoid performance bottlenecks. The tools already mentioned (and I'll add memusage to that list, to look for memory leaks) do a great job of showing you where your real performance bottlenecks are, which makes sense, since performance is a matter of both the code (known) and the input (often unknown).

However, code review by other programmers, peer programming, and general sets of best practices for coding can be of great use here.

Thank you for your answer. Do I understand correctly that memusage is part of glibc (ftp://ftp.gnu.org/gnu/glibc/)?
Indeed, I am looking to formalise the best practices you mention, insofar as they differ from the best practices already in use for static code analysis.

Yes, memusage comes with glibc - or on my OS (CentOS) as part of the glibc-utils package. It is used as a wrapper around your program: memusage ./myprog

As far as best practices go: if you're coding in C, there are a lot of "style" guides available at various sites, but more often it's the logic of the program that dictates how it performs. Good optimizing compilers will do what they do, and more often than not, specific coding styles don't produce greatly different binaries. If you want to crank out the most efficient code possible, there are books about that, but it becomes complex, hardware-dependent and difficult to maintain, along with requiring a lot of esoteric knowledge from the coders.

So, coding guidelines are more for other coders reading the code. Which is definitely worth looking at.

Performance-wise, it's much more about logic, like I said, and ends up being a trade-off among a lot of factors - memory, disk I/O, processor time, etc. In some cases, a not terribly elegant loop (which will use a lot of processor time) may be better than an alternative that spares the CPU but does a lot of I/O. And so forth.

Thank you for all your answers.