char constants vs. hard-coding

cleopard · August 20, 2008, 1:11pm

This might be a silly question, but I thought I'd ask anyway. If I'm writing in C, isn't it more efficient to, for instance, use a constant character variable set to 'A' instead of hard-coding the character 'A'? Since it's only a single character instead of a string, it might not matter much.

shamrock · August 20, 2008, 1:26pm

You would still need to somehow initialize the const char variable...unless I'm not reading you correctly. Best if you explained by an example.

cleopard · August 20, 2008, 1:46pm

const char at_sign = '@';
char str[] = "fred@email.com";
char *ptr;
.
.
.

ptr = strrchr (str, '@');

vs.

ptr = strrchr (str, at_sign);

shamrock · August 20, 2008, 2:34pm

The statement below is better only due to its generality.

ptr = strrchr (str, at_sign);

Later on, if you wanted to set the at_sign variable to '~', the change has to be made in only one place, saving you a lot of typing; in the first form, every instance of '@' needs to be replaced by '~' in all the source files.

ptr = strrchr (str, '@');

Other than that there is nothing to say that one approach is better or more efficient than the other.

jim_mcnamara · August 20, 2008, 4:07pm

The only other point: char to int promotion. For example, when a char is passed to a function that takes an int parameter (or to a variadic function such as printf), it is converted to an int behind the scenes. There are other instances during arithmetic and boolean operations when a char is promoted to an int, and the result is then truncated if it is stored back in a char. See the C99 standards document for more information.
Or see the prototypes in ctype.h for any of the character functions - you'll note the arguments are int.

This means that char may undergo type change behind the scenes - so I'm not sure what you are gaining - if anything - other than as shamrock points out, some generality.
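To make the promotion concrete, here is a minimal sketch. The function names check_promotions and same_result are made up for illustration; the only library call is strrchr(), whose second parameter is declared as int.

```c
#include <assert.h>
#include <string.h>

/* Each assertion states one promotion rule; the function aborts if one fails. */
void check_promotions(void)
{
    char c = 'A';

    /* In C (unlike C++), a character constant such as 'A' has type int. */
    assert(sizeof 'A' == sizeof(int));

    /* Arithmetic on a char is carried out after promotion to int. */
    assert(sizeof(c + 1) == sizeof(int));

    /* The stored variable itself is still a single byte. */
    assert(sizeof c == 1);
}

/* strrchr() takes its second argument as int, so a literal and a
   const char both arrive as the same int value. */
int same_result(const char *s)
{
    const char at_sign = '@';

    assert(s != NULL);
    return strrchr(s, '@') == strrchr(s, at_sign);
}
```

So as far as the callee is concerned, there is no type-level difference between the two calls in the original question.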

Being careful with const values and writing functions as true 'black boxes' that change none of their arguments and have no side effects gives rise to what are usually called pure functions. These are FAR easier for the compiler to optimize. Standard C library example: strlen(). It takes a const char * argument and returns size_t.
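A small sketch of such a 'black box' function, in the same spirit as strlen(). The name count_char is hypothetical; it is not a standard library function.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many times c occurs in s.
   The function only reads through a const pointer and touches no
   global state, so the same input always gives the same result. */
size_t count_char(const char *s, int c)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (*s == (char)c)
            ++n;
    }
    return n;
}
```

For example, count_char("fred@email.com", '@') returns 1, and the caller's string is guaranteed to be left alone.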

spirtle · August 21, 2008, 4:40am

Following on from shamrock, I think it's a trade-off and depends on the context and intentions.
If the variable is likely to vary, then

ptr = strrchr (str, at_sign);

is "better" because it is more general, but it means I have to go and look up what the variable is assigned to. So from a readability perspective,

ptr = strrchr (str, '@');

is "better" because it is clearly self-documenting and sufficient since, in the email address parsing example given, one wouldn't expect to change '@' to anything else.

There is always an element of subjectivity in these arguments, though. Take your pick.

otheus · August 21, 2008, 6:52am

Whether it's more efficient depends on the compiler and architecture. If you wanted to find out which is which, you could generate code that executes each version 100000 times, and then see which one runs faster.

Under Intel architecture, hard-coding is more efficient because the compiler generates machine code in which the character @ is actually embedded into the instruction (immediate data). With the constant character, the compiler might also do this, or it might generate machine code using direct (absolute address) or indirect (relative to the stack) data addressing, both of which are less efficient. However, due to code caching and such, it's very unlikely to make any difference, unless this code is in many different places throughout the program.

Just for kicks, I wrote such programs. When they finish, I'll post the run-times.
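For anyone who wants to try a comparison themselves, here is a rough timing sketch. Note that it is not the same method: it calls strrchr() in a loop instead of generating 100000 distinct calls. The names time_literal, time_constant and sink are made up for illustration, and clock() measures CPU time only coarsely.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

static const char at_sign = '@';

/* Volatile sink so the result of each call is actually stored.
   An optimizing compiler may still hoist the call out of the loop,
   so compare without optimization if you want to see every call. */
static char *volatile sink;

/* CPU seconds spent on `iterations` calls using the hard-coded literal. */
double time_literal(long iterations)
{
    char str[] = "fred@email.com";
    clock_t start;
    long i;

    assert(iterations >= 0);
    start = clock();
    for (i = 0; i < iterations; ++i)
        sink = strrchr(str, '@');
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same loop, but passing the const char variable instead. */
double time_constant(long iterations)
{
    char str[] = "fred@email.com";
    clock_t start;
    long i;

    assert(iterations >= 0);
    start = clock();
    for (i = 0; i < iterations; ++i)
        sink = strrchr(str, at_sign);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call each function several times with the same iteration count and average the results, since a single run is easily dominated by noise.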

otheus · August 21, 2008, 7:08am

On an Intel PIII 800 under linux with gcc and no optimization, I get the following:

Running the program with the hard-coded character searching and printing 100000 times (this is 100000 distinct calls to these functions), I get an average of 0.038 seconds per run. Using the constant character, I get an average of 0.039 seconds per run. So hard coding is more efficient.

You can also use the pre-processor to achieve some level of generality without sacrificing performance. Instead of defining a constant, just do:

#define AT_SIGN '@'

...

ptr = strrchr( string, AT_SIGN );
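A slightly fuller sketch of the preprocessor approach. The helper names local_part_length and is_address_punctuation are hypothetical. Because the macro expands to a plain character literal before compilation, it can also be used where C requires a constant expression, such as a case label.

```c
#include <assert.h>
#include <string.h>

#define AT_SIGN '@'

/* Length of the part before the last AT_SIGN, or -1 if there is none. */
long local_part_length(const char *address)
{
    const char *p;

    assert(address != NULL);
    p = strrchr(address, AT_SIGN);
    return p ? (long)(p - address) : -1L;
}

/* The macro is just a literal after preprocessing, so it is valid
   as a case label. */
int is_address_punctuation(int c)
{
    switch (c) {
    case AT_SIGN:
    case '.':
        return 1;
    default:
        return 0;
    }
}
```

With this, local_part_length("fred@email.com") gives 4, and changing the separator still means editing a single line.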

redoubtable · August 21, 2008, 7:48pm

Despite what everyone said, I think the speed difference between the two cases is not measurable. For one, the data is stored in the data segment in both cases, thus it's accessed in the same way and at the same speed. Furthermore, this is highly platform/architecture/implementation dependent.

We should call a meta-programmer to enlighten us with accurate specifications on the matter at hand.

otheus · August 22, 2008, 4:23am

It IS measurable. But even after 1/2 million invocations, it made almost no difference on a very slow (10-year old) machine.

Mostly wrong. The '@' literal is embedded in the machine instruction itself (for x86 architectures), so that's in the code segment. The "const" designation for a variable means the compiler can optimize that variable, for instance, by also "hard coding" the value inside instructions. However, I did not turn on optimizations. In my code, I defined the const char inside main(), meaning it would go on the stack. So nothing is in the data segment. Finally, the call to strchr places both arguments on the stack. So the benefit of having a constant in an immediate instruction type is practically nullified by this.

Architecture and processor, yes. For instance, while practically all processors have both an 'immediate' addressing and a 'direct' addressing mode, the difference in the number of clock cycles to process such an argument varies across architectures (surely), processor manufacturers (AMD vs Intel), and processor families (Pentium vs Celeron). There almost always IS a difference, but in very-large-pipelined architectures and efficient caching, that difference is statistically erased.

However, there's more dependency on the compiler: whether it chooses immediate mode or direct mode for literals, whether it uses direct or stack-indexed addressing for constants, whether it passes the first argument in a register or the last, etc. The OS can come into play, too, especially with my program of 100000 lines of code. This likely meant there were page-traps during the execution. For this reason (and others), I took an average of several runs.
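One way to see what a particular compiler does is to look at the assembly it emits. A minimal sketch, with made-up function names: put the two variants in one file and compile with gcc -S -O0 (then again with -O2) to compare the generated code for each function.

```c
#include <assert.h>
#include <string.h>

/* Variant 1: the character is written directly in the call. */
char *find_with_literal(char *s)
{
    assert(s != NULL);
    return strrchr(s, '@');
}

/* Variant 2: the character lives in a const automatic variable,
   as in the test described above. */
char *find_with_constant(char *s)
{
    const char at_sign = '@';

    assert(s != NULL);
    return strrchr(s, at_sign);
}
```

Both functions return the same pointer for the same input; any difference shows up only in how the second argument is loaded, and that is what to look for in the .s file.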

WTF do you mean by a metaprogrammer??

redoubtable · August 22, 2008, 8:35am

I also did some testing and the results were not conclusive; that's why I said that. After ~0 calls to strrchr() in x86 with no forced preemption (kernel) and -20 nice level, 950 clock ticks happen in both cases (times()). I did the test 10 times, and the results always looked the same.

I disagree, you're mixing things up. In the elf32-i386 file format, both cases store the data in the .rodata section (sections simplify my explanation). Although everything gets push()'d to the stack when a function is called, the place the data is push()'d FROM is the same. Let me demonstrate:

#include <unistd.h>   /* write() */

int
main ()
{
        write (1, "MOMMA", 5);
        return 0;
}
"MOMMA" string is stored in .rodata section, let's disassemble (objdump -D):
Disassembly of section .text:
...
080483b0 <main>:
...
 80483c1:       c7 44 24 08 05 00 00    movl   $0x5,0x8(%esp)
 80483c8:       00 
 80483c9:       c7 44 24 04 bc 84 04    movl   $0x80484bc,0x4(%esp)
 80483d0:       08 
 80483d1:       c7 04 24 01 00 00 00    movl   $0x1,(%esp)
 80483d8:       e8 13 ff ff ff          call   80482f0 <write@plt>
...
(items are push()'d in reverse order $0x5, $0x80484bc, $0x1)
Anyway, let's search for 0x80484bc :

Disassembly of section .rodata:
...
080484b8 <_IO_stdin_used>:
 80484b8:       01 00                   add    %eax,(%eax)
 80484ba:       02 00                   add    (%eax),%al
 80484bc:       4d                      dec    %ebp
 80484bd:       4f                      dec    %edi
 80484be:       4d                      dec    %ebp
 80484bf:       4d                      dec    %ebp
 80484c0:       41                      inc    %ecx
"\x4d\x4f\x4d\x4d\x41" = "MOMMA"

This demonstrates that although "MOMMA" is used in main() (.text section), it is retrieved from the .rodata section.

Now, let's store "MOMMA" in a const variable and pass it to write():
# objdump -s test|grep -A 1 rodata
Contents of section .rodata:
 80484c4 03000000 01000200 4d4f4d4d 4100      ........MOMMA. 

So, the string is used in the .text section from .rodata section in both cases.
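The source of that second program isn't shown above, so the following is only a guess at its shape. The names momma and write_momma are made up. A file-scope const array like this is the kind of object a compiler typically places in .rodata, which you can check with objdump -s as above.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* File-scope const data: a typical candidate for the .rodata section. */
static const char momma[] = "MOMMA";

/* Write the constant string to fd; returns what write() returns. */
ssize_t write_momma(int fd)
{
    assert(fd >= 0);
    return write(fd, momma, strlen(momma));
}
```

Calling write_momma(1) from main() prints the same five bytes as the literal version.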

I meant compiler programmers eheh

I agree with everything else you said.