Efficient logging of time measurements

tamil.pamaran · February 6, 2013, 8:00am

Hi all,

I am working on a software, which was developed using C programming on unix environment. I have added certain functionality to the software.

Now, I interested in measuring time taken by new functionality (Mean value of huge amount of tests). Since new function will be part of many threads and will be running simultaneously.

I am looking for a efficient logging mechanism to log time duration for each execution, that will not reduce the performance.

I hope appending to a log file creates bottleneck.

FYI. This software is real time one and will be running for many weeks.

Thanks in advance,

Tamil

venam · February 6, 2013, 9:11am

First of all I'd like to state that I'm a not an expert in C.
I use in C++ the boost::timer library which is an excellent one to do those calculations.

If you don't want to add dependencies to your project you can do the following:
when the function that you want to calculate time starts start a thread that will count time : see man 3 usleep or man 3 sleep or man 3 nanosleep.
This thread sleeps a number of time which increase every n time unit.
This thread stops counting when the other function finish the procedure and change a boolean value to true.
I think the above solution will not be really efficient but it can work.

Experts in C should have a better answer to this question.

tamil.pamaran · February 6, 2013, 10:12am

Thanks Venam,

I feel creating a new thread for each thread in focus, would be a bad idea. I am thinking of using time() system call to measure time differences. There should not be any race conditions among the threads, in calling time().

However, once measurement is taken, there is a race condition when multiple threads trying to log. Using semaphore will in turn will create bottleneck.

I am not familiar with syslog(). Can that be useful here?

/Tamil

venam · February 6, 2013, 10:33am

I didn't get your point about the bottleneck.
In the world of threads there is something called mutex.
paulbridger.com/mutexes
You can also check timer_create() .

Maybe you can start only one thread that count the time and this thread checks for the status of all the other thread and saves in an array/struct the time they are done.

tamil.pamaran · February 6, 2013, 12:41pm

Why do we need a thread to count seconds, when there are systems functions like time() ?

Mutex is also known as semaphore, using which will make threads blocking, which is a bottleneck.

/Tamil

achenle · February 7, 2013, 7:28am

OK, for a real-time app, whatever you do has to be FAST, and if the resources aren't available for your processing thread to do the logging, I'm assuming you'll just have to skip logging that event.

Just as importantly, whatever logging you add, the performance has to be deterministic - it takes the same overhead to do the logging for a processing thread every single time you do the logging. That number should be as small as possible, but it can't be good most every time, with an occasional long-running hang waiting for resources.

So something like Boost is probably the last thing you should be looking into using. Do you know what's going on inside the Boose libraries? And how using it will impact your real-time processing in every single case? No, you don't. If you could get the entire Boost development team and have them answer, "Tell me exactly what happens when I use this Boost functionality in every possible use case.", even the Boost developers themselves almost certainly wouldn't be able to tell you.

Using syslog() might meet your needs, but I'd measure the overhead of doing so.

If I were to roll my own, I'd probably try something like this:

Define a binary structure:

struct timed_event
{
    int event_id;
    time_t event_start;
    time_t event_end;
    time_t current_time;
}

Define event ids for all the events you want to collect data from.
Create a pipe of some kind - named pipe, SYS V message queue, etc.
For each event you time, fill in one of those structures.
After filling in the structure, write the binary structure to the pipe. Write using a non-blocking low-level call such as a simple write(). Do NOT use C++ functionality, for a lot of reasons. Again, this needs to be non-blocking so if whatever pipe/queue you're writing to is full, you don't wait. It also needs to be atomic - which is why YOU need to make the write() call (or whatever) directly in YOUR code.
Write another process that reads the binary data from the pipe (should be easy since the reader knows the size of the binary structure - read() the proper number of bytes and get a filled-in struct), translates it into a readable log message, and logs it. Or maybe you just log the raw binary data and translate it offline later depending on your space needs.

You might want to add more data to the structure.

One thing to watch out for, especially if you implement something like a simple write() to a Unix pipe: partial write()'s. For example, if your structure comes out to be 32 bytes, maybe the pipe is almost full and write() only puts 24 bytes in the pipe. You might want do so something like start each structure with a magic number int field and put something in there that hopefully can't be duplicated in any other field. Then the reading/logging process can use that magic number field to try to sync up the binary stream of data.

Another issue: make sure your processing code and your logging code use the same size for the binary structure. Compiler optimization flags can change data alignments, which can change structure sizes. So if you aggressively optimize the processing code but not the logging code, the processing code could be using a binary structure that's 48 bytes because of alignment padding while the logging process could be using a non-padded structure that's 32 bytes.

venam · February 7, 2013, 8:12am

The idea of using a pipe is pretty cool. I never taught about that. Next time I do something I'll take that into consideration. It's going to be way faster But isn't it going to be a little harsh on I/O, I mean it depends on the interval the reader will loop in the pipe.
Anyway, you're answer is cool.

tamil.pamaran · February 7, 2013, 8:15am

achenle,

Thanks for quite detailed response.

I was thinking about a similar solution. Apart from getting logs in string format, syslogd process works in similar fashion. Which mandates lot of context switches between syslogd process and measured process.

I am thinking of creating a thread to do logging rather than a process, so as to reduce process context switch.

Any comments?

I decided to go with the binary structure.

Which one will be better to send measurements from measured thread to logging thread?
Shared memory with mutex or Non blocking messages(pipe,...)

/Tamil

venam · February 7, 2013, 8:21am

I think because you are using a binary struct it will write the data as a reserved block in the pipe and you won't need mutex because the data won't overlap.
I may be wrong about that.

tamil.pamaran · February 7, 2013, 8:46am

Venam,

Mutex should be used in the case of shared memory.

For instance, an external structure array can be updated by measured thread and read(remove) by logging thread ( with mutex to avoid race condition)

I agree, there is no need of mutex in case of message transmission.

/Tamil

expl · February 7, 2013, 11:24am

Well that would depend on what exact 'shared memory' model you are going to use and frequency of the logging code. Also you might want to consider sampling algorithm if the code runs at very high frequency on a production platform, that basically means that you only measure every Nth run and its much more suitable for logging for extended amount of time a piece of code that is very frequent.

Another suggestion:
Sometimes real time execution results can be not very meaningful for fast sections of code due to thread scheduler making them apear random on a busy system, you should probably instead log CPU time (CPU Time - The GNU C Library) as it has more meaning.

achenle · February 9, 2013, 5:32pm

Well, you'd probably wind up blocking on the mutex at times, while you should never block on the write.

Also, if you do your logging in a separate process, if there's a bug in it that causes a process crash (SEGV, etc), then you won't bring down your production process(s).

Compare that against the context switching needed for a separate process.

Without knowing your hardware and software, I can't really comment on which would be better for you and your processing.

tamil.pamaran · March 9, 2013, 5:04am

Thanks achenle and expl,

Sorry for the delayed reply.

Since the measurement is done for research prototype, its very controlled process. Hence I decided to go for mutex based solution. In a process, many threads are measured (thread logs time taken by them to an dynamic array) and a one thread removes elements from array and writes to a log file.

Since measurement tends to get the system performance, I decided to use time(NULL) to log the time. Any suggestions?

/T

achenle · March 9, 2013, 1:45pm

If accuracy down to just a second is good enough, that'll work. If you need or want more accuracy, you can use "clock_gettime()", assuming you're running on a Linux-based system. On Solaris or HPUX you could use "gethrtime()".

Those would give you timestamps down to nanoseconds, although they are probably not that accurate in reality.

jim_mcnamara · March 10, 2013, 9:49am

Instead of pipes -- if you're worried about resources or blocking, simply use a single thread that reads a message queue. Once a message queue (SYSV or POSIX) is set up there is a single syscall ( e.g., msgsnd, no blocking ) that is mt-safe and a mutex is not required.

The buffersize, unlike, a pipe, can be any size. The message queue buffer is kernel persistent, so if the entire set of threads/processes using the message queue die, the data is still available. Pipes are closed on process termination.