How to read a fast-written log file at real-time speed?

Hello All,

I am building a real-time parser for a log file in my application.
The log file is written continuously at a very fast pace and gets rolled over every 10 minutes.

I have measured the write rate and observed that around 1,000 lines are written to it every second, each line about 30-40 characters long.

I have tried using tail -F, but it always lags behind the speed at which the file is being written.
Could you suggest anything else I can use to read the file line by line quickly, at real-time speed?

Below is what I have right now:

tail -F --lines=10000000 --retry --max-unchanged-stats=10 "$logFile" |
while IFS= read -r line || [ -n "$line" ]
do
    # ... some logic ...
done

Thank you.
-CaQ

That's not really an easy problem to solve. Any solution that meets your requirements has to take into account specifics of the OS, the file system, and even the physical hardware. Obviously such a solution isn't going to be very portable.

The first thing you have to figure out is what counts as fast enough, because there's always going to be some delay between data being written to the file and your reading it.

You might be better served interposing something into the logging stream that splits it in two: one copy to the original logging system and one into your real-time parser.
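For example, a named pipe plus tee can do the splitting. A minimal sketch, assuming you control the command producing the log and that some_app, parser.sh, and the FIFO path are all placeholders for your setup:

# Create a named pipe for the parser to read from (path is an example).
mkfifo /tmp/logsplit.fifo

# Start the parser on the pipe first, so the writer doesn't block.
./parser.sh < /tmp/logsplit.fifo &

# tee duplicates the stream: one copy is appended to the original
# log file, the other goes into the pipe for the parser.
some_app | tee -a "$logFile" > /tmp/logsplit.fifo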


Thanks for your post.
About what is fast enough: a delay of 30 seconds or so would be within the accepted level. But I have seen that, on average, the tail command reads less than 70% of the file within the 10 minutes before it gets rolled over, and the script then throws the following error (the log files are on a shared NFS mount):

tail: error reading `<file-path>': Stale NFS file handle

Reliable read-behind-write on NFS? That's not going to happen because NFS is stateless. And yes, what you're trying to do is called "read-behind-write":

https://www.google.com/search?q="read+behind+write"

If you have a requirement to monitor those logs, you need to monitor them another way. Reading them over NFS is not going to work.

This is never going to work right on NFS. NFS is also probably why tail couldn't keep up. You should have the application send logs to you. How to do this depends on the application.
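If the application itself can't forward its logs, one workaround is to run a process on the machine that actually writes the log, where the file sits on a local filesystem and tail can keep up, and ship the lines over TCP. A rough sketch; parser-host and port 5140 are placeholders, and netcat option syntax varies between implementations:

# On the host that writes the log (local filesystem):
tail -F "$logFile" | nc parser-host 5140

# On the parser host, receive the stream and feed it to the parser
# (OpenBSD netcat shown; traditional netcat wants "nc -l -p 5140"):
nc -l 5140 | awk -f awk-script.txt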

NFSv4, which is increasingly the norm on Linux systems, is stateful. Earlier versions were stateless.
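If it helps to confirm which situation you're in, on Linux you can check the protocol version a given NFS mount negotiated:

# Show mounted NFS filesystems and their options, including vers=:
nfsstat -m

# Or look for the vers= option in the mount table:
grep nfs /proc/mounts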

Hi there,

I managed to improve real-time reading performance 10-fold.

I analysed the script and found that most of the time was being spent in

while IFS= read -r line

I replaced the while loop with awk, which reads a single logical unit (each about 6,000 lines) in one go.

tail -F --lines=10000000 --retry --max-unchanged-stats=10 "$FILE_NAME" | awk -f awk-script.txt
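The contents of awk-script.txt aren't shown; purely as a sketch of the batching idea, assuming (hypothetically) that a logical unit is a fixed count of 6,000 lines rather than whatever really delimits one in this log:

# awk-script.txt -- illustrative sketch only; the real end-of-unit
# test depends on the log format.
{
    buf[++n] = $0              # accumulate lines in memory
}
n == 6000 {
    process_unit()             # handle one unit in a single pass
}
END {
    if (n > 0) process_unit()  # flush the final partial unit
}
function process_unit(    i) {
    for (i = 1; i <= n; i++) {
        # per-line logic on buf[i] goes here
    }
    n = 0                      # next unit overwrites buf[1..n]
}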

Now the read speed more than keeps up with the write speed.
Even when the file gets rolled over, tail catches up with the new file and prints a message like:
tail: `<file-name>' has been replaced; following end of new file

This is still not reliable, especially on NFS.