Count lines and words of a stream output with tail

Kibou · October 16, 2013, 6:27am

Hello,

I need to tail -f a file output stream and keep only the lines that contain both "get" and "point". The order doesn't matter.

Then I need only the text BEFORE "point".

I have to count the words in each line, and perform several other actions once this has been done 3 times.

I have done something like this. But I don't know how to go on.
I am currently learning AWK. I don't know if there's a straightforward way to do this with AWK alone. It would be nicer to avoid "grep".

tail -f file | grep --line-buffered "get.*point\|point.*get" | awk 'BEGIN {FS="point"} {print $1}'

The tail -f output would be something like this:

Hello world. This is my point.
How to get your point?
I would like to get a point. Do you?
I do not know how to get my point.

After processing, it should keep only the lines that contain both "get" and "point". The result would look something like this:

How to get your point?
I would like to get a point. Do you?
I do not know how to get my point.

Next step is extracting everything before "point":

How to get your
I would like to get a 
I do not know how to get my

Count words in each line:

4
6
8

And quit once this has been performed 3 times. In this example it would quit tail -f right away, because three lines have already been processed.

I am getting very frustrated. Thanks a lot for any ideas you can give.
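The steps listed above (filter, cut at "point", count words, stop after three) can be sketched in a single awk program with no grep stage. This is only a minimal sketch: the function name count_before_point is made up for illustration, and it reads whatever arrives on stdin, so a finite printf is used here in place of tail -f.

```shell
# Read a stream on stdin. For each line containing both "get" and "point",
# drop everything from the first "point" onward and print the word count.
# Stops after the third matching line.
count_before_point() {
    awk '
        /get/ && /point/ {
            sub(/point.*/, "")   # keep only the text before "point"
            print NF             # NF is recomputed once $0 has changed
            fflush()             # push each count out immediately
            if (++n == 3) exit   # quit after three matches
        }
    '
}

printf '%s\n' \
    'Hello world. This is my point.' \
    'How to get your point?' \
    'I would like to get a point. Do you?' \
    'I do not know how to get my point.' | count_before_point
```

With a live stream this would be called as tail -f file | count_before_point. Note that awk exiting does not stop tail by itself: tail typically only notices that its reader is gone the next time it tries to write to the pipe.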

Akshay_Hegde · October 16, 2013, 8:19am

Try:

$ awk '{print NF}' file

4
6
8

Kibou · October 16, 2013, 9:23am

Thanks for the reply.
It needs to be done to the stream coming from tail -f, not a file.

I have tried appending that to the end of my pipeline, but it doesn't work either.

Edit-

Also, this needs to be done "on the fly" because I have to measure the time while this is done.

Thanks.

CarloM · October 16, 2013, 1:11pm

One way to do it would be to run the tail/awk pipe in a background subshell, redirect that output to a temp file, and have the parent script terminate the subshell when the file is 3 lines long.

Some rough (and untested!) code:

tempfile=$(mktemp -t XXXXXXXX.$$) || exit 1

( tail -f file.txt |  awk '/get|point/ {gsub (/point.*/,"");print NF}' | tee "${tempfile}" ) &
tailpid=$!

while true
do
        sleep 1
        found=$(wc -l "${tempfile}"|awk '{print $1}')
        [ $found -gt 3 ] && break
done

rm "${tempfile}"
kill $tailpid

Kibou · October 16, 2013, 4:57pm

Thanks so much. I have learned a lot.

Here is the final version of the script. It works like a charm!

tempfile=$(mktemp -t XXXXXXXX.$$) || exit 1


(tail -f file | grep --line-buffered "message.*@\|@.*message" | awk 'BEGIN {FS="@"} {print $1;fflush()}' | tee "${tempfile}") &
pid=$!

while true
do
    sleep 0.5

    found=`wc -l "${tempfile}"|awk '{print $1}'`

    [ $found -ge 3 ] && break

done

rm "${tempfile}"
kill $pid

I loved all those details, like the file made with mktemp. Excellent.

I just had to find out one more thing: how to get awk to pass along the output from tail -f as it arrives. Calling fflush() after each print does it.

Amazing. Thanks so much.
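The reason fflush() matters is that awk usually buffers its output in blocks when stdout is a pipe rather than a terminal, so tee would see nothing until the buffer fills or awk exits. As a rough sketch, the grep stage could also be folded into awk; the function name before_at is hypothetical, and the patterns are the ones from the script above.

```shell
# Print the text before the first "@" on lines that contain both
# "message" and "@", flushing after every line so a downstream
# reader (tee, for example) sees each result immediately.
before_at() {
    awk -F'@' '/message/ && /@/ { print $1; fflush() }'
}

printf '%s\n' 'login message@host1' 'a message without the sign' | before_at
```

In the script above this would replace the grep and awk stages: tail -f file | before_at | tee "${tempfile}".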

---------- Post updated at 10:57 PM ---------- Previous update was at 10:44 PM ----------

I was so happy that I forgot one more thing. But this is an easy one: counting words.

tempfile=$(mktemp -t XXXXXXXX.$$) || exit 1

words=/path/words.txt


(tail -f file | grep --line-buffered "message.*@\|@.*message" | awk 'BEGIN {FS="@"} {print $1;fflush()}' | tee "${tempfile}") &
pid=$!

while true
do
    sleep 0.5

    found=`wc -l "${tempfile}"|awk '{print $1}'`

    wc -w "${tempfile}"|awk '{print $1}' > "${words}"


    [ $found -ge 3 ] && break

done

rm "${tempfile}"
kill $pid

Regards.

RavinderSingh13 · October 16, 2013, 10:25pm

Hello CarloM,

Thanks a lot for the great code. Could you please explain it?

Thanks,
R. Singh

greet_sed · October 17, 2013, 6:01am

@CarloM: May I?
I hope my understanding below is correct:

First, create a temporary file.
Second, tail the running log; awk looks for lines matching "get" or "point", deletes everything from "point" onward, and prints the number of remaining fields.
That output is piped to the tee command, which writes it both to standard output and to the file created in the first step.
The whole second step is executed in the background.
Third, store the PID of the background process in a variable, using the shell's built-in $! variable.
Then start an infinite loop: wc counts the lines in the temp file and awk prints the first field of that output, which is the count.
When the test operator finds that the count is greater than 3, the loop breaks.

Finally, remove the temp file and kill the background process.

alister · October 18, 2013, 4:11pm

carlom:

Some rough (and untested!) code:

tempfile=$(mktemp -t XXXXXXXX.$$) || exit 1

( tail -f file.txt |  awk '/get|point/ {gsub (/point.*/,"");print NF}' | tee "${tempfile}" ) &
tailpid=$!

while true
do
   sleep 1
   found=$(wc -l "${tempfile}"|awk '{print $1}')
   [ $found -gt 3 ] && break
done

rm "${tempfile}"
kill $tailpid

That is exactly what your code does, but killing the subshell leaves each process in the tail|awk|tee pipeline alive and orphaned.

On a more general note, I can't think of any reason to ever use the following pipeline to count the lines of a single file:

wc -l file |awk '{print $1}'

wc will not print a pathname if you redirect stdin: wc -l < file. Alternatively, you can use AWK alone and print NR from an END block.
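Both alternatives can be sketched as small helpers. The function names are made up for illustration, and each takes the file to count as its first argument.

```shell
# Count lines by feeding the file to wc on stdin, so no pathname is printed.
count_lines_wc() {
    wc -l < "$1"
}

# Count lines with awk alone: NR holds the number of records read
# by the time the END block runs.
count_lines_awk() {
    awk 'END { print NR }' "$1"
}
```

Either one can replace the wc | awk pipeline in the loop, for example found=$(count_lines_wc "${tempfile}"). One difference to keep in mind: wc -l counts newline characters, while awk also counts a final line that lacks a trailing newline.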

After running your script, you may want to consult with ps and, if necessary, rework your solution.

Regards,
Alister

CarloM · October 18, 2013, 5:25pm

Good point. (I did say it was untested! :))

It's not hard to correct though. Something like:

childpids=$(ps -ef | awk -vPPID=$tailpid -vORS=' ' '$3==PPID {print $2}')
kill $childpids
kill $tailpid

alister · October 18, 2013, 9:12pm

It would be simpler to just place the monitored and monitoring code in the same process group and use kill 0 to terminate the entire group.

set -m
exec 3>&2 2>/dev/null

sh 2>&3 <<'EOF'
    tail ... | awk ... | tee ... &

    while :; do
        monitoring-commands
    done

    cleanup-commands
    kill 0
EOF

The redirections (exec 3>&2 2>/dev/null and the 2>&3 on the sh command) can be omitted if the job control messages aren't a bother.

However, in my opinion, a fifo-based solution is a superior approach. Since reading on a fifo blocks, there's no need for a polling while-loop and explicit sleeps.

mkfifo data-pipe

tail -f file > data-pipe &
awk 'awk-script' data-pipe
kill $!

When the awk script exits (after three matches, presumably), tail is killed.
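A fuller sketch of the fifo approach, filled in with the get/point filter from the original question, might look like the following. The file names, the workdir variable, and the background writer that stands in for the real log producer are all assumptions made so the example can run on its own.

```shell
# Scratch directory holding a stand-in log file and the fifo.
workdir=$(mktemp -d) || exit 1
log=$workdir/stream.log
fifo=$workdir/data-pipe
: > "$log"
mkfifo "$fifo" || exit 1

# Opening the fifo for writing blocks until awk opens it for reading.
tail -f "$log" > "$fifo" &
tailpid=$!

# Stand-in for whatever process appends to the log.
(
    sleep 1
    printf '%s\n' \
        'Hello world. This is my point.' \
        'How to get your point?' \
        'I would like to get a point. Do you?' \
        'I do not know how to get my point.' >> "$log"
) &

# awk blocks on the fifo and exits after the third match; no polling loop.
counts=$(awk '
    /get/ && /point/ {
        sub(/point.*/, "")
        print NF
        if (++n == 3) exit
    }
' "$fifo")

# tail may already have exited on its own, so ignore a failed kill.
kill "$tailpid" 2>/dev/null || :
wait 2>/dev/null
rm -rf "$workdir"

printf '%s\n' "$counts"
```

Because awk's exit is what ends the wait, the script reacts as soon as the third match arrives instead of at the next sleep interval.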

If portability isn't a concern, (ksh or bash) coprocesses can be used instead of mkfifo.

Regards,
Alister