awk last n lines of file

I'm just in my second week of working with awk, and I need a hint for the following tasks.
I want to limit my logfile from the very outset to 200 lines. All I have done so far is

head -c 10K >> /home/uplog.txt | awk 'END{print NR " swap " NF$5; exit}' /home/uplog.txt;

After the file has been read, it shall print the very last record (" some text ") of the fifth field and exit right after that, because of the size of the file.

How can I set the limit to a certain number of lines?

Only the last n lines, e.g. 10, of my wiry self-made logfile should be read. I've been typing something like NR-1, but that gives just the line before the very last one. So how can I set a range like NR>=1&&NR<=5, or something even more sophisticated like /~^start !/,/~^stop !/ ?
If someone can give me a hint that would be great, thanks in advance.

Is this a homework assignment? If so, please repost as directed here: Rules for Homework & Coursework Questions Forum

Not sure I understand the logic of your code snippet. After appending the stdout of the head command to /home/uplog.txt, you pipe stdout (which will be empty) to awk 's stdin but make awk read the recently appended-to file /home/uplog.txt at the same time? That can't work.
Does it have to be awk ? Then you need e.g. a circular buffer that you print in the END section. Did you try the tail command?
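For reference, the circular buffer idea can be sketched in a few lines of awk: keep an array of n slots indexed by NR % n, then replay the surviving slots in the END block. This is a toy sketch; the generated "entry" lines stand in for uplog.txt, and n=10 is just an example.

```shell
# Emulate "tail -n 10" in pure awk with a rolling (circular) buffer.
awk 'BEGIN { for (i = 1; i <= 12; i++) print "entry " i }' |
awk -v n=10 '
    { buf[NR % n] = $0 }                 # slot NR%n always holds the newest line for that slot
    END {
        start = NR - n + 1
        if (start < 1) start = 1         # file shorter than n lines
        for (i = start; i <= NR; i++)
            print buf[i % n]             # replay in original order
    }'
```

Because slots are overwritten cyclically, memory use stays bounded at n lines no matter how large the input is.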

It is not homework; I am not an IT student. This is surely just for my machine here. Yes, there are people out there who write scripts for themselves, just like me. And I've been searching on various sites to find a solution.


I tried the tail command as well, but I'd need that as a step inside awk.

OK, so what about the circular buffer?

I agree with RudiC. Your code:

head -c 10K >> /home/uplog.txt | awk 'END{print NR " swap " NF$5; exit}' /home/uplog.txt;

doesn't seem to be related to what you said you're trying to do. Depending on what OS you're using, this code will give you a diagnostic for an invalid head -c option-argument, give you a diagnostic saying that head doesn't have a -c option, or append the first 10000 or 10240 characters from this script's standard input to the end of /home/uplog.txt. At the same time, awk will read whatever it finds in /home/uplog.txt (before head started adding data to it, at some point while head is writing to it, or after head has finished writing to it) and then print the number of lines it found in the file, followed by the string swap, followed by (again depending on your OS) the number of fields in the last record of the file and the contents of the 5th field of the last line, or nothing, or one but not both of those values.

Your requirements are ambiguous.

Limiting a file to 200 lines from the outset is not the same thing as reading it at some later time and discarding all but the 1st 10k characters (without checking for line boundaries).

Please give us a clear English description of what you are trying to do.

@Don Cragun
Much ambition means many errors, no errors means no trouble at all, I agree that this is a task for me.
From the very outset I want to limit this file to just 200 entries, nothing more. And then catch the last seven and, in a further step, the last thirty lines of the fifth field for a calculation. The text string " swap " could be any other. BTW, I am not a pro, so I do not know anything about circular buffers, excuse me.
My OS here is debian wheezy 7.5, no server.
So I cut out the first statement to direct it after the awk statement to stdout. As you may see, this could be a beginner, but I assure you I am right in the middle of it, because this is my third week with awk.

We don't care if you're a beginner. As long as you want to learn, we want to help you.

But, to help you we need to understand what you're trying to do.

You want to

??? (The 1st 200 entries? The last 200 entries? What constitutes an entry?)

I don't understand what you mean by:

Show us a sample of your input file. Give us details about the format of this file, the size of the file, the field separators in the file, etc.

Explain to us in detail in English what you want to do to that input.

Show us a sample of the output you want your script to produce.

#!/bin/bash 

#path=/home/Desktop/bashes
machine=$(uname -n);
R=`date +%A'  '%d'/'%m'/'%y' the '%V'.'week`;
V=`date +%x`              #will be used later
T=$((86400/3600));          #will be used later    

echo $T "not yet";

# shows me the actual user
echo $USER;

if (( "$T" < 31 ))
then 
    echo "today is" $R 
    else :
fi;

# I do know that this notation below of string4 and string5 is not the pretty version, but I want to keep it!!!

string4=`uptime`
string5=`date +%x`

echo "uptime" $USER "an" $machine " " ${string4:13:5} " " ${string5}  | head -c 10K >> /home/uplog.txt | awk  'END{print NR " full " NF$5; exit}' /home/uplog.txt

This is the whole script. Indeed, I just want to have a maximum of 200 lines. Nothing else.
If I cut out the first >> /home/uplog.txt, the file won't be updated, so it has to remain. I tested it without, and the script stopped at that line. What gave me some hope was an old thread right here that was turning the file upside down.

This script shall keep the last 200 times of uptime of a user, that simple. Furthermore I want to fetch the last seven and the last thirty entries of it for a calculation, average uptime and total uptime.
Beyond this I want to switch after a certain value of uptime my MAC-address or to make a redial.
@Don Cragun sure I am willing to learn, I think that keeps me afloat. For not having at least the five posts here I have to wait to send the link. The user that gave that hint is cfajohnson and his answer dates back to 2007.

You don't need 5 posts to cut and paste sample data into a post (just like you did with your code). It looks like you're saying you have a file (in a very strange place unless you're running as root) that contains lines like:

uptime username an   ays,   MM/DD/YYYY
uptime username an   days,   MM/DD/YYYY
            or
uptime username an   N day   MM/DD/YYYY

depending on how many days the machine has been up (where username, MM, DD, and YYYY are obvious and N is the last digit in the number of days the system has been running if the system has been up for more than 999 days). (Of course, your version of uptime may print something else in the 5 characters starting in position 15.)

And, this would mean the output from awk could be something like:

325 full 506/15/2014
       or
325 full 6day

assuming you had collected 324 samples before you ran the script and that awk saw the last sample you added to the log using echo and head .
Note that the head in this script is a fairly expensive no-op. And, the awk can be replaced by a set and an echo .
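Don's "set and an echo" remark can be sketched like this: the shell's positional parameters can play the role of awk's NF and $5 for the last line. The file name and sample data here are placeholders, and "swap" is the literal string from the original one-liner.

```shell
# Replace awk 'END{print NR " swap " NF$5}' with a set and an echo.
tmp=./uplog.sample.txt
printf '%s\n' "one a b c d e" "two v w x y z" > "$tmp"
lines=$(( $(wc -l < "$tmp") ))       # awk's NR: number of records
set -- $(tail -n 1 "$tmp")           # positional parameters = fields of the last line
echo "$lines swap $#$5"              # $# plays NF, so this prints "2 swap 6y"
```

Note that `set --` word-splits the last line on whitespace just as awk's default field splitting does.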

I repeat. Please tell us in English what you are trying to do!
Show us sample input!
Show us desired output!

Keeping only the last 200 lines in your log file can be done by:

(tail -n 199 /home/uplog.txt
echo "uptime" $USER "an" $machine " " ${string4:13:5} " " ${string5}) > newuplog.txt && mv newuplog.txt /home/uplog.txt

(assuming that you are running this in a directory that is on the same filesystem as /home ).

If this is code being run by a normal user, I would have expected it to use $HOME/uplog.txt rather than /home/uplog.txt .

How is capturing 7 or 30 copies of the word day possibly mixed with dates of the form MM/DD/YYYY going to help you calculate average or total uptime? This makes no sense to me. PLEASE SHOW US SAMPLE DATA!

My aim is just to set a maximum number of records or entries for that file; that maximum shall be 200 lines. And I want to extract the last seven and, separately, the last thirty entries for the average value and the sum of each of them. That means the sum of the last seven as well as their average, and the same calculation for the last thirty entries.
For now, a line is added each time the script is executed in the interpreter. This should be done when shutting down the computer, because it is not a server; therefore the script will be placed in /etc/rc0.d/ with a K-link. And while I was trying to figure it out, the file grew to more than 3400 lines. Yes, I do use root for that purpose, so there is no strange place for any of the files.
While I have also tried commands like tac or sort -nrk5, I want to go on with the upside-down example given by the user cfajohnson, shown in the code snippet below.

 awk '{x[NR] = $0}
  END { while ( NR > 0 ) print x[NR--] }' /home/uplog.txt;

Assuming to find a solution with NR==1,NR==7 for the range of one calculation e.g.

 awk '{sum=sum+$5} END {print sum}' /home/uplog.txt

and

 awk '{sum=sum+$5} END {print sum/NR}' /home/uplog.txt

for the average value and the sum of that row.
I suppose this should even work for both targets: the range of the first seven (after turning it upside down) and the first thirty values, even in that dd/mm/yyyy format. So far, string4 is shown from position 13, five characters on.
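For what it's worth, the sum and average over only the last n lines can be sketched without reversing the file at all, again using a rolling buffer. The toy data and n=3 are placeholders (standing in for 7 or 30); in the real file the field of interest is not a plain number, which comes up later in the thread.

```shell
# Sum and average of field 5 over only the last n input lines.
printf '%s\n' "a b c d 1" "a b c d 2" "a b c d 3" "a b c d 4" |
awk -v n=3 '
    { v[NR % n] = $5 }                   # remember only the last n values of field 5
    END {
        start = NR - n + 1; if (start < 1) start = 1
        for (i = start; i <= NR; i++) sum += v[i % n]
        cnt = NR - start + 1
        print "sum=" sum, "avg=" sum / cnt    # here: sum=9 avg=3
    }'
```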
The output so far is the last line of the file, as shown below.

 24 not yet
sandy
Today is monday  16/06/14 the 25.th Week
99 full 61:52

This output is just adapted to English, but the date format remains the same dd/mm/yyyy. As in this example: the script on this machine has run 99 times; the 24 (hours) for the user (in this case sandy) are not completed; then come the date plus the week, the 99 lines, and the total value.

Obviously your version of uptime produces significantly different output than uptime on the laptop I have running OS X. If you continue to refuse to show us sample data from /home/uplog.txt I can't help you any more.

How have you determined that the 24 (hours) for the user (in this case sandy) are not completed? The uptime utility reports how long a system has been running and what recent load averages are. It says absolutely nothing about how long sandy or any other user has been logged in. And, the last line of your output seems to show that this machine has been running for almost 62 hours.

You don't need tac to get the last 7 or 30 lines. You definitely don't want to use sort -nrk5 if you're trying to process the last 7 or 30 lines of your input file. If this script is being run by root, why does $USER expand to sandy ?

It is nice that you have learned how to emulate tac using awk , but unless there is some reason why you want to reverse the lines in your log file, that isn't what you need for this project.

If the 5th field in /home/uplog.txt is hours and minutes separated by a colon, sum+=$5 isn't even going to come close to doing what you want. In awk the command sum+=$5 will keep a running sum of integer or floating point values; it won't sum up values given as hours and minutes.
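A two-line demonstration of this point (the 61:52 value is taken from the samples posted in this thread):

```shell
# awk coerces "61:52" to the number 61, so sum+=$5 silently drops the minutes:
echo "61:52" | awk '{ sum += $1 } END { print sum }'                  # prints 61
# Converting to minutes with split() first gives a summable integer:
echo "61:52" | awk '{ split($1, t, ":"); print t[1] * 60 + t[2] }'    # prints 3712
```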

Thank you for showing us what your script produces. But, we know that isn't the output you want. So, please:

  1. Show us the output from the uptime command on your system.
  2. Show us what the last 35 lines are in /home/uplog.txt!
  3. Show us exactly what output you want to have produced from those 35 lines.

If you'll do that for us, we'll show you how to use circular buffers in awk to save the last 200 lines of your input file, to get sums and/or averages from the last 7 lines in your input file, and how to get sums and/or averages from the last 30 lines in your input file.

@Don Cragun
Eagle-eyed Don Cragun was right: my first attempt worked, but it is confusing. I corrected that part to the following variables and their output. But I still do not agree with your point of view about using "head -c". All I could find on limiting was ulimit for setting a limit, so I left that first "head -c" in.
Thanks for pointing out that first mess. I was too focused on the awk part.

machine="Today at: ";
machine=$(printf "%s %s" "$machine" "$(uname -n)");

and

stringZ=`uptime`
stringZ=$(printf "%s %s" "$stringZ" `date +%x`);

That gives me the following output

24 not yet
sandy
uptime sandy   Today at: jarbo3   2:45

While jarbo3 is the name of the computer.
This is the actual output of that specific file, I deleted the old one, due to that confusion. It's without that awk-output yet.

22:00:55 up 2:45, 3 users, load average: 0.02, 0.04, 0.00 17.06.2014

I guess after clearing that up, I can get a coffee and spend the time with awk. Thanks for being so harsh. :)
There is no specific reason for making a difference between user and root on this computer here, since I am the only user. And yes, the 24 hours will be changed in the course of that script. When the uptime hits a certain value, e.g. 3 hours, it shall trigger a redial and a change of the MAC address as well.

Please trust me. The:

| head -c 10K

in:

echo "uptime" $USER "an" $machine " " ${string4:13:5} " " ${string5}  | head -c 10K >> /home/uplog.txt 

isn't doing anything but slowing down your script. If you replace that with:

echo "uptime" $USER "an" $machine " " ${string4:13:5} " " ${string5} >> /home/uplog.txt 

it will produce the same output, but do it faster. We'll take care of your desired maximum number of lines to be kept later in your awk script.

We still need to see some sample lines from /home/uplog.txt and we still need to see an exact sample of the output you want to get when processing that input. (Note that you don't have to wait 24 hours between invocations of your script. If you invoke it once per minute for a half hour, you'll have enough data in your file to compute 7 and 30 entry sums and averages.) Until you show us sample data from that file, we can't help.

@Don Cragun
All right, I will drop that head -c part. I am learning.
Surely there is some faster way to produce that uplog.txt with the sufficient number of lines.
So here is what the script zetzwo.sh displays on the screen:

sandy@jarbo3:~/Desktop$ ./zetzwo.sh
24 not yet
sandy
uptime sandy   Today at: jarbo3   3:20

And here come the thirty entries of the uplog.txt.
Seeing it now, I would just like to stick with awk, for several reasons: recently it helped me find duplicates in a huge file (more than 3 million entries) much faster than any bash command, and I want to advance a bit in (g)awk.
As you asked for the last thirty lines or entries, here they are.

09:04:40 up 33 min, 2 users, load average: 0.00, 0.11, 0.10 18.06.2014
09:04:42 up 33 min, 2 users, load average: 0.00, 0.10, 0.10 18.06.2014
09:04:43 up 33 min, 2 users, load average: 0.00, 0.10, 0.10 18.06.2014
09:05:26 up 33 min, 2 users, load average: 0.08, 0.11, 0.10 18.06.2014
09:05:27 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:27 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:28 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:28 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:28 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:29 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:29 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:29 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:30 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:30 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:30 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:31 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:31 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:32 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:33 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:33 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
11:47:25 up 3:15, 2 users, load average: 0.05, 0.10, 0.05 18.06.2014
11:47:26 up 3:15, 2 users, load average: 0.05, 0.10, 0.05 18.06.2014
11:47:27 up 3:15, 2 users, load average: 0.05, 0.10, 0.05 18.06.2014
11:47:28 up 3:15, 2 users, load average: 0.05, 0.10, 0.05 18.06.2014
11:47:29 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014
11:47:30 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014
11:47:30 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014
11:47:31 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014
11:47:31 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014
11:47:36 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014
11:52:10 up 3:20, 2 users, load average: 0.00, 0.03, 0.02 18.06.2014

And what 7 and 30 line sums and averages do you want your script to produce from these 30 lines from uplog.txt???

It appears that the last field in your output from zetzwo.sh :

uptime sandy   Today at: jarbo3   3:20

comes from the third field in the last line of uplog.txt:

09:04:40 up 33 min, 2 users, load average: 0.00, 0.11, 0.10 18.06.2014
09:04:42 up 33 min, 2 users, load average: 0.00, 0.10, 0.10 18.06.2014
... ... ...
11:47:36 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014
11:52:10 up 3:20, 2 users, load average: 0.00, 0.03, 0.02 18.06.2014

Is that the field you want to sum and average? We can see from this that the output from uptime on your system varies in this field depending on how long your system has been running. Presumably it uses

n min

for up times less than one hour and h:mm for up times of one hour or more. We saw something in an earlier post where what I assume was this value was 61:52 , so I assume that the format doesn't change for larger numbers of hours. Is this correct? (On my system, the format changes when you get to 24 hours:

10:47  up 5 days, 13:21, 8 users, load averages: 1.32 1.36 2.49

which I'm guessing your system does not do.) We can deal with issues like this if we know what input we'll be getting. But, if we don't know the format of the data we'll be processing, we can't successfully convert your system's up time to a number of minutes (an integer value) or a number of hours (a floating point value with minutes as a fractional part of an hour). And, we need to know the ranges of values we're going to be handling to determine whether we should be using integer or floating point values. (Assuming you're using a system with at least a 32-bit signed long int, and that your average system up times will always be less than 135 years, we can use integer arithmetic.)
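A sketch of that normalisation, handling both formats Don describes. The field positions ($3 for the up time, $4 for the possible "min," unit) are assumptions based on the posted sample lines:

```shell
# Convert both "33 min," and "h:mm," up times to integer minutes.
printf '%s\n' \
  "09:04:40 up 33 min, 2 users, load average: 0.00, 0.11, 0.10 18.06.2014" \
  "11:47:25 up 3:15, 2 users, load average: 0.05, 0.10, 0.05 18.06.2014" |
awk '{
    v = $3; sub(/,/, "", v)              # strip any trailing comma
    if ($4 == "min,")                    # "33 min," form: value already in minutes
        m = v + 0
    else {                               # "h:mm," form
        split(v, t, ":")
        m = t[1] * 60 + t[2]
    }
    print m                              # 33 for the first line, 195 for the second
}'
```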

Probably the next generation of SSD and NAND chips gets close to that MTBF value of 135 years, and my digital inheritance does not matter to anybody. I do understand your point of view about the kind of value to be calculated. So I keep going with my ambition to solve it in awk. Thanks a lot, really. Regards.

We'll be happy to help you with the awk code (including how to trim the output to a given number of lines from the end of the input file) if you'll just show us what 7 and 30 entry sums and averages you're trying to produce (and verify the formats that the uptime utility produces on your system for various amounts of time the machine has been running).

Okay, I keep it up in this thread, because the original request or task remains unsolved.
Having now a neat database as shown above, I want to use awk to

  1. count the file just up to the line 200 and write to /dev/null above that number of lines.
  2. sort it upside down (what equals in bash tac, sort -r) for there is no constant number of lines.
  3. pick up the first seven lines for the calculation of sum and sum/NR (the average value).
  4. repeat step 3 for the first thirty lines for the same calculation.
    5 print the result of step 3 and step 4 to stdout or a file (in case I want to use later with xmessage).

My first attempt is to limit the number of lines, or NR, already written to that file uplog.txt. After doing so, I think it will get easier to handle, or just to cut it down to the right size.
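A hedged sketch of step 1, trimming the log file in place to its last n lines. Here n=5 and generated toy lines are used purely so the effect is visible; the real script would use n=200 and the actual log path.

```shell
# Keep only the last n lines of a log, rewriting it via a temp file.
log=uplog.sample.txt
awk 'BEGIN { for (i = 1; i <= 12; i++) print "line " i }' > "$log"
awk -v n=5 '
    { buf[NR % n] = $0 }                 # rolling buffer of the newest n lines
    END {
        start = NR - n + 1; if (start < 1) start = 1
        for (i = start; i <= NR; i++) print buf[i % n]
    }' "$log" > "$log.tmp" && mv "$log.tmp" "$log"
cat "$log"                               # now holds "line 8" .. "line 12" only
```

Writing to a temp file and renaming is the usual safe pattern, since awk cannot shrink its own input file in place.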
What I know about awk structure is that it needs this:

pattern { action statements }
  function name(parameter list) { statements }

Step one is where my attempt runs into a syntax error.
The code is the following:

awk 'END {for NR > 181 printf > "/dev/null"}' /home/uplog.txt;

Giving me

 END {for NR > 181 printf > "/dev/null"}
awk: commandline  :1:          ^ syntax error
awk: commandline  :1: END {for NR > 181 printf > "/dev/null"}
awk: commandline  :1:                   ^ syntax error

I'd say the pattern is "file has been read until the END" and the {action statement} in this case is to print all lines higher than 180 to /dev/null. I see that the counter is missing.
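For reference, awk's for loop uses C-style syntax, and nothing needs to be sent to /dev/null: lines that are simply not printed are discarded. A sketch of the corrected intent on toy data, where skip=3 plays the role of 180 (buffering the whole file in an array is fine for a small log):

```shell
# Print only the lines after the first "skip" lines, from an END block.
awk 'BEGIN { for (i = 1; i <= 6; i++) print "rec " i }' |
awk -v skip=3 '
    { buf[NR] = $0 }                     # the whole file fits in memory for a small log
    END {
        for (i = skip + 1; i <= NR; i++) # C-style for loop: init; test; step
            print buf[i]
    }'
```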

And here follows a sample of the last entries in that uplog.txt. At the beginning of this thread it was about field number five, which turns out to be field three now. Never mind the values; I changed the computer.

21:46:35 up 3:42, 2 users, load average: 0,19, 0,22, 0,29 18.06.2014
21:47:29 up 3:43, 2 users, load average: 0,08, 0,18, 0,27 18.06.2014

Any hints? This time I need to learn, to advance a bit. Thanks in advance.

I repeat: If these are the last 7 lines in uplog.txt :

09:05:31 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
09:05:33 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014
11:47:30 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014
11:47:31 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014
11:47:31 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014
11:52:10 up 3:20, 2 users, load average: 0.00, 0.03, 0.02 18.06.2014
22:15:16 up 61:52, 2 users, load average: 0.23, 0.44, 0.30 20.06.2014

EXACTLY what output do you want for the sum and average? Do you want the results as an average number of minutes; hours and minutes; hours, minutes, and seconds; or days, hours, minutes, and seconds? If you refuse to show us the format of the output you want, how can we write code that will give you what you want? Please help us help you. Show us what you want!

Other than x mins (as in 33 mins ) and h:mm (as in 3:20 and 61:52 ), is there any other format that your version of uptime produces for the values you want to sum and average? If you won't tell us what format the data is in that we are processing, how can we process that data? Please help us help you. Confirm what data formats appear in the input you want to process!

PLEASE ANSWER THE QUESTIONS I'M ASKING! Let us help you! I think I understand which lines you want to process. And, tac would be a waste of time and sort would not even come close to giving you what you want (1st because the data format for the data you're sorting is not consistent and 2nd because sort will give you the wrong lines to process if your machine is rebooted at any time during the period covered by the 200 lines you want to keep in uplog.txt ).
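To close the loop, here is one hedged sketch of where the thread is heading: keep only the last n entries and report their up-time sum and average in minutes, handling both the "33 min," and "h:mm," formats. The sample lines mimic the posted uplog.txt; n=3 stands in for 7 or 30, and the field positions are assumptions from those samples.

```shell
# Sum and average the up time (in minutes) of the last n log lines.
printf '%s\n' \
  "09:05:31 up 33 min, 2 users, load average: 0.07, 0.10, 0.10 18.06.2014" \
  "11:47:30 up 3:15, 2 users, load average: 0.04, 0.10, 0.05 18.06.2014" \
  "11:52:10 up 3:20, 2 users, load average: 0.00, 0.03, 0.02 18.06.2014" \
  "22:15:16 up 61:52, 2 users, load average: 0.23, 0.44, 0.30 20.06.2014" |
awk -v n=3 '
    function minutes(f3, f4,    t) {     # convert field 3 (plus unit in f4) to minutes
        sub(/,/, "", f3)
        if (f4 == "min,") return f3 + 0
        split(f3, t, ":"); return t[1] * 60 + t[2]
    }
    { v[NR % n] = minutes($3, $4) }      # rolling buffer of the last n values
    END {
        start = NR - n + 1; if (start < 1) start = 1
        for (i = start; i <= NR; i++) sum += v[i % n]
        cnt = NR - start + 1
        printf "sum=%d min  avg=%.1f min\n", sum, sum / cnt
    }'
```

On this sample the last three entries are 195, 200, and 3712 minutes, so the sketch prints sum=4107 min and avg=1369.0 min.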