Extract info from sar output

Hi

I have the output of a sar command, as follows:

10:22:18 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
10:23:18       0     398     100       5      13      64       0       0
10:24:18       0     332     100       5      15      65       0       0
10:25:18       0     301     100       6      17      67       0       0
10:26:18       0     309     100       5      16      69       1       0
10:27:18       0     178     100       5      13      61       0       0
10:28:18       0     118     100      14      17      16       0       0
10:29:18       0     189     100      10      16      35       0       0
10:30:18       0     249     100       5      20      74       0       0
10:31:18       0     147     100      14      21      32       1       0
10:32:18       0     325     100       6      14      59       0       0
10:33:18       0     570     100       6      15      60       0       0
10:34:18       0     734     100       6      16      62       0       0
10:35:18       0     704     100       6      22      70       0       0
10:36:18       0     718     100       6      15      59       0       0
10:37:18       0     794     100       6      15      59       0       0
10:38:18       0     796     100       5      12      57       0       0
10:39:18       0     739     100       5      11      51       0       0
10:40:18       0     714     100       6      17      65       0       0
10:41:18       0    1281     100      18      21      17       0       0
10:42:18       0     700     100       6      17      64       1       0
10:43:18       0     720     100       5      14      60       0       0
10:44:18       0     846     100       6      15      60       0       0
10:45:18       0     799     100       6      19      67       0       0
10:46:18       0     663     100       6      15      62       0       0
10:47:18       0     710     100       6      17      64       1       0
10:48:18       0     642     100       6      12      53       0       0
10:49:18       0     713     100       6      13      56       0       0
10:50:18       0     746     100       6      18      69       0       0
10:51:18       0    1241     100      17      22      23       0       0
10:52:18       0     773     100       6      18      68       1       0

Average        0     605     100       7      16      56       0       0

and I want to extract the 1st, 4th and 7th columns, but with an extra column before the time column that numbers the lines 1, 2, 3, and so on, like:

1
2
3
4
5
6
7

I have tried the following piece of code:

cat sarb.out | \
awk 'BEGIN {LINE=1; printf "#Spl Date     %%rcache %%wcache\n"; }
 { if ( NF == 9 ) {
  if($1 != "Average") DATE=$1
  if($1 == "Average") {
   printf "%4d %s %7d %7d\n", LINE, DATE, $4, $7
   LINE++}
   }
 }'

but I only get one line of output.

Can you please help?

---------- Post updated at 12:13 PM ---------- Previous update was at 11:37 AM ----------

My expected output is:

	10:22:18          %rcache          %wcache 
1	10:23:18            100             64       
2	10:24:18            100             65       
3	10:25:18            100             67      
4	10:26:18            100             69       
5	10:27:18            100             61       
6	10:28:18            100             16     
7	10:29:18            100             35       
8	10:30:18            100             74       
9	10:31:18            100             32       
10	10:32:18            100             59      

Hello fretagi,

Could you please try the following and let us know if it helps?

awk '($1 ~ /Average/){next} NF==9{print A OFS $1 OFS $4 OFS $7;A++}' OFS="\t\t"  Input_file

The output will be as follows:

                10:22:18                %rcache         %wcache
1               10:23:18                100             64
2               10:24:18                100             65
3               10:25:18                100             67
4               10:26:18                100             69
5               10:27:18                100             61
6               10:28:18                100             16
7               10:29:18                100             35
8               10:30:18                100             74
9               10:31:18                100             32
10              10:32:18                100             59

Thanks,
R. Singh


Hi Singh!
Thank you very much, you got it right. But could you please explain the code?

Hello fretagi,

The following explanation may help you with that.

awk '($1 ~ /Average/){next}             ##### If $1 contains the string Average, skip the line and do nothing further.
NF==9{                                  ##### If the line has 9 fields, then
  print A OFS $1 OFS $4 OFS $7          ##### print A, $1, $4 and $7, separated by OFS (the output field separator).
                                        ##### A has no value yet on the first line, so it prints as NOTHING there.
  A++                                   ##### Increment A so the following lines are numbered 1, 2, 3, ...
}' OFS="\t\t" Input_file                ##### Set OFS (the output field separator) to two tabs and name the input file.
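The "prints NOTHING on the first line" behaviour is easy to check on a toy input: an awk variable that has never been assigned prints as an empty string, and A++ then turns it into 1. In this small sketch the function name is hypothetical, the separator is | only to make the output visible, and it is set with -v, which has the same effect as the trailing OFS=... assignment in the command above.

```shell
# number_from_second is a hypothetical name used only for this demonstration.
# A is never initialised, so the first line gets an empty first column.
number_from_second() {
  awk -v OFS='|' '{ print A OFS $0; A++ }'
}

printf 'hdr\nx\ny\n' | number_from_second
# |hdr
# 1|x
# 2|y
```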
 

Thanks,
R. Singh


Could this be simplified in structure instead? I generally don't like using cat, but here it might make sense:

cat -n sarb.out | while read -r seq tim bre lre rca bwr lwr wca pre pwr
do
   printf '%s\t%s\t%s\t%s\n' "$seq" "$tim" "$rca" "$wca"
done

Does that make more sense? I struggle with awk and sed, so this may be clearer to you, or maybe not. It's just an alternative that might be easier to maintain in future, although it will likely run slower than a single well-written awk command.

By way of explanation:

  • cat -n prefixes each line with a sequence number.
  • The while read loop reads the fields of each line into variables (including the sequence number).
  • The printf displays the fields you want, separated by tabs (\t), with a newline (\n) at the end.
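To see exactly what the loop receives, you can feed it a single data row copied from the sample output. The function name show_fields is only for illustration. The printf here uses a fixed format string and passes the values as arguments, so a % in the data (the header line contains %rcache) cannot be misread as a format directive. cat -n is available in GNU and BSD cat, but it is not required by POSIX.

```shell
# show_fields is a hypothetical helper: cat -n adds the line number,
# read splits on whitespace, printf prints four tab-separated fields.
show_fields() {
  cat -n | while read -r seq tim bre lre rca bwr lwr wca pre pwr
  do
    printf '%s\t%s\t%s\t%s\n' "$seq" "$tim" "$rca" "$wca"
  done
}

printf '10:23:18 0 398 100 5 13 64 0 0\n' | show_fields
# 1<TAB>10:23:18<TAB>100<TAB>64
```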

I hope that this helps,

Robin


Thank you, Robin, for the nice code. I just want to add that the code above will also print the empty line and the line containing the string Average, and that the count (seq) should start from the 2nd line of the input file (sar_input in my case). So I edited your code a little, as follows:

cat -n sar_input | while read -r seq tim bre lre rca bwr lwr wca pre pwr
do
      # Skip the header (line 1), the Average line and the empty line.
      if [[ $seq -gt 1 && "$tim" != "Average" && -n "$rca" && -n "$wca" && -n "$tim" ]]
      then
           seq=$((seq - 1))
           printf '%s\t%s\t%s\t%s\n' "$seq" "$tim" "$rca" "$wca"
      fi
done
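If you would rather not depend on cat -n or on bash's [[ ]], a variant can keep its own counter and skip the unwanted lines with case. This is only a sketch: the function name sar_cache_cols is made up, and it assumes the same nine-column layout as the sample, treating any line whose %rcache field is not a plain integer (such as the header) as a line to skip.

```shell
# sar_cache_cols is a hypothetical helper; $1 is the sar output file.
# Usage: sar_cache_cols sar_input
sar_cache_cols() {
  n=0
  while read -r tim bre lre rca bwr lwr wca pre pwr
  do
    # Skip the empty line and the Average line.
    case $tim in ''|Average) continue ;; esac
    # Skip the header: its 4th field is %rcache, not a number.
    case $rca in ''|*[!0-9]*) continue ;; esac
    n=$((n + 1))
    printf '%s\t%s\t%s\t%s\n' "$n" "$tim" "$rca" "$wca"
  done < "$1"
}
```

Because n only grows on lines that are printed, the numbering stays 1, 2, 3, ... even if skipped lines appear in the middle of the file.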
 

Thanks,
R. Singh


Hello fretagi,
The awk script you started seems to have been pretty close to what you wanted. With a couple of minor tweaks:

cat sarb.out | \
awk 'BEGIN {LINE=1; printf "#Spl Date     %%rcache %%wcache\n"; }
 { if ( NF == 9 ) {
  if($1 == "Average" || NR == 1) next
  DATE=$1
  printf "%4d %s %7d %7d\n", LINE, DATE, $4, $7
  LINE++}
 }'

it produces the output:

#Spl Date     %rcache %wcache
   1 10:23:18     100      64
   2 10:24:18     100      65
   3 10:25:18     100      67
   4 10:26:18     100      69
   5 10:27:18     100      61
   6 10:28:18     100      16
   7 10:29:18     100      35
   8 10:30:18     100      74
   9 10:31:18     100      32
  10 10:32:18     100      59
  11 10:33:18     100      60
  12 10:34:18     100      62
  13 10:35:18     100      70
  14 10:36:18     100      59
  15 10:37:18     100      59
  16 10:38:18     100      57
  17 10:39:18     100      51
  18 10:40:18     100      65
  19 10:41:18     100      17
  20 10:42:18     100      64
  21 10:43:18     100      60
  22 10:44:18     100      60
  23 10:45:18     100      67
  24 10:46:18     100      62
  25 10:47:18     100      64
  26 10:48:18     100      53
  27 10:49:18     100      56
  28 10:50:18     100      69
  29 10:51:18     100      23
  30 10:52:18     100      68
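Two details of these printf formats are easy to miss: %% in an awk printf format prints a single literal %, which is why the heading shows %rcache, and %4d / %7d right-align an integer in a field 4 or 7 characters wide, which produces the aligned columns. A minimal check with one hard-coded row (values copied from the first data line; the function name is hypothetical):

```shell
# show_format is a hypothetical name used only for this demonstration.
show_format() {
  awk 'BEGIN {
    printf "#Spl Date     %%rcache %%wcache\n"         # %% prints one literal %
    printf "%4d %s %7d %7d\n", 1, "10:23:18", 100, 64  # right-aligned, widths 4 and 7
  }'
}

show_format
# #Spl Date     %rcache %wcache
#    1 10:23:18     100      64
```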

I would get rid of the cat (which only creates more work for your system and slows down your output), rewrite the if statements as a condition (to shorten your code), and get rid of the LINE and DATE variables (since they aren't needed) like this:

awk 'BEGIN {printf "#Spl Date     %%rcache %%wcache\n"}
NF == 9 && NR > 1 && $1 != "Average" {
  printf "%4d %s %7d %7d\n", NR - 1, $1, $4, $7
}' sarb.out

and it still produces exactly the same output. I would probably change Date in the heading to Time, but I prefer a text heading to copying the timestamp from the input header. That choice is clearly up to you, though.
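Using NR - 1 as the sequence number works for this sar output because every data line directly follows the single header line. If the input ever contained an extra line in the middle, such as a blank line (a hypothetical case, not present in the sample above), the numbering would skip a value, whereas an explicit counter would not. A small comparison on made-up input:

```shell
# Hypothetical input: a 9-field header, two data rows and a blank line between them.
sample() {
  printf '%s\n' 'hdr a b c d e f g h' \
    '10:23:18 0 398 100 5 13 64 0 0' \
    '' \
    '10:24:18 0 332 100 5 15 65 0 0'
}
# Numbering from NR: the blank line still advances NR.
by_nr()      { awk 'NF == 9 && NR > 1 && $1 != "Average" { print NR - 1, $1 }'; }
# Numbering from a counter: n only grows on lines that are printed.
by_counter() { awk 'NF == 9 && NR > 1 && $1 != "Average" { print ++n, $1 }'; }

sample | by_nr        # 1 10:23:18, then 3 10:24:18
sample | by_counter   # 1 10:23:18, then 2 10:24:18
```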

Here is an improvement on RavinderSingh13's solution that handles the edge cases (the header line, the Average line and the empty line):

awk 'BEGIN { OFS = "\t\t"; print "Seq", "Time", "rcache", "wcache"; count = 1 }
     # Skip the sar header, the Average line, and lines whose last field compares below 0.
     NR == 1 || $1 ~ /^Average/ || $NF < 0 { next }
     { print count, $1, $4, $7; count++ }' sarout.txt
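To check the edge-case handling without a live sar run, you can feed the same awk program a few lines copied from the sample output, including the empty line and the Average line, through a here-document. The wrapper name sar_cache_summary is hypothetical, and reading standard input instead of sarout.txt is a change made only for this test. On an empty line NF is 0, so $NF is the whole (empty) line, and the empty string compares as less than 0 in a string comparison; that is how the $NF < 0 test drops it.

```shell
# sar_cache_summary is a hypothetical wrapper around the awk program above; it reads stdin.
sar_cache_summary() {
  awk 'BEGIN { OFS = "\t\t"; print "Seq", "Time", "rcache", "wcache"; count = 1 }
       NR == 1 || $1 ~ /^Average/ || $NF < 0 { next }
       { print count, $1, $4, $7; count++ }'
}

sar_cache_summary <<'EOF'
10:22:18 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
10:23:18       0     398     100       5      13      64       0       0
10:24:18       0     332     100       5      15      65       0       0

Average        0     605     100       7      16      56       0       0
EOF
```

This should print the Seq/Time/rcache/wcache heading followed by rows 1 and 2 only; the sar header, the blank line and the Average line are skipped.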