Bash script max value per hour

fajar_3t3 · October 15, 2017, 8:14pm

I want to get the max value for every hour.

sample input :

20:46:22 23
20:46:23 65
20:46:24 30
20:46:25 7
21:46:26 23
21:46:27 28
21:46:28 47
21:46:29 35
22:46:30 5
22:46:31 38
22:46:32 26
22:46:33 19
23:46:34 7
23:46:35 6
23:46:36 3
23:46:37 10

expected :

20:46:23 65
21:46:28 47
22:46:31 38
23:46:37 10

Can somebody help me with this?

garydeena · October 15, 2017, 8:30pm

sort  -nr  -k2  filename

Scott · October 15, 2017, 8:54pm

This should be possible with just the sort command, but:

$ sort -nrk2 somefile | awk -F: '!A[$1]++'
20:46:23 65
21:46:28 47
22:46:31 38
23:46:37 10
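
For readers unfamiliar with the idiom, here is a small sketch of the same pipeline wrapped in a function. The name max_per_hour is hypothetical and used only for illustration; the pipeline itself is the one shown above.

```shell
# max_per_hour FILE (hypothetical helper name, not from the thread)
max_per_hour() {
    # Step 1: sort numerically on the value column, largest first.
    # Step 2: with ':' as the awk field separator, $1 is the hour.
    #         A[$1]++ evaluates to 0 only the first time an hour is seen,
    #         so !A[$1]++ is true for just the first (largest) line per hour.
    sort -nrk2 "$1" | awk -F: '!A[$1]++'
}
```

The output comes out in descending value order, not time order, so a final sort on the hour may still be wanted.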

fajar_3t3 · October 15, 2017, 9:03pm

Hi Scott

Thanks for your reply. I solved this with your command:

cat test.txt | sort -nrk2 | awk -F: '!A[$1]++' | sort -nk1
00:07:15 139
01:00:05 89
02:01:50 58
03:27:07 132
04:03:10 140
05:43:11 161
06:41:36 174
07:37:46 194
08:46:59 213
09:35:15 229
10:02:13 340
11:11:47 268
12:20:00 229
13:32:40 258
14:34:40 205
15:52:56 203
16:41:09 186
17:52:24 235
18:47:05 304
19:53:10 266
20:07:37 196
21:04:48 193
22:03:50 154
23:14:05 148

Thanks

RavinderSingh13 · October 15, 2017, 11:09pm

Hello fajar_3t3,

The following may also help you.

awk -F':| ' '{a[$1]=a[$1]>$NF?a[$1]:$NF;b[$1]=$1":"$2":"$3} END{for(i in a){print b[i],a[i]}}' Input_file

Output will be as follows.

20:46:25 65
21:46:29 47
22:46:33 38
23:46:37 10

Thanks,
R. Singh

Don_Cragun · October 16, 2017, 2:33pm

Hi Ravinder,
Note that for(i in a) selects elements from array a in an unspecified order. So the output from your script won't necessarily be displayed in increasing time order (even if the input is in sorted order).
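
One way to work around the unspecified iteration order is to sort the awk output afterwards. The following is only a sketch under that assumption; the function name max_per_hour_awk and the array names max and line are made up for illustration. It also keeps the whole line that held the maximum, so the timestamp printed is the one belonging to the max value.

```shell
# max_per_hour_awk FILE (hypothetical helper name)
max_per_hour_awk() {
    awk -F'[: ]' '
    # Remember the line whenever its value beats the best seen for its hour.
    !($1 in max) || $NF + 0 > max[$1] {
        max[$1] = $NF + 0
        line[$1] = $0
    }
    END {
        for (h in max)          # iteration order is unspecified here ...
            print line[h]
    }' "$1" | sort -t: -k1,1n   # ... so order by hour afterwards
}
```

This works on unsorted input too, at the cost of one extra sort over the (small) per-hour result.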

Hi fajar_3t3,
If you're going to call sort twice, there is no need to also invoke cat and awk. The command:

sort -k2,2nr test.txt | sort -t: -k1,1n -u

should produce the same output as the code you showed in post #4 of this thread, and run a little faster.

If your input file is in increasing time order (as shown in your sample in post #1), you could also try the single awk command:

awk -F '[: ]' '
function PrintHigh() {
	if(NR > 0)	# NR > 0 so a one-line input is still printed from END
		print HighLine
	SaveHigh()
}
function SaveHigh() {
	Hour = $1
	HighLine = $0
	HighValue = $NF
}
NR == 1 {
	SaveHigh()
	next
}
$1 != Hour {
	PrintHigh()
	next
}
$NF > HighValue {
	SaveHigh()
}
END {
	PrintHigh()
}' test.txt

which should be faster still, since only one process is invoked, the input is read only once, and the low-valued lines for each hour are never written at all.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

Scott · October 16, 2017, 2:54pm

Argh. I tried that with two sorts after spending ages trying to do it with one!

sort -k2nr file | sort -t: -uk1

It was missing a -n (and, arguably, end fields on the -k options), so I gave up and reverted to the old favourite, awk.

Nice one, Don!
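
A small sketch of why the field end matters with -u: a key given as -k1 runs from field 1 to the end of the line, so every distinct line has a distinct key and -u removes nothing, whereas -k1,1 limits the key to the hour. The helper name count_unique_hours is hypothetical and exists only to compare key specifications.

```shell
# count_unique_hours KEYSPEC FILE (hypothetical helper name)
# Prints how many lines survive "sort -u" for the given -k specification.
count_unique_hours() {
    sort -t: -u -k"$1" "$2" | awk 'END { print NR }'
}
```

For example, on a file with several readings per hour, count_unique_hours 1 file reports every distinct line, while count_unique_hours 1,1n file reports one line per hour.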

jgt · October 17, 2017, 8:23am

If the file is sorted by time and the data spans more than 24 hours, the sort-based commands will show only one maximum per hour of the day across all days, rather than the maximum for each hour of every day. A loop that reports each hour as it changes avoids that:

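MAX=0
PREV_HR=25    # sentinel: no real hour is 25
while read -r time count
do
    hour=${time:0:2}
    # First line: adopt its hour as the current one.
    if [ "$PREV_HR" -eq 25 ]
    then
        PREV_HR=$hour
    fi
    # Hour changed: report the finished hour and start over.
    if [ "$hour" -ne "$PREV_HR" ]
    then
        echo "$PREV_HR $MAX"
        PREV_HR=$hour
        MAX=0
    fi
    if [ "$count" -gt "$MAX" ]
    then
        MAX=$count
    fi
done
echo "$PREV_HR $MAX"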
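
The loop above prints only the hour and its maximum. If the full timestamp of the max line is wanted, as in the expected output of the first post, a variant along these lines might do. It is a sketch only: it assumes Bash, input on standard input in time order, and non-empty lines of the form "HH:MM:SS value". The function and variable names are made up for illustration.

```shell
# max_per_hour_stream (hypothetical name): reads "HH:MM:SS value" lines from
# standard input and prints the max line of each consecutive run of one hour.
max_per_hour_stream() {
    local prev_hr="" best_time="" best=0 time count hour
    while read -r time count
    do
        hour=${time%%:*}                 # text before the first ':'
        if [ -n "$prev_hr" ] && [ "$hour" != "$prev_hr" ]
        then
            echo "$best_time $best"      # hour changed: report the finished run
            best_time=""
        fi
        if [ -z "$best_time" ] || [ "$count" -gt "$best" ]
        then
            best_time=$time
            best=$count
        fi
        prev_hr=$hour
    done
    if [ -n "$best_time" ]
    then
        echo "$best_time $best"          # report the last run
    fi
}
```

Usage would be max_per_hour_stream < test.txt. Because a result is emitted every time the hour changes, the same hour on a later day is reported separately.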