[Solved] Using awk to calculate max value

cparr · February 7, 2014, 1:29pm

I have a file of sites, and each site has a variable number of flow values with a date for each value. I want to determine the maximum flow value for each site and output the site number, the maximum value, and the date of that maximum. The format structure (simplified) is the following:

Record  Site Number  Date Flow Value
 V      050100
 3      050100   6/17/1928    145
 3      050100  5/28/1929     500
 .
 .
 3     050100    6/5/2012     450
 V     06200
 3     06200      7/3/1945     1256
 3     06200      6/8/1950      835
 .
 .
 3    06200      5/28/1999    287

Thanks for your help!!

bartus11 · February 7, 2014, 1:36pm

What would be a desired output for this sample data?

cparr · February 7, 2014, 3:18pm

The output would look like the following:

050100  5/28/1929     500
062000  7/1/2001     1500
.
.

where the values shown for sites 050100 and 06200 are the max values for the entire set of flows for each site. The number of flow values for each site varies from 10 to over 100.

ahamed101 · February 7, 2014, 5:14pm

Something like this:

awk '!/^Rec/ && NF==4 && a[$2]<$NF{ a[$2]=$NF; d[$2]=$(NF-1) } END{ for(i in a)print i,a[i],d[i] }' input_file

--ahamed

Don_Cragun · February 8, 2014, 12:52am

cparr:

The output would look like the following:
050100  5/28/1929     500
062000  7/1/2001     1500
.
.
where the values shown for sites 050100 and 06200 are the max values for the entire set of flows for each site. The number of flow values for each site varies from 10 to over 100.

Given that your sample input does not have any entries for Site Number 062000, does not have any entries with Date 7/1/2001, and does not have any entries with Flow Value 1500, I don't understand how you would expect to get the 2nd line of output shown above.

If the output you wanted from your sample input had been:

050100  5/28/1929     500
06200   7/3/1945     1256

you could use something like the following awk script:

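awk '
# Print the maximum recorded for the current site, then reset it.
function prmax() {
	if(m != "") printf("%-8s%-10s%7d\n", s, d, m)
	m = ""
}
# A "V" record starts a new site: flush the previous site and save the new site number.
$1 == "V" {
	prmax()
	s = $2
	next
}
# A data record with a larger flow value: remember its date and value.
NF == 4 && m < $4 {
	d = $3
	m = $4
}
# Flush the last site in the file.
END {
	prmax()
}' file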

Unlike the script provided by ahamed101, the output will be in the same order as the input file. (The for loop used by ahamed101 prints entries from the array in an unspecified order.) The above code prints the max flow value for the previous site number as soon as it finds the first line of the next site number (so it also uses less memory).
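If the records for a site might not be contiguous in the file, the array approach can still print in input order by recording each site the first time it appears. The following is only a sketch of that variant, not the script above: the file name flows.txt and the array names max, date, and order are made up for illustration, and the sample data is the thread's input with the "." filler lines left out.

```shell
# Sample data modeled on the input shown in this thread (hypothetical file name).
cat > flows.txt <<'EOF'
Record  Site Number  Date Flow Value
 V      050100
 3      050100   6/17/1928    145
 3      050100  5/28/1929     500
 3     050100    6/5/2012     450
 V     06200
 3     06200      7/3/1945     1256
 3     06200      6/8/1950      835
 3    06200      5/28/1999    287
EOF

out=$(awk '
NF == 4 && $1 != "V" {
	# First time this site is seen: remember its position in the input.
	if (!($2 in max)) { order[++n] = $2; max[$2] = $4; date[$2] = $3 }
	# Adding 0 forces a numeric comparison of the flow values.
	else if ($4 + 0 > max[$2] + 0) { max[$2] = $4; date[$2] = $3 }
}
END {
	# Walk the sites in first-seen order instead of using for (i in max).
	for (i = 1; i <= n; i++)
		printf("%-8s%-10s%7d\n", order[i], date[order[i]], max[order[i]])
}' flows.txt)
printf '%s\n' "$out"
```

This keeps one entry per site in memory, so it uses more memory than the streaming script, but it does not depend on all of a site's lines being grouped together.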

If you want to try this script on a Solaris/SunOS operating system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of the default /usr/bin/awk.
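In a wrapper script that has to run on more than one system, the awk binary could be chosen at run time. A minimal sketch, assuming a POSIX shell; the AWK variable name and the twice() test function are only illustrative:

```shell
# Prefer the XPG4 awk where it exists (Solaris/SunOS); otherwise use the awk found on PATH.
if [ -x /usr/xpg4/bin/awk ]; then
	AWK=/usr/xpg4/bin/awk
else
	AWK=awk
fi

# Quick check that the chosen awk accepts user-defined functions, which the script above relies on.
out=$("$AWK" 'function twice(x) { return 2 * x } BEGIN { print twice(21) }')
echo "$out"
```

The max-flow script can then be invoked as "$AWK" in place of awk.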

cparr · February 8, 2014, 10:50pm

Thank you, ahamed101 and Don! As you correctly surmised, Don, the output I was after was the max value for each site, with the output lines displayed as you indicated. My sample output was confusing because I was trying to show with the dots that the actual number of flow values was greater than the 3 or 4 lines I showed for each site. In any case, your awk script worked perfectly, Don. Many thanks!