I need to change this data such that the third column is averaged for each different value of the first column. So I would have something like this:
30.498001,-87.881412, average
34.758781,-87.650562, average
32.498567,-86.136587, average
So for each unique value in the first column the third column is averaged. (only the first column matters because there are no two rows in my data with the same first column but different second columns0
I have tried numerous awk and Perl solutions to similar problems online. I am stumped.
If someone could point me in the right direction that would be incredible.
The first two column values are lat/long values. They are for specific locations, so considering the first column as an id is fine, I think.
I am using UNIX on a mac (not sure if that's what you needed to know). I am VERY inexperienced, only been working with UNIX for a few months.
So far I have tried switching this code around in the terminal. It is for averaging rows with the same first column.
$ awk '
NR>1{
arr[$1] += $2
count[$1] += 1
}
END{
for (a in arr) {
print "id avg " a " = " arr[a] / count[a]
}
}
' FILE
I have tried to switch the $2 to $3 and switched the order so that it would be averaging columns not rows, but I don't think this is even the right approach. No matter what I try I get "0" as the output for my average...
---------- Post updated at 05:55 PM ---------- Previous update was at 05:49 PM ----------
Assuming that the data is already in sorted order (as in your example), you could try:
awk '
BEGIN { FS = OFS = ","
}
function print_av() {
if(NR <= 1)
return
print sum / cnt
sum = cnt = 0
}
$1 != last {
print_av()
printf("%s%s%s%s", $1, OFS, $2, OFS)
last = $1
}
{ sum += $3
cnt++
}
END { print_av()
}' file
But, note as wisecracker said, you need to clean up your input data so that each line has three fields. It would probably be better to use the latitude and the longitude as in:
awk '
BEGIN { FS = OFS = ","
}
function print_av() {
if(NR <= 1)
return
print sum / cnt
sum = cnt = 0
}
$1 FS $2 != last {
print_av()
printf("%s%s%s%s", $1, OFS, $2, OFS)
last = $1 FS $2
}
{ sum += $3
cnt++
}
END { print_av()
}' file