I have the following format of input from multiple files
File 1
24.01 -81.01 1.0
24.02 -81.02 5.0
24.03 -81.03 0.0
File 2
24.01 -81.01 2.0
24.02 -81.02 -5.0
24.03 -81.03 10.0
I need to scan through the files and when the first 2 columns match I need to print out the first two columns and the average of the third.
Expected output:
24.01 -81.01 1.5
24.02 -81.02 0.0
24.03 -81.03 5.0
I was using this code but I am not getting the expected output (data are wrong).
awk -F, '{array[$1 $2]+=$3} END { for (i in array) {print i array}}' input.file
I think I need to remove the -F, (since there are no commas) and divide the array by number of records (NR).
Any help is appreciated
awk '{array[$1 $2]+=$3; a[$1 $2]++} END { for (i in array) {print i, array/(a)}}' file1 file2 > newfile
Try that for a start.
1 Like
Try
awk '{arr[$1,$2] += $3
count[$1,$2] += 1
}
END{
for (a in arr) {
print a arr[a] / count[a]
}
}
' file*
The results are as expected, but the format is not. What is the command to preserve the original format.
Try this modification:
Any awk:
awk '{v=$NF; sub(/ +[^ ]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A/C}' file1 file2
GNU awk:
gawk '{A[$1$2]+=$3; C[$1$2]++} END{for(i in A) printf "%s%7.1f\n",i,A/C}' FIELDWIDTHS="6 8 9" file1 file2
1 Like
[quote=scrutinizer;302933668]
Try this modification:
Any awk:
awk '{v=$NF; sub(/ +[^ ]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A/C}' file1 file2
Great! The first awk statement worked.
Now Im looking for the correct way to sort the first column (ascending).
Try running it through sort :
awk ... | sort -n
1 Like
scrutinizer:
Try this modification:
Any awk:
awk '{v=$NF; sub(/ +[^ ]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A/C}' file1 file2
I was hoping to modify your script to subtract a field (3rd column) in File A from another field (3rd column) in File B, but its not working as I had thought. I am not familiar enough with Awk to make it work.
Any help is appreciated!
File A
24.01 -81.01 1.0
24.02 -81.02 5.0
24.03 -81.03 0.0
File B
24.01 -81.01 2.0
24.02 -81.02 3.0
24.03 -81.03 0.0
Result Output
24.01 -81.01 -1.0
24.02 -81.02 2.0
24.03 -81.03 0.0
RudiC
February 2, 2015, 9:50am
9
If you have just two files, and if you want to subtract file2's values from file1's,and if you accept some format change, this might help you:
awk 'FNR==NR {A[$1,$2]=$NF; next} {A[$1,$2]-=$NF} END{for(i in A) printf "%s%7.1f\n",i,A}' SUBSEP=" " file1 file2
24.01 -81.01 -1.0
24.02 -81.02 2.0
24.03 -81.03 0.0
1 Like
Thanks to all for the help.
I have 1 final question.
I need to perform the following calculation on column 3.
awk '{$3 = ($3*9)/5; print}'
Input
46.0625 -99.1042 4.9
35.4792 -79.6042 -0.6
42.5208 -71.4792 -1.7
Expected Output
46.0625 -99.1042 8.8
35.4792 -79.6042 -1.1
42.5208 -71.4792 -3.1
I need to preserve the 4 sig digs in columns 1 and 2, but round column 3 to 1 sig dig.
I am close with this code but it wraps the 3rd column to the next line.
awk '{print $1, $2; printf("%.1f\n"), $3 = ($3 * 9)/5}'
RudiC
February 4, 2015, 10:45am
11
Take advantage of awk
's string/number dualism, and use the %s
format specifier:
awk '{printf "%s %s %.1f\n", $1, $2, ($3 * 9)/5}' file
46.0625 -99.1042 8.8
35.4792 -79.6042 -1.1
42.5208 -71.4792 -3.1
1 Like
I now have the need to match the first two columns and when they match, calculate the percentage of average for the third columns. I cant seem to borrow from the same awk script and output the expected result.
percent of average = diff/avg * 100
Ill need to ignore/remove the minus/negative symbols if their are any in the output file.
Diff File
46.0625 -99.1042 8.8
35.4792 -79.6042 -2.1
42.5208 -71.4792 -3.1
Avg File
46.0625 -99.1042 4.9
35.4792 -79.6042 3.3
42.5208 -71.4792 10.0
expected output
46.0625 -99.1042 180
35.4792 -79.6042 63
42.5208 -71.4792 31
Any suggestions?
RudiC
February 12, 2015, 4:24pm
13
Please open a new thread for a new request!
Try
awk 'NR==FNR {T[$1,$2]=$3; next} $1,$2 in T {P=T[$1,$2]/$3*100; printf "%s %s %.0f\n", $1, $2, (P>=0)?P:-P}' file1 file2
46.0625 -99.1042 180
35.4792 -79.6042 64
42.5208 -71.4792 31
1 Like
This has been resolved. Thanks for the help RudiC!
[quote=scrutinizer;302933668]
Try this modification:
Any awk:
awk '{v=$NF; sub(/ +[^ ]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A/C}' file1 file2
Turns out that the above code did not work for me, at least in the way that I am using it.
I have the following script set up:
foreach numb (`seq 1 100`)
awk '{array[$1 $2]+=$3; a[$1 $2]++} END { for (i in array) {print i, array/(a)}}' /dir/of/data/${numb}02.pnt > output.txt
end
I expect the code to search through each of the 100 files and when the first 2 columns match, average the 3rd.
Input File 102:
43.1042 -78.6042 -6.07
25.2708 -81.1458 19.27
Input File 202:
43.1042 -78.6042 -4.88
25.2708 -81.1458 17.76
Output:
43.1042 -78.6042 -5.5
25.2708 -81.1458 18.52
Any suggestions are appreciated!
Hi, try this modification. This should work better:
awk '{v=$NF; sub(/[ \t]+[^ \t]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A/C}' file1 file2
---
ncwxpanther:
[..]
Turns out that the above code did not work for me, at least in the way that I am using it.
I have the following script set up:
foreach numb (`seq 1 100`)
awk '{array[$1 $2]+=$3; a[$1 $2]++} END { for (i in array) {print i, array/(a)}}' /dir/of/data/${numb}02.pnt > output.txt
end
Looking at you method, i have the following remarks:
why use csh?
like this awk will be called 100 times and the file output will get overwritten 100 times.
the result will be wrong 100 times, because the script needs to read all 100 files for it to work properly.
With bash you could try:
cd /dir/of/data
awk '.....' "${numb}"{1..100} > output.txt
or, if that gives line line length problems, try:
cd /dir/of/data
for i in "${numb}"{1..100}
do
cat "$i"
done |
awk '.....' > output.txt
1 Like
scrutinizer:
Hi, try this modification. This should work better:
awk '{v=$NF; sub(/[ \t]+[^ \t]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A/C}' file1 file2
Thanks for replying Scrutinizer, but im not getting the expected results.
For instance this is the data for 1 match:
-8.44
-7.86
-3.28
-9.16
-8.71
-4.76
-7.75
-6.83
-2.45
-6.42
-4.17
-7.73
-6.32
-9.33
-3.07
-7.76
-8.43
-6.08
-2.55
-7.19
-2.09
-2.9
-7.4
-7.18
-2.24
-6.29
-2.02
-4.44
-5.88
-2.36
-3.06
-1.64
-3
-12.19
-5.57
-8.51
-2.61
-2.32
-3.99
-5.02
-5.36
-6.93
-3.64
-4.52
-4
-4.92
-6.34
-4.88
-1.27
-4.88
-3.33
-2.31
-1.44
-0.48
-3.76
-2.84
-2.01
-7.14
-5.36
-3.62
-3.01
-6.29
-8.16
-5.08
-4.38
-4.44
-6.72
-7.19
-3.94
-4.85
-3.72
-6.18
-6.05
-6.09
-2.58
-0.59
-4.65
-9.03
-10.09
-6.33
-0.75
-5.53
-2.15
0.21
-4.23
-4.79
-4.61
-4.57
-5.4
-1.98
-1.18
-2.83
-7.05
-6.07
-6.02
-4.88
-1.35
0.5
-0.86
-1.55
This is the expected result:
-4.72
This is what your code produced
-1.6
Any ideas?
I had made an edit to my post, please have a look
This worked.
for numb in "${elem}"/{1..100}02.pnt
do
cat "$numb"
done |
awk '{v=$NF; sub(/[ \t]+[^ \t]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A/C}' > output