arranging columns with AWK

sramirez · December 11, 2010, 2:59pm

Hi there!

Can this be done with AWK? I have several text files (file1, file2, etc.) with different numbers of lines. I need to append each file as a column to a reference file (ReFile), placing each line of file1, file2, etc. next to the closest value in ReFile. Empty cells must be filled with NA or 0. ReFile always has more lines than any of the files to append.

INPUT

ReFile
1.0
4.6
15.5
34.3
57.5
65.9
70.6

file1
4.75
17.54
58.90

file2
6.45
18.54
33.90
66.78

OUTPUT

ReFile	file1	file2
1.0	NA	NA
4.6	4.75	6.45
15.5	17.54	18.54
34.3	NA	33.90
57.5	58.90	NA
65.9	NA	66.78
70.6	NA	NA

Thanks!

binlib · December 12, 2010, 1:19pm

Since you didn't define what qualifies as closest, I picked a tolerance of 3.5: a value matches a ReFile line if it differs from it by less than that. You may need to fine-tune the function closest to get the best approximation.

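awk '
# Return the first value of file i that lies within d of x, else "N/A".
function closest(x, i   , j)
{
  for (j = 1; j <= c[i]; ++j)
    if (x < a[i,j]+d && x > a[i,j]-d)
      return a[i,j]
  return "N/A"
}

# d is still unset while file1, file2, ... are read: store their values.
d == 0 {
  if (FNR == 1)
    f[++n] = FILENAME
  a[n, FNR] = $0
  ++c[n]
}

# d is set on the command line just before ReFile: print the merged table.
d > 0 {
  if (FNR == 1) {
    printf("%s", FILENAME)
    for (i = 1; i <= n; ++i)
      printf("\t%s", f[i])
    printf("\n")
  }

  for (i = 1; i <= n; ++i)
    $(i+1) = closest($1, i)
  print
}
' file1 file2 OFS='\t' d=3.5 ReFile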
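
One way to fine-tune closest, as suggested above, is to return the candidate with the smallest absolute difference instead of the first one inside the window. The following is a minimal sketch under that assumption. The function name nearest, the scratch directory, and the choice of NA as the filler are illustrative and not part of the original answer. It recreates the sample data from the question so it can be run as-is.

```shell
# Scratch directory with the sample data from the question (hypothetical location).
tmp=$(mktemp -d)
printf '%s\n' 1.0 4.6 15.5 34.3 57.5 65.9 70.6 > "$tmp/ReFile"
printf '%s\n' 4.75 17.54 58.90 > "$tmp/file1"
printf '%s\n' 6.45 18.54 33.90 66.78 > "$tmp/file2"

out=$(cd "$tmp" && awk '
# Return the value of file i with the smallest |x - value|, provided that
# difference is below d. Otherwise return NA.
function nearest(x, i,    j, diff, best, bestdiff)
{
  best = "NA"
  bestdiff = d
  for (j = 1; j <= c[i]; ++j) {
    diff = x - a[i,j]
    if (diff < 0)
      diff = -diff
    if (diff < bestdiff) {
      bestdiff = diff
      best = a[i,j]
    }
  }
  return best
}

# d is unset while the data files are read: store their values.
d == 0 {
  if (FNR == 1)
    f[++n] = FILENAME
  a[n, FNR] = $0
  ++c[n]
  next
}

# Reference file: print a header once, then one merged row per line.
{
  if (FNR == 1) {
    printf("%s", FILENAME)
    for (i = 1; i <= n; ++i)
      printf("\t%s", f[i])
    printf("\n")
  }
  for (i = 1; i <= n; ++i)
    $(i+1) = nearest($1, i)
  print
}
' file1 file2 OFS='\t' d=3.5 ReFile)

printf '%s\n' "$out"
rm -rf "$tmp"
```

With the sample data this should give the same table as the first-match version. The two differ only when a file holds more than one value within d of the same ReFile line.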

sramirez · December 12, 2010, 2:06pm

Thanks!! It works great!!!!

I'm new to AWK, so I will be studying your code and "man awk". It is really great that I can set the allowed difference between values to anything. Now, with your current code, if a value matches several lines of ReFile, it is printed on every one of them. How would you print each value only once (let's say, only when it finds the first match)?

Thanks again!!
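
One possible way to print each value only once is to remember which values have already been returned and skip them on later ReFile lines. This is a minimal sketch, assuming that "first match wins" is acceptable. The used array, the NA filler, and the three-line ReFile below are made up for illustration, and the header row is left out to keep it short.

```shell
# Hypothetical data: the single value 11 lies within 3.5 of both 10 and 12.
tmp=$(mktemp -d)
printf '%s\n' 10 12 20 > "$tmp/ReFile"
printf '%s\n' 11 > "$tmp/file1"

once=$(cd "$tmp" && awk '
# Return the first unused value of file i within d of x and mark it as used,
# so that it cannot be printed again on a later line.
function closest(x, i,    j)
{
  for (j = 1; j <= c[i]; ++j)
    if (!((i,j) in used) && x < a[i,j]+d && x > a[i,j]-d) {
      used[i,j] = 1
      return a[i,j]
    }
  return "NA"
}

# d is unset while the data files are read: store their values.
d == 0 {
  if (FNR == 1)
    f[++n] = FILENAME
  a[n, FNR] = $0
  ++c[n]
  next
}

# Reference file: one merged row per line.
{
  for (i = 1; i <= n; ++i)
    $(i+1) = closest($1, i)
  print
}
' file1 OFS='\t' d=3.5 ReFile)

printf '%s\n' "$once"
rm -rf "$tmp"
```

Here 11 is printed next to 10 and the rows for 12 and 20 get NA. The approach is greedy: a value attaches to the first ReFile line in range, even when a later line would be closer.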