Compare columns and find values within a range

giuliangiuseppe · January 13, 2016, 7:33am

Dear All,
sorry for opening a new thread, but the old one (http://www.unix.com/shell-programming-and-scripting/263430-find-values-within-range-output.html) is already marked as resolved, although the solution does not actually work properly, and the input files are a bit different.

File1:

1 195240910 +
2 195240915 -

File2:

1 195240905 4
1 195240906 4
1 195240907 5
1 195240908 5
1 195240909 3
1 195240910 0
1 195240911 5
1 195240912 5
1 195240913 0
1 195240914 0
1 195240915 3
1 195240916 4
1 195240917 5
1 195240918 8
1 195240919 5
1 195240920 6
2 195240905 7
2 195240906 2
2 195240907 9
2 195240908 9
2 195240909 2
2 195240910 12
2 195240911 2
2 195240912 9
2 195240913 5
2 195240914 9
2 195240915 0
2 195240916 2
2 195240917 9
2 195240918 5
2 195240919 9
2 195240920 6

I would like to compare these two files as follows.
First, columns $1 and $2 of both files must match. If they do, output the matching values, and if column $3 of File1 is +, output the n values of $3 in File2 at the lower positions (up to and including the match); otherwise, if column $3 of File1 is -, output the n values of $3 in File2 at the higher positions (from the match onward).

So for File1 and File2 the output should be (for n=5):

1 195240910 4 5 5 3 0
2 195240915 9 5 9 2 0

I hope the coloring helps to make this clear.
Any help or suggestions?

Best

RudiC · January 13, 2016, 8:23am

Try

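awk '
# First file: remember the position ($2) for each $1 and a signed offset
# derived from $3: "+" gives N-1, "-" gives -(N-1).
FNR == NR       {T[$1] = $2
                 S[$1] = sprintf ("%d", $3 N-1)
                 next
                }

# Second file, while a window is open (up to record L): append $3 to the
# current output line and end the line at record L.
NR <= L         {printf "%s%s", $3, L==NR?"\n":" "
                 next
                }

# Open a window at the first row of the range: N-1 rows before the stored
# position for "+", at the stored position itself for "-".
(T[$1] <= $2 + S[$1] || T[$1] == $2 ) &&
T[$1]           {printf "%s %s %s ", $1, T[$1], $3
                 delete T[$1]
                 L = NR + N - 1
                }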
' N=5 file1 file2

The output is:

1 195240910 4 5 5 3 0
2 195240915 0 2 9 5 9

giuliangiuseppe · January 13, 2016, 9:20am

Dear RudiC,
your script shows only the last entry for each value of $1.
In fact, if File1 is:

1 195240910 +
1 195240920 +
2 195240915 -

The output is:

1 195240920 4 5 8 5 6
2 195240915 0 2 9 5 9

instead of:

1 195240910 4 5 5 3 0
1 195240920 4 5 8 5 6
2 195240915 0 2 9 5 9

File2 is the same as before:

1 195240905 4
1 195240906 4
1 195240907 5
1 195240908 5
1 195240909 3
1 195240910 0
1 195240911 5
1 195240912 5
1 195240913 0
1 195240914 0
1 195240915 3
1 195240916 4
1 195240917 5
1 195240918 8
1 195240919 5
1 195240920 6
2 195240905 7
2 195240906 2
2 195240907 9
2 195240908 9
2 195240909 2
2 195240910 12
2 195240911 2
2 195240912 9
2 195240913 5
2 195240914 9
2 195240915 0
2 195240916 2
2 195240917 9
2 195240918 5
2 195240919 9
2 195240920 6

Best

RudiC · January 13, 2016, 9:25am

The arrays are indexed by $1 alone, so a later File1 line with the same $1 overwrites the earlier one and only the last entry remains valid. You didn't mention that the same $1 value can occur several times in File1.
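
The overwriting can be reproduced in isolation. The following is a minimal sketch, assuming the three-line File1 from the previous post held in a shell variable (the variable names file1_lines, by_col1 and by_both are illustrative only and not part of the script above). It counts how many array entries survive when the index is $1 alone and when it is $1 and $2 together.

```shell
# The three File1 lines from the post above (held in a variable for the demo).
file1_lines='1 195240910 +
1 195240920 +
2 195240915 -'

# Indexed by $1 only: the second "1 ..." line overwrites the first.
by_col1=$(printf '%s\n' "$file1_lines" |
  awk '{ T[$1] = $2 } END { c = 0; for (k in T) c++; print c }')

# Indexed by $1 and $2 together: every line keeps its own entry.
by_both=$(printf '%s\n' "$file1_lines" |
  awk '{ T[$1 FS $2] = $3 } END { c = 0; for (k in T) c++; print c }')

echo "entries indexed by \$1: $by_col1"          # 2
echo "entries indexed by \$1 and \$2: $by_both"  # 3
```

Joining both columns into one index, as in $1 FS $2, is one way to keep a separate entry for every File1 line.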

giuliangiuseppe · January 13, 2016, 9:28am

Yes sorry I didn't, I didn't think about it...

MadeInGermany · January 13, 2016, 9:42am

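awk '
# Key for both files: columns 1 and 2 joined by the field separator.
{ k=$1 FS $2 }
# First file: remember the sign ($3) for every key.
NR==FNR { a[k]=$3; next }
# Second file: ring buffer holding the last n values of $3.
{ s[NR%n]=$3 }
(k in a) {
  printf "%s %s",$1,$2
  if (a[k]=="+") {
    # "+": print the n buffered values, oldest first, ending at the match.
    for (i=1; i<=n; i++) printf " %s",s[(NR+i)%n]
    printf "\n"
  } else { f=n }   # otherwise: start a countdown that expires n-1 rows later
}
# When the countdown expires, the buffer holds the match and the n-1 rows
# after it; print them newest first.
(f && (f--==1)) {
  for (i=n; i>=1; i--) printf " %s",s[(NR+i)%n]
  printf "\n"
}
' n=5 file1 file2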
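
The indexing s[NR%n] and s[(NR+i)%n] in the script above acts as a ring buffer over the last n records. Below is a small sketch of just that part, assuming a made-up input of seven numbers and n=3 (the input values and the variable name last_n are illustrative only). At the end of the input it prints the last three values oldest first, then newest first.

```shell
# Keep the last n values of column 1 in a ring buffer; at the end of input,
# print them oldest first, then newest first.
last_n=$(printf '%s\n' 10 20 30 40 50 60 70 | awk -v n=3 '
{ s[NR % n] = $1 }          # slot NR%n always holds the current value
END {
  # (NR+1)%n is the oldest slot, (NR+n)%n == NR%n is the newest one.
  for (i = 1; i <= n; i++) printf "%s%s", s[(NR + i) % n], (i < n ? " " : "\n")
  for (i = n; i >= 1; i--) printf "%s%s", s[(NR + i) % n], (i > 1 ? " " : "\n")
}')
echo "$last_n"
# prints:
# 50 60 70
# 70 60 50
```

Because the buffer is indexed by NR alone, it does not reset when $1 changes, so a window near the first rows of a new $1 value would still contain values from the previous one.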

giuliangiuseppe · January 13, 2016, 11:20am

Works well, thanks!