427 A C A/C 12
436 G C G/C 12
445 C T C/T 12
447 A G A/G 9
451 T C T/C 5
456 A G A/G 12
493 G A G/A 12
I want to read the first column and, for each id, find all other ids that lie within 10 of it.
427 A C A/C 12 436
436 G C G/C 12 427,445
445 C T C/T 12 436,447,451
447 A G A/G 9 445,451,456
451 T C T/C 5 445,447,456
456 A G A/G 12 451,447
493 G A G/A 12
The last column should look like the above: all ids that are within 10 bases (plus or minus) of that specific id. For example, for 436 the boundaries are 426 to 446; the other ids in that range are 427 and 445, so I displayed them in the 6th column.
Assuming no duplicate field 1 values, and that all will fit in memory, this should work:
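awk '
    { a[$1+0] = $0; }    # index each record by its numeric id
    END {
        for( x in a )    # note: for-in traversal order is unspecified
        {
            printf( "%s", a[x] );
            sc = " ";    # a space before the first neighbour
            for( i = x-10; i <= x + 10; i++ )
                if( i != x && i in a )
                {
                    printf( "%s%d", sc, i );
                    sc = ",";    # bare commas between neighbours, as requested
                }
            printf( "\n" );
        }
    }
' infile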
$
$
$ cat f13
427 A C A/C 12
436 G C G/C 12
445 C T C/T 12
447 A G A/G 9
451 T C T/C 5
456 A G A/G 12
493 G A G/A 12
$
$
$
$ perl -lane '$x{$F[0]} = [ @F ];    # index each record by its id
  END {
    foreach $k (sort { $a <=> $b } keys %x) {    # numeric, not lexical, order
      foreach $i ($k-10..$k+10) {
        push (@y, $i) if defined $x{$i} and $i != $k;
      }
      printf ("%-7s %-7s %-7s %-7s %-7s %s\n",@{$x{$k}},join(",",@y));
      @y=()
    }
  }' f13
427 A C A/C 12 436
436 G C G/C 12 427,445
445 C T C/T 12 436,447,451
447 A G A/G 9 445,451,456
451 T C T/C 5 445,447,456
456 A G A/G 12 447,451
493 G A G/A 12
$
$
$
awk '
    { a[$1+0] = $0; }    # index each record by its numeric id
    END {
        for( x in a )    # unspecified order, hence the sort below
        {
            printf( "%s", a[x] );
            sc = " ";
            for( i = x-10; i <= x + 10; i++ )
                if( i != x && i in a )
                {
                    printf( "%s%d", sc, i );
                    sc = ",";
                }
            printf( "\n" );
        }
    }
' infile | sort -n
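A possible variant of the awk approach, sketched under the same assumptions (unique ids in field 1, whole file fits in memory): it remembers the order in which the ids were read, so the output follows the input order without a separate sort, and it joins the neighbours with a bare comma. The function name near_ids and its optional window argument are illustrative only and do not come from the posts above.

```shell
# near_ids FILE [WINDOW]  (hypothetical helper; WINDOW defaults to 10)
# Prints each record followed by the other ids found within +/- WINDOW.
near_ids() {
    awk -v w="${2:-10}" '
    NF {                          # skip blank lines
        id = $1 + 0               # force a numeric key
        line[id] = $0             # keep the whole record
        order[++n] = id           # remember the input order
    }
    END {
        for (k = 1; k <= n; k++) {
            x = order[k]
            out = line[x]
            sep = " "             # space before the first neighbour
            for (i = x - w; i <= x + w; i++)
                if (i != x && (i in line)) {
                    out = out sep i
                    sep = ","     # commas between later neighbours
                }
            print out
        }
    }' "$1"
}
```

Called as `near_ids infile`, this should print the sample records in their original order, with a record such as 493 left unchanged because no other id falls inside its window. The inner loop still probes every integer in the window, so it suits small windows like this one.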