A few years ago, the user radoulov posted a neat solution to a problem of finding the lines (gene variation names) shared by multiple samples (files). The code was:
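awk 'END {
  for (R in rec) {
    # n = number of slash-separated file names stored for line R
    n = split(rec[R], t, "/")
    # keep only lines seen in more than one file, grouped by n
    if (n > 1)
      dup[n] = dup[n] ? dup[n] RS sprintf("\t%-20s -->\t%s", rec[R], R) : \
        sprintf("\t%-20s -->\t%s", rec[R], R)
  }
  # note: for-in visits the groups in no guaranteed order
  for (D in dup) {
    printf "records found in %d files:\n\n", D
    printf "%s\n\n", dup[D]
  }
}
{
  # main rule: append the current FILENAME to the entry for this line
  # (a line repeated within one file adds that file more than once)
  rec[$0] = rec[$0] ? rec[$0] "/" FILENAME : FILENAME
}' f10.lista f12.lista f13.lista f14.lista fs6.lista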
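For comparison, here is a minimal sketch (not radoulov's code) that keeps the same file-list idea but makes two assumptions explicit: a line repeated inside one file should count that file only once, and the groups should print in ascending order starting from a minimum file count. The helper name common_lines, its MIN argument, and the sample files are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: common_lines MIN FILE...
# Prints, for each count n from MIN upward, the lines found in exactly n files.
common_lines() {
  min_files=$1
  shift
  awk -v min="$min_files" '
    # count each (line, file) pair once, even if the line repeats in that file
    !seen[$0, FILENAME]++ {
      cnt[$0]++
      files[$0] = (cnt[$0] > 1) ? files[$0] "/" FILENAME : FILENAME
      if (cnt[$0] > max) max = cnt[$0]
    }
    END {
      # numeric loop gives ascending group order; lines inside a group
      # still come out in unspecified for-in order
      for (n = min; n <= max; n++) {
        out = ""
        for (r in cnt)
          if (cnt[r] == n)
            out = out sprintf("\t%-20s -->\t%s\n", files[r], r)
        if (out != "")
          printf "records found in %d files:\n\n%s\n", n, out
      }
    }' "$@"
}

# Demo on throwaway sample files (hypothetical data)
dir=$(mktemp -d)
printf 'geneA\ngeneB\ngeneC\n' > "$dir/s1.lista"
printf 'geneA\ngeneB\ngeneB\n' > "$dir/s2.lista"
printf 'geneA\ngeneB\n' > "$dir/s3.lista"
printf 'geneA\ngeneD\n' > "$dir/s4.lista"
(cd "$dir" && common_lines 3 s1.lista s2.lista s3.lista s4.lista)
rm -rf "$dir"
```

With this sample data the 3-file group (geneB, counted once for s2.lista despite appearing twice there) prints before the 4-file group (geneA).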
The problem is that I want to find the intersections of lines among 3, 4, and 5 files, but the program only shows the results for 3 files.
I'm very new to AWK, so please help me modify this code to get what I need.
Thank you in advance.
Thank you, DGPickett, for your answer, but what I need is to modify the given code so that it reports the intersections for 4, 5, or more files, not just 3.
In other words, I want output like this:
records found in 3 files:
.
.
.
.
records found in 4 files:
.
.
.
.
.
records found in 5 files:
.
.
.
records found in 'n' files:
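awk '
  # count each distinct input file once
  !f[FILENAME]++ { fc++ }
  # count each line at most once per file
  !b[$0, FILENAME]++ { a[$0]++ }
  END {
    # report lines present in exactly j files, for j = 3 .. number of files
    for (j = 3; j <= fc; j++) {
      print "records found in " j " files:"
      for (i in a)
        if (a[i] == j) print i
    }
  }
' file*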
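The script above lists each line under exactly one count. If "intersection of 3 files" should also include lines that occur in 4 or 5 files, the comparison only needs to change from == to >=. A minimal sketch under that assumption; the helper name at_least_report and the sample files are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: at_least_report FILE...
# A line present in k files is listed under every j from 3 up to k.
at_least_report() {
  awk '
    # count each distinct input file once
    !f[FILENAME]++ { fc++ }
    # count each line at most once per file
    !b[$0, FILENAME]++ { a[$0]++ }
    END {
      for (j = 3; j <= fc; j++) {
        print "records found in at least " j " files:"
        for (i in a)
          if (a[i] >= j) print i
      }
    }
  ' "$@"
}

# Demo on throwaway sample files (hypothetical data)
dir=$(mktemp -d)
printf 'geneA\ngeneB\ngeneC\n' > "$dir/s1.lista"
printf 'geneA\ngeneB\n' > "$dir/s2.lista"
printf 'geneA\ngeneB\n' > "$dir/s3.lista"
printf 'geneA\n' > "$dir/s4.lista"
(cd "$dir" && at_least_report s1.lista s2.lista s3.lista s4.lista)
rm -rf "$dir"
```

Here geneA appears under both headings, while geneB, found in only three files, appears only under the first.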