$0 in a {a[$0]++; next}   # this counts, in an array, how many files each line is seen in
if(a[i]==nfiles) {        # if the value of an element == 4 (the number of files),
    print i > "output1"   # the line exists in all 4 files
}
I've taken the liberty to add some explanation to Franklin52's code:
awk -v nfiles="4" '
NR==FNR{a[$0]++;next} # NR is the current record number counting from the start of the program;
      # FNR is the current record number counting from the start of the current file.
      # When FNR == NR the record comes from file 1, so this statement captures
      # all records from file 1 in the array named a: the index is the whole
      # input record and the value is a count of the files it has been seen in.
      # next causes the next record to be read and the program to loop to the top.
$0 in a {a[$0]++; next} # this statement is only reached for records from file 2..n;
      # if the current record was also seen in the first file, increment the counter
      # maintained in a. next causes the next record to be read.
{b[$0]++} # this statement is executed when a record from file 2..n is encountered
      # and the record was not seen in file 1. A second array, b, is used to
      # track all records that were not in file 1.
END{  # this section of code runs after the last record is read from file n
  for(i in a){                # for every record seen in file 1...
    if(a[i]==nfiles) {        # if the record was seen in all files (its count in a matches the number of files)
      print i > "output1"     # print to the list of records seen in all files
    }
    else if(a[i]==1) {        # if the record was only seen in the first file, print to the list
      print i > "output3"     # of records just in file 1
    }
  }
  for(i in b){                # for every record that was not seen in the first file, but was seen
    if(b[i]==nfiles-1) {      # in all the other files, print to that list
      print i > "output2"
    }
  }
}' file1 filea fileb filec
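To make the three output lists concrete, here is a small worked example. The file names match the command above, but the contents are invented purely for illustration:

# Hypothetical test data:
printf 'apple\nbanana\ncherry\n' > file1
printf 'apple\nbanana\ndate\n'   > filea
printf 'apple\nbanana\ndate\n'   > fileb
printf 'apple\ndate\n'           > filec

# run the awk command shown above, then:
cat output1   # apple  -> its count in a reached 4, so it is in all four files
cat output2   # date   -> its count in b is 3 (nfiles-1), i.e. in every file except file1
cat output3   # cherry -> its count in a stayed at 1, so it is only in file1
# banana is in file1, filea and fileb but not filec, so it appears in no output file.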
The only potential problem with this code is that it can yield a false positive when one of the later files contains a duplicate of a line that is also in file1 while that line is missing from exactly one other input file: the duplicate inflates the count in a so it still reaches nfiles. Related combinations of duplicates and 'holes' fail in the same way. If this is a concern, an easy solution is to run 'sort -u' on each of the input files first to remove all duplicate records.
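One way to do that de-duplication first (a sketch, assuming it is acceptable to overwrite the input files in place):

for f in file1 filea fileb filec; do
    sort -u "$f" -o "$f"    # sort -o can safely write the sorted, de-duplicated result back to the same file
done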
Yes, I have all the files in the directory.
After manual checking I found that the script throws an error if the number of files is more than 9.
It starts giving an error with 10 or more files.