I have a folder with several files of which I want to eliminate all of the terms that they have in common using `awk`.
Here is the script that I have been using:
---------- Post updated at 17:14 ---------- Previous update was at 17:13 ----------
I'm afraid it will blow up your input file list...
---------- Post updated at 18:33 ---------- Previous update was at 17:14 ----------
OK, I've got it now. It appends every file name exactly once to the file list, so you can work on the file list again once the total number of duplicate words has been found.
Is the given algorithm correct?
If only the unique words per file should be printed, shouldn't it be
awk '
FNR==1 {
  # new input file: close the previous one to avoid running out of file descriptors
  if (NR!=1) close(fname)
  fname = FILENAME
}
{
  # count each word (field 2) globally and per file
  total[$2]++
  perfile[fname,$2]++
}
END {
  for (fw in perfile) {
    split(fw, idx, SUBSEP)
    f = idx[1]; w = idx[2]
    # a word is unique to file f if all of its occurrences are in f
    if (perfile[fw] == total[w]) print f, w
  }
}
' *
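For illustration, here is a minimal run of that script on two made-up frequency files. The /tmp paths and the sample words are my own assumptions, not from the thread; the input format ("count word" per line, as produced by e.g. `sort | uniq -c`) is assumed so that the word really sits in field $2:

```shell
# Hypothetical sample data: "count word" lines, the word in field $2
printf '3 apple\n1 pear\n' > /tmp/uniq_a.txt
printf '2 apple\n4 plum\n' > /tmp/uniq_b.txt

awk '
FNR==1 { if (NR!=1) close(fname); fname = FILENAME }
{ total[$2]++; perfile[fname,$2]++ }
END {
  for (fw in perfile) {
    split(fw, idx, SUBSEP)
    f = idx[1]; w = idx[2]
    # print only words whose every occurrence is in a single file
    if (perfile[fw] == total[w]) print f, w
  }
}
' /tmp/uniq_a.txt /tmp/uniq_b.txt | sort
# "apple" occurs in both files, so only the file-specific words print:
#   /tmp/uniq_a.txt pear
#   /tmp/uniq_b.txt plum
```

The `| sort` is only there because the order of `for (fw in perfile)` is unspecified in awk.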
The solution to the problem is the first block; in the following block, simply replace every FILENAME with fname.