Dear all,
I would be grateful for your help with the following.
I have the following file (file.txt), which is about 10,000 lines long:
ID1 ID2 0 1 0.5 0.6
ID3 ID4 0 0 0.4 0.8
ID1 ID5 0 1 0.5 0.3
ID6 ID2 1 0 0.4 0.8
The IDs in the first two columns can each occur between 1 and 10 times in the file (in either column 1 or column 2).
What I want to achieve:
I want to scan this file line by line, and print IDs to an ever-growing exclusion list if they meet the following criteria:
If $3 > $4, print $2 (ID2) > exclusionlist.txt
If $3 < $4, print $1 (ID1) > exclusionlist.txt
If $3==$4 && $5 < $6, print $2 (ID2) > exclusionlist.txt
If $3==$4 && $5 > $6, print $1 (ID1) > exclusionlist.txt
So applying this to row 1: $3 (0) < $4 (1), so ID1 should be added to my exclusion list.
I then want to delete every line in the file where that excluded ID appears (in column 1 or column 2); that can be up to 10 rows.
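For reference, the four rules above can be sketched as a single awk pass that prints, for each line, the ID the rules would exclude. This is the per-line decision only, not yet the delete-and-restart step; the sample file.txt is recreated first so the snippet runs standalone:

```shell
# Recreate the sample file.txt from above so this runs standalone.
printf '%s\n' \
  'ID1 ID2 0 1 0.5 0.6' \
  'ID3 ID4 0 0 0.4 0.8' \
  'ID1 ID5 0 1 0.5 0.3' \
  'ID6 ID2 1 0 0.4 0.8' > file.txt

# Print, for each line, the ID the four rules would exclude.
# A full tie ($3 == $4 && $5 == $6) prints nothing -- the rules do not cover it.
awk '{
  if      ($3 > $4) print $2
  else if ($3 < $4) print $1
  else if ($5 < $6) print $2   # here $3 == $4
  else if ($5 > $6) print $1   # here $3 == $4
}' file.txt
```

On the sample file this prints ID1, ID4, ID1, ID2 (row 3 also resolves to ID1).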
Output for file.txt once row 1 has been scanned:
ID3 ID4 0 0 0.4 0.8
ID6 ID2 1 0 0.4 0.8
And exclusionlist.txt:
ID1
I then want to start again at the new row 1 and repeat the same process, appending the exclusion from each new row 1 to the same exclusion list.
The commands that I have at my disposal are:
awk 'NR==1{print;}' file.txt
awk '{if ($3>$4 || $3==$4 && $5<$6) print $2;}' file.txt > exclusionlist.txt
awk '{if ($3<$4 || $3==$4 && $5>$6) print $1;}' file.txt > exclusionlist.txt
grep -v -f exclusionlist.txt file.txt
But there are problems inherent in this:
The exclusionlist.txt does not 'keep growing': each awk command overwrites it with >, rather than appending.
Also, how do I loop it back so that it starts again at line 1?
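One way to get both missing pieces, appending with >> instead of overwriting with >, and restarting at the new row 1, is a plain shell loop around commands like the ones above. This is only a sketch under assumptions: IDs never look like the numeric fields (so grep -vw is safe), and a full tie on the first row simply stops the loop:

```shell
#!/bin/sh
# Recreate the sample file.txt so the sketch runs standalone.
printf '%s\n' \
  'ID1 ID2 0 1 0.5 0.6' \
  'ID3 ID4 0 0 0.4 0.8' \
  'ID1 ID5 0 1 0.5 0.3' \
  'ID6 ID2 1 0 0.4 0.8' > file.txt

: > exclusionlist.txt                # start with an empty list
while [ -s file.txt ]; do            # repeat until the file is empty
    # Apply the four rules to the current first row only.
    id=$(awk 'NR==1 {
        if      ($3 > $4) print $2
        else if ($3 < $4) print $1
        else if ($5 < $6) print $2   # here $3 == $4
        else if ($5 > $6) print $1   # here $3 == $4
        exit
    }' file.txt)
    [ -n "$id" ] || break            # full tie: no rule applies (assumption: stop)
    echo "$id" >> exclusionlist.txt  # ">>" appends, so the list keeps growing
    # Delete every line where that ID appears; -w matches whole words only.
    # Note: grep exits 1 when it removes every line, so do not chain with "&&".
    grep -vw "$id" file.txt > file.tmp
    mv file.tmp file.txt
done

cat exclusionlist.txt                # ID1, ID4, ID2 for the sample file
```

On the sample file the loop runs three times (excluding ID1, then ID4, then ID2) and leaves file.txt empty.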
I would be grateful for any solutions.
Thank you,
A.B.