Parsing a CSV file and deleting all rows that match a condition

Hello list,

I am working with a CSV file in which two fields per record contain IP addresses. What I am trying to do is find records whose IP address field is identical across four occurrences, and if there are four such records, delete all records with that specific IP address.

For example, the CSV looks like this:

field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128
field 1,10.10.10.1,field 3, 192.168.1.128
field 1,172.16.10.1,field 3, 192.168.1.128
field 1,10.10.10.3,field 3, 192.168.1.128

So I want to parse the CSV and delete the following rows, where the IP address in field 2 occurs four times:

field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128

Any ideas?

Thanks.

awk -F , 'NR==FNR{a[$0]++;next} a[$0]<4' infile infile

field 1,10.10.10.1,field 3, 192.168.1.128
field 1,172.16.10.1,field 3, 192.168.1.128
field 1,10.10.10.3,field 3, 192.168.1.128
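
This reads the same file twice: on the first pass (while NR==FNR) it only counts how many times each whole record appears in the array a[], and on the second pass it prints a record only if that count is below four. The same idea spelled out with comments (a sketch of the two-pass approach, not tested against your real data):

awk -F, '
    NR == FNR { seen[$0]++; next }   # first pass: count identical whole records
    seen[$0] < 4                     # second pass: print records seen fewer than 4 times
' infile infile > outfile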

Hi rdcwayx,

I tried this but can't seem to get it to work. Also, I've only got one input file, so I'm not comparing two different files; rather, I'm comparing records within the same file, and if there are four records with the exact same value in one specific field, all four records should be deleted. Something like an < infile > outfile operation.

Try this; pass the input file twice:

awk -F"," 'NR==FNR{a[$2]=a[$2]+1;next}a[$2]<4' infile infile >outfile

It works!