Parsing a CSV file and deleting all rows that match a condition

Hello list,

I am working with a CSV file in which two fields per record contain IP addresses. What I am trying to do is find records whose IP address field is identical across four occurrences, and if there are four such records, delete all records with that specific IP address.

For example, the CSV looks like this:

field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128
field 1,10.10.10.1,field 3, 192.168.1.128
field 1,172.16.10.1,field 3, 192.168.1.128
field 1,10.10.10.3,field 3, 192.168.1.128

So I want to parse the CSV and delete the following rows, where the IP address in field 2 occurs four times:

field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128
field 1,192.168.1.1,field 3, 192.168.1.128

Any ideas?

Thanks.

awk -F , 'NR==FNR{a[$0]++;next} a[$0]<4' infile infile

field 1,10.10.10.1,field 3, 192.168.1.128
field 1,172.16.10.1,field 3, 192.168.1.128
field 1,10.10.10.3,field 3, 192.168.1.128
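
This reads the same file twice: on the first pass (while NR==FNR) it only counts how many times each whole record appears in the array a[], and on the second pass it prints a record only if that count is below four. The same idea spelled out with comments (a sketch of the two-pass approach, not tested against your real data):

awk -F, '
    NR == FNR { seen[$0]++; next }   # first pass: count identical whole records
    seen[$0] < 4                     # second pass: print records seen fewer than 4 times
' infile infile > outfile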

Hi rdcwayx,

I tried this but can't seem to get it to work. Also, I've only got one input file, so I'm not comparing two different files; rather, I'm comparing records within the same file, and if there are four records with the exact same value in one specific field, all four records should be deleted. Something like an < infile > outfile operation.

Try this; pass the input file twice:

awk -F"," 'NR==FNR{a[$2]=a[$2]+1;next}a[$2]<4' infile infile >outfile

It works!