Identify duplicate values in the first column of a csv file

Input

1,ABCD,no 
2,system,yes 
3,ABCD,yes 
4,XYZ,no 
5,XYZ,yes
6,pc,no

Code used to find duplicates with regard to the 2nd column:

awk 'NR == 1 {p=$2; next} p == $2 { print "Line" NR "$2 is duplicated"} {p=$2}' FS="," ./input.csv
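
For what it's worth, on the sample input above this appears to print only:

Line5$2 is duplicated

since it compares $2 against the immediately preceding line only, so the ABCD repeat on line 3 is never flagged, and the literal string "$2" ends up in the message.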

Now, is there a clever way to de-duplicate (i.e. remove the duplicate lines) based on the criterion in this one-liner, either within it or with additional logic wrapped around it?

Depending on what your desired output should be:

awk -F, '!a[$2]++{next} {print "Line " NR " " $2 " is duplicated"}' myFile
OR
awk -F, '!a[$2]++' myFile 
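
If I'm reading those correctly, on the sample input the first variant should print something like:

Line 3 ABCD is duplicated
Line 5 XYZ is duplicated

and the second variant keeps only the first line seen for each $2 value:

1,ABCD,no
2,system,yes
4,XYZ,no
6,pc,no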

Your thread title says you are trying to find duplicates in the 1st field; your code prints lines in which the 2nd field on the line has been seen before. Note that it prints duplicates; it does not remove duplicates.

And, since there are no lines in your sample input where the 1st field is duplicated on any other line, I have no idea what you are trying to do. What additional logic are you talking about? What output are you hoping to produce from this sample input?

Does this remove the line when $2 is found to be a duplicate?


Thank you for the suggestion.
I tried to change the title but that doesn't seem to be an option once the submission is made...


I'd say it's for YOU to find out...