I want to find duplicates in Col 2 and then get their line numbers.
I also want a solution to remove them using those line numbers.
I want to work with line numbers so that I can choose which of the duplicate lines to remove, based on the value in Col 1.
An awk, sed, or egrep solution is preferred.
Someone who knows awk well will probably come up with a much better solution, but here is one that works.
# get the list of duplicate values in column 2
awk '{print $2}' file | sort | uniq -c | sort -n | awk '$1>1 {print $2}' > list_dups
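# for each duplicate value, grep the matching lines from the file with their line numbers
# (-F treats the value as a literal string, -w matches it only as a whole word)
while IFS= read -r x; do grep -n -w -F -- "$x" file; done < list_dups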
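The two steps above can also be collapsed into a single awk command that reads the file twice: the first pass counts each column 2 value, and the second pass prints every line whose value occurs more than once, prefixed with its line number. This is a minimal sketch; the function name `dup_lines` is hypothetical, and it assumes whitespace-separated columns as in the sample data.

```shell
# Hypothetical helper: print "lineno:line" for every line whose
# column 2 value appears more than once in the file given as $1.
# The file is read twice: on the first pass (NR == FNR) column 2
# values are counted; on the second pass matching lines are printed.
dup_lines() {
    awk 'NR == FNR { count[$2]++; next }
         count[$2] > 1 { print FNR ":" $0 }' "$1" "$1"
}

# Example: dup_lines file
```

On the data in this question it should print the same two lines as the loop (lines 6 and 11), ordered by line number rather than grouped by value.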
# output
6:mdukphspbc CQZRIOWEUB
11:pcybtapfee CQZRIOWEUB
# now remove the duplicate on line 6
sed '6d' file > file2
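If you end up with more than one line number to remove, a single awk pass can drop them all at once. A minimal sketch, assuming the line numbers are passed as a comma-separated list; `delete_lines` is a hypothetical helper name, not an existing command.

```shell
# Hypothetical helper: delete the lines whose numbers are given as a
# comma-separated list ($1) from the file ($2), printing the rest.
delete_lines() {
    awk -v del="$1" '
        # build a set of line numbers to drop
        BEGIN { n = split(del, nums, ","); for (i = 1; i <= n; i++) drop[nums[i]] }
        # print every line whose number is not in the set
        !(FNR in drop)
    ' "$2"
}

# Example: delete_lines 6,11 file > file2
```

With sed, several addresses can likewise be given in one script, e.g. `sed '6d;11d' file > file2`.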
# output after removing line 6
cat file2
vrsonlviee RVEBAALSKE
lyolzteglx UUOSIWMDLR
pcybtapfee DKGFJBHBJO
ozhrucfeau YQXATYMGJD
cjwvjolrcv YDHALRYQTG
nbiqomzsgw DYSUBQSSPZ
xovgvkneav HJFQQYBLAF
boyyzdmzka BVTVUDHSCR
vrsonlviee TGTKUCUYMA
pcybtapfee CQZRIOWEUB
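Since the choice of which duplicate to remove depends on Col 1, the selection can also be made directly in awk without looking up line numbers first. The sketch below uses a hypothetical helper, `drop_dup_by_col1`, that removes only lines whose column 2 value is duplicated and whose column 1 equals a given value; every other line is kept, including non-duplicated lines with the same Col 1 value. It assumes whitespace-separated columns.

```shell
# Hypothetical helper: remove lines whose column 2 value is duplicated
# AND whose column 1 equals the given key ($1); all other lines of the
# file ($2) are printed unchanged. The first pass over the file counts
# column 2 values; the second pass applies the filter.
drop_dup_by_col1() {
    awk -v key="$1" '
        NR == FNR { count[$2]++; next }
        !(count[$2] > 1 && $1 == key)
    ' "$2" "$2"
}

# Example: drop_dup_by_col1 mdukphspbc file > file2
```

On this data, `drop_dup_by_col1 mdukphspbc file` should produce the same file2 as `sed '6d'`. If the rule is simply "keep the first occurrence of each column 2 value", the common idiom `awk '!seen[$2]++' file` does that, which here would remove line 11 instead of line 6.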