Remove duplicates based on group

Hi,

How can I remove duplicate rows from a file, grouped by another column? For example:

Test1|Test2|Test3|Test4|Test5
Test1|Test6|Test7|Test8|Test5
Test1|Test9|Test10|Test11|Test12
Test1|Test13|Test14|Test15|Test16
Test17|Test18|Test19|Test20|Test21
Test17|Test22|Test23|Test24|Test5

First look at column 1, then remove duplicate rows based on column 5 within each group. Column 1 has two groups, Test1 and Test17, so duplicates in column 5 must be found separately for each group. The expected output is:

Test1|Test2|Test3|Test4|Test5
Test1|Test9|Test10|Test11|Test12
Test1|Test13|Test14|Test15|Test16
Test17|Test18|Test19|Test20|Test21
Test17|Test22|Test23|Test24|Test5

This should work:

awk -F "|" ' ! s[$1,$5]++ ' input-file >output-file

The array s is keyed on columns 1 and 5 together. The expression s[$1,$5]++ is 0 (false) the first time a given pair is seen and non-zero afterwards, so the pattern ! s[$1,$5]++ is true only for the first row of each pair, and awk's default action prints that row.
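To illustrate, here is the one-liner run against the sample data from the thread (assuming pipe-delimited input with no header row):

```shell
# Feed the sample rows straight into awk instead of using a file.
# The second Test1 row is dropped because the (Test1, Test5) pair was
# already seen; the Test17 row ending in Test5 survives because its
# group key is (Test17, Test5), which is new.
printf '%s\n' \
  'Test1|Test2|Test3|Test4|Test5' \
  'Test1|Test6|Test7|Test8|Test5' \
  'Test1|Test9|Test10|Test11|Test12' \
  'Test1|Test13|Test14|Test15|Test16' \
  'Test17|Test18|Test19|Test20|Test21' \
  'Test17|Test22|Test23|Test24|Test5' |
awk -F "|" ' ! s[$1,$5]++ '
```

This prints the five expected rows, with only the second line of the input removed.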

Thanks.