I am trying to count the unique values of $5 in the input below, but if the text at the beginning of $5 partially matches $5 on another line in the file, that line should not be counted as unique.
awk
awk '!seen[$5]++ {n++} END {print n}' input
7
input
chr1 159174749 159174770 chr1:159174749-159174770 ACKR1
chr1 159175223 159176240 chr1:159175223-159176240 ACKR1
chr2 149225899 149228040 chr2:149225899-149228040 AK025127;MBD5
chr2 200213413 200213906 chr2:200213413-200213906 AK025127;SATB2
chr3 196050574 196050878 chr3:196050574-196050878 AK124973;TM4SF19;TM4SF19-TCTEX1D2
chr10 5042568 5042687 chr10:5042568-5042687 AKR1C2
chr10 5043696 5043883 chr10:5043696-5043883 AKR1C2
chr10 5043695 5043883 chr10:5043695-5043883 AKR1C2;AKR1C3
desired output (correct count): 4
since $5 in lines 1 and 2 is the same, $5 in lines 3 and 4 begins with the same text (AK025127), and $5 in lines 6, 7, and 8 begins with the same text (AKR1C2). I can only seem to count each line, and the ; is causing problems, but I cannot seem to fix it. Thank you :).
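
The rule described above could be sketched as follows. This assumes that "partial match" means "the same text before the first ; in $5", which fits the sample but may not cover every case in a larger file. The function name count_unique_prefix is hypothetical and only there for illustration.

```shell
# Hypothetical helper: count unique keys in column 5, where the key is
# assumed to be the text before the first ";" (or all of $5 if there is no ";").
count_unique_prefix() {
    awk '{
        split($5, parts, ";")        # parts[1] holds the text before the first ";"
        if (!seen[parts[1]]++) n++   # count each leading name only the first time it appears
    }
    END { print n + 0 }' "$1"        # n + 0 prints 0 instead of an empty line for empty input
}

# Recreate the sample input from the question.
cat > input <<'EOF'
chr1 159174749 159174770 chr1:159174749-159174770 ACKR1
chr1 159175223 159176240 chr1:159175223-159176240 ACKR1
chr2 149225899 149228040 chr2:149225899-149228040 AK025127;MBD5
chr2 200213413 200213906 chr2:200213413-200213906 AK025127;SATB2
chr3 196050574 196050878 chr3:196050574-196050878 AK124973;TM4SF19;TM4SF19-TCTEX1D2
chr10 5042568 5042687 chr10:5042568-5042687 AKR1C2
chr10 5043696 5043883 chr10:5043696-5043883 AKR1C2
chr10 5043695 5043883 chr10:5043695-5043883 AKR1C2;AKR1C3
EOF

count_unique_prefix input   # prints 4
```

Keying on parts[1] rather than the whole of $5 is what lets AKR1C2 and AKR1C2;AKR1C3 collapse into one entry. If the shared name could appear after the first ; instead of before it, this sketch would not catch it.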