First post. I have been browsing for 3 days and have come up with nothing so far.
M3 C2 V5 D5 HH:FF A1-A2,A5-A6,A1-A2,A1-A4 B4-B6,B2-B4,B4-B6,B1-B2
The output should be:
M3 C2 V5 D5 HH:FF A1-A2,A5-A6,A1-A4 B2-B4,B4-B6,B1-B2
Col 6 and col 7 contain strings of the form Ax-Ax and Bx-Bx respectively. Within each column the strings are separated by a comma ",".
How can I remove the strings that are duplicated within col 6 and within col 7?
E.g. if A1-A2,A1-A2 is present in col 6, I want to keep only one A1-A2.
awk '{ while(++i<=NF) printf (!a[$i]++) ? $i FS : ","; i=split("",a); print ""}' data
I saw a question like mine on SO, but I'm stuck.
What am I doing wrong?
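One likely reason the one-liner above changes nothing: it compares whole whitespace-separated fields, and with awk's default field separator each comma-separated list is a single field. A quick check, assuming the data is whitespace-delimited and using a shortened, made-up version of the sample line:

```shell
# Shortened, hypothetical version of the sample line (whitespace-delimited).
line='M3 C2 V5 D5 HH:FF A1-A2,A5-A6,A1-A2 B4-B6,B2-B4,B4-B6'

# With the default FS, awk splits on whitespace only.
field_count=$(printf '%s\n' "$line" | awk '{ print NF }')

# The whole comma-separated list is therefore one field.
sixth=$(printf '%s\n' "$line" | awk '{ print $6 }')

echo "$field_count fields; field 6 is: $sixth"
```

Since no two whole fields are equal, a per-field duplicate check has nothing to remove. The commas have to be split separately, inside each column.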
awk '
BEGIN { FS="\t" } ;
{
    split($6, valueArray,",");
    j=0;
    for (i in valueArray)
    {
        if (!( valueArray in duplicateArray))
        {
            duplicateArray[j] = valueArray;
            j++;
        }
    };
    printf $1 "\t";
    for (j in duplicateArray)
    {
        if (duplicateArray[j]) {
            printf duplicateArray[j] ",";
        }
    }
    printf "\t";
    print $8
}'
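As far as I can tell, the script above has a few separate problems: `valueArray in duplicateArray` uses the array name where an element (`valueArray[i]`) is needed, `in` tests array keys while the values are stored under a counter, `duplicateArray` is never cleared between lines, `for (i in ...)` does not guarantee any order, and only `$1` and `$8` are printed although the sample has 7 columns. Below is a minimal sketch of one way to do it, not a definitive fix. The names `dedupe_cols` and `dedup` are made up for illustration, and it assumes whitespace-separated fields and that keeping the first occurrence of each item is acceptable.

```shell
# dedupe_cols: hypothetical helper; reads the named files, or stdin if none given.
dedupe_cols() {
  awk '
  # Drop repeated items from a comma-separated list, keeping the first
  # occurrence of each item in its original position.
  # parts, seen, n, i, k and out are function locals, so they start empty
  # on every call and nothing leaks from one column or line to the next.
  function dedup(list,    parts, seen, n, i, k, out) {
      n = split(list, parts, ",")
      out = ""
      for (i = 1; i <= n; i++) {          # numeric loop preserves order
          if (!(parts[i] in seen)) {      # items are the KEYS of seen
              seen[parts[i]] = 1
              out = (k++ ? out "," : "") parts[i]
          }
      }
      return out
  }
  {
      $6 = dedup($6)
      $7 = dedup($7)
      print                               # line is rebuilt with OFS (one space)
  }' "$@"
}

printf 'M3 C2 V5 D5 HH:FF A1-A2,A5-A6,A1-A2 B4-B6,B2-B4,B4-B6,B1-B2\n' | dedupe_cols
```

On this shortened line the result is `M3 C2 V5 D5 HH:FF A1-A2,A5-A6 B4-B6,B2-B4,B1-B2`. Note that col 7 comes out as B4-B6,B2-B4,B1-B2 (first occurrences kept), which is a different order from the B2-B4,B4-B6,B1-B2 written in the expected output, so this only fits if the order of the surviving items does not matter. If the file is really tab-delimited, adding `BEGIN { FS = OFS = "\t" }` to the awk program should be enough.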
After many failed attempts I came up with the idea of breaking the line up at the delimiters to remove duplicate fields across each row, only to realize that I would then need to regroup all the As under col 6 and all the Bs under col 7. Back to square one!
So the solution for me would be to remove duplicates within a column, where the items are separated by a delimiter. I tried a Perl approach, but in vain.
Thank you for your help
Update:
Please note that in the given example, col 6 and col 7 are not sorted.