Greetings Experts,
Issue: Within an awk script, remove duplicate values collected under the same array index, where the index is two fields joined by a single space character
Description: I am processing two files with awk, and while doing so I build an array that ends up with duplicate values. How can I remove those duplicates within awk, without leaving the awk script? Put simply, I build the array as follows:
awk -F "@" '
......
v_array[$1 OFS $2]=(v_array[$1 OFS $2] ? v_array[$1 OFS $2] "," $3 : $3)
.....
' file1.txt file2.txt
file1.txt:
col1 col2 col3
abc def xyz
abc efg pqr
abc def qrs
stu vwx yz
abc def xyz
Current contents of v_array:
v_array[abc def]=xyz,qrs,xyz
v_array[abc efg]=pqr
v_array[stu vwx]=yz
Expected contents of v_array: xyz appears twice for the index abc def, so it should be kept only once:
v_array[abc def]=xyz,qrs
v_array[abc efg]=pqr
v_array[stu vwx]=yz
Order does not matter: the result can be either xyz,qrs or qrs,xyz.
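One common awk idiom for this, sketched here under a few assumptions, is to keep a second array that remembers which ($1 OFS $2, $3) pairs have already been seen and to append $3 only the first time. The name `seen` is hypothetical; the sample rows are fed on stdin instead of reading file1.txt/file2.txt; the default whitespace FS is used instead of -F "@" so the sample parses; and the header line is assumed to be skipped with FNR == 1. The output is piped through sort only because awk's for-in order is unspecified.

```shell
#!/bin/sh
# Sample rows from the question's file1.txt, fed on stdin for a self-contained demo.
data='col1 col2 col3
abc def xyz
abc efg pqr
abc def qrs
stu vwx yz
abc def xyz'

# Assumption: default whitespace FS instead of -F "@", to match the sample.
# for-in order is unspecified in awk, so the result is sorted for stable output.
result=$(printf '%s\n' "$data" | awk '
FNR == 1 { next }                  # skip the header line (assumed present in each file)
{
    key = $1 OFS $2                # same index as v_array[$1 OFS $2]
    # seen (hypothetical name) counts each index/value pair; the test is
    # true only the first time a given $3 shows up under this index
    if (!seen[key, $3]++) {
        if (key in v_array)
            v_array[key] = v_array[key] "," $3
        else
            v_array[key] = $3
    }
}
END {
    for (k in v_array)
        print "v_array[" k "]=" v_array[k]
}' | LC_ALL=C sort)

printf '%s\n' "$result"
```

With the sample above this prints v_array[abc def]=xyz,qrs, v_array[abc efg]=pqr and v_array[stu vwx]=yz. The index of v_array is still $1 OFS $2; only the extra `seen` array grows, at the cost of one entry per distinct ($1, $2, $3) combination.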
I can check whether $3 is already in the list stored at v_array[$1 OFS $2] by splitting that list with split():
v_key = $1 OFS $2
v_dup_check = "not present"
# split the current comma-separated list for this index (not the index itself)
v_cnt = split(v_array[v_key], v_a_tmp, ",")
for (k = 1; k <= v_cnt; k++) {
    if (v_a_tmp[k] == $3) {
        v_dup_check = "present"
        break
    }
}
# append $3 only if it is not already in the list
if (v_dup_check == "not present") {
    v_array[v_key] = (v_array[v_key] == "" ? $3 : v_array[v_key] "," $3)
}
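As an alternative to looping over the split() pieces, the membership test can be a single index() call: wrap both the stored list and $3 in commas so only whole items match. index() does a literal substring search, not a regex match. This is a sketch assuming values never contain a comma themselves, with the same sample-data, FS and header assumptions as in the question's example; it also avoids reading v_array[key] before the index exists, since merely referencing an element creates it.

```shell
#!/bin/sh
# Same sample rows as in the question.
data='col1 col2 col3
abc def xyz
abc efg pqr
abc def qrs
stu vwx yz
abc def xyz'

# Assumption: default whitespace FS instead of -F "@", to match the sample.
result=$(printf '%s\n' "$data" | awk '
FNR == 1 { next }                  # skip the header line
{
    key = $1 OFS $2
    if (!(key in v_array)) {
        v_array[key] = $3          # first value seen for this index
    } else if (index("," v_array[key] ",", "," $3 ",") == 0) {
        # Commas on both sides make this a whole-item match, so a value
        # such as xy would not be found inside xyz,qrs by accident.
        v_array[key] = v_array[key] "," $3
    }
}
END {
    for (k in v_array)
        print "v_array[" k "]=" v_array[k]
}' | LC_ALL=C sort)

printf '%s\n' "$result"
```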
This is what I can think of so far; I hope there is a better approach to handle this within awk.
Also, as I am learning awk through the forums: once the array has been built, how can I sort it by index and by element value? I mean, for
v_array[$1 OFS $2]
-- how do I process the elements in order of the index $1 OFS $2?
and, when the array is built as
v_array[$1 OFS $2]=$3
-- how do I process the elements in order of the value $3?
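POSIX awk gives no ordering for `for (k in v_array)`, so one portable option, sketched below, is to print from the END block into a pipe to sort(1): put the index first to order by index, or the value first to order by value. The three sample rows are hypothetical and use the v_array[$1 OFS $2]=$3 form; close() waits for each sort so the two lists come out in sequence. If you are on GNU awk (gawk 4.0 or later), setting PROCINFO["sorted_in"] to "@ind_str_asc" or "@val_str_asc" before the for-in loop controls traversal order directly, and asort()/asorti() are also available; those are gawk extensions, not POSIX.

```shell
#!/bin/sh
# Hypothetical sample for the v_array[$1 OFS $2]=$3 case: one value per index.
data='stu vwx yz
abc efg pqr
abc def xyz'

# Portable approach: let sort(1) order the lines awk writes to a pipe.
result=$(printf '%s\n' "$data" | awk '
{ v_array[$1 OFS $2] = $3 }
END {
    # 1) In order of the index: print the index first and sort whole lines.
    cmd = "LC_ALL=C sort"
    for (k in v_array)
        print k, v_array[k] | cmd
    close(cmd)                     # wait for sort so this list comes out first

    # 2) In order of the value: print the value first and sort on that field.
    cmd = "LC_ALL=C sort -k1,1"
    for (k in v_array)
        print v_array[k], k | cmd
    close(cmd)
}')

printf '%s\n' "$result"
```

The array itself is left untouched; only the printed lines are reordered, so the index stays v_array[$1 OFS $2] for any further processing.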
Thank you for your valuable time.
Edit:
Please note that for further processing, the array index v_array[$1 OFS $2] must not be changed.