Concatenate values in the first column based on the second column.

I have a file (myfile.txt) with contents like this:

1.txt apple is
3.txt apple is
5.txt apple is
2.txt apple is a
7.txt apple is a
8.txt apple is a fruit
4.txt orange not a fruit
6.txt zero is

The file above has already been sorted with this command:

sort -k2 myfile.txt

My objective is to get this:

1.txt_3.txt_5.txt apple is
2.txt_7.txt apple is a
8.txt apple is a fruit
4.txt orange not a fruit
6.txt zero is

As you can see, moving down the file, consecutive lines whose text from the second column onward is identical are merged: their first-column values are joined with underscores for as long as that text stays the same.
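Because the file is already sorted, lines with the same text end up next to each other, so the grouping can also be done in a single pass: remember the previous key and print the accumulated names whenever the key changes. This is a minimal sketch, assuming the "key" is everything after the first field and that fields are separated by single spaces; the function name `join_sorted` is hypothetical.

```shell
# join_sorted: hypothetical helper. Reads already-sorted lines on stdin and
# joins the first field of consecutive lines that share the same remainder.
join_sorted() {
    awk '
    {
        name = $1
        key = $0
        sub(/^[^ ]+ /, "", key)             # key = everything after the first field
        if (NR > 1 && key == prev) {
            names = names "_" name          # same key as the previous line: extend the group
        } else {
            if (NR > 1) print names, prev   # key changed: flush the previous group
            names = name
            prev = key
        }
    }
    END { if (NR > 0) print names, prev }   # flush the last group
    '
}

# Usage with the sample data from the question
printf '%s\n' \
    '1.txt apple is' '3.txt apple is' '5.txt apple is' \
    '2.txt apple is a' '7.txt apple is a' '8.txt apple is a fruit' \
    '4.txt orange not a fruit' '6.txt zero is' | join_sorted
```

With the real file this would be `sort -k2 myfile.txt | join_sorted`. It keeps only one group in memory at a time, but it only merges adjacent lines, so the input must be sorted first.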

This is what I have tried, but it is not working correctly:

awk -F' ' 'NF>2{a[$2] = a[$2]"_"$1}END{for(i in a){print a[i]" "i}}' myfile.txt

The output that I get using the above command is this:

_4.txt orange
_1.txt_3.txt_5.txt_2.txt_7.txt_8.txt apple
_6.txt zero

Any help? I am using Linux with BASH.

Hello shoaibjameel123,

Could you please try the following and let me know if it helps.

awk 'FNR==NR{A=$1;$1="";array[$0]=array[$0]?array[$0] "_" A:A;next} {$1="";B=$0} (B in array){C=B;sub(/^[[:space:]]+/,X,B);print array[C] OFS B;delete array[C]}'   Input_file  Input_file

Output will be as follows.

1.txt_3.txt_5.txt apple is
2.txt_7.txt apple is a
8.txt apple is a fruit
4.txt orange not a fruit
6.txt zero is

Also, if you are not concerned about the order of the output lines, the following may help too.

awk '{A=$1;$1="";array[$0]=array[$0]?array[$0] "_" A:A} END{for(i in array){j=i;sub(/^[[:space:]]+/,X,i);print array[j] OFS i}}'  Input_file

Output will be as follows.

8.txt apple is a fruit
2.txt_7.txt apple is a
6.txt zero is
1.txt_3.txt_5.txt apple is
4.txt orange not a fruit

Thanks,
R. Singh


Thanks. Yes, both of them work and solve my problem.

How about this:

awk '
        {sub (" ", FS)
         $0=$0
         T[$2]=(T[$2]?T[$2]"_":"") $1
        }
END     {for (t in T) print T[t], t
        }
' FS="\001" file
8.txt apple is a fruit
1.txt_3.txt_5.txt apple is
2.txt_7.txt apple is a
4.txt orange not a fruit
6.txt zero is