Many to many -- mapping

busyboy · August 29, 2018, 6:41am

INPUT

13333--TEXT1
14444--TEXT2
13333--TEXT3
12233--TEXT5
14444--TEXT5
12233--TEXT1
12222--TEXT5
13333--TEXT09

What I'm looking for is something using awk arrays that produces the output given below.

14444--TEXT2,TEXT5
13333--TEXT1,TEXT3,TEXT09
12233--TEXT5,TEXT1
12222--TEXT5

vgersh99 · August 29, 2018, 8:57am

Any attempts on your own?

busyboy · August 29, 2018, 9:17am

Here the END part tries to scan the array of input that is built in the action part of gawk:

gawk '/pmd/ { 
split($1,IP," "); 
x[IP[2]]++; 
y[n++]=IP[2]"--"$2;} END {
for(i in y){
{
{print y[i]}
}
}
}' file

The other approach was:

gawk '/pmd/ {
split($1,IP," "); 
x[IP[2]]++; y[IP[2]"--"$2]++;} END {
for(i in x){
 for(j in y){
if(i ~ y){
print i}
}
}
}' file

vgersh99 · August 29, 2018, 9:39am

awk -F'--'  '
  {
    a[$1]=($1 in a)?a[$1] "," $2:$2
  }
  END {
     for(i in a)
       print i,a[i]
  }' OFS='--' myFile

Peasant · August 29, 2018, 10:17am

Is this a correct explanation of that specific line?

Create an array named 'a', indexed by the first field, with a conditional value: if the first field already exists as an index in array a, set the value to the existing array member, a comma, and the second field; otherwise set the value to the second field.
This effectively appends the second field if the array already has an element for the index on the line being processed; otherwise it creates the array member with the value of the second field.

...
    a[$1]=($1 in a)?a[$1] "," $2:$2
...

Thanks and regards
Peasant.

vgersh99 · August 29, 2018, 10:45am

It's somewhat close. Let me rewrite this conditional (ternary) expression

 a[$1]=($1 in a)?a[$1] "," $2:$2

as this:

if ($1 in a)
   a[$1]=a[$1] "," $2
else
   a[$1]=$2

Hopefully the long-hand will make it easier to understand...
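
To watch the long-hand form at work, here is a small sketch that prints the array element after every input line. The helper name trace_append and the three-line input are only for illustration:

```shell
# Hypothetical helper: show how a[$1] grows line by line.
trace_append() {
  awk -F'--' '
    {
      if ($1 in a)
        a[$1] = a[$1] "," $2     # key seen before: append with a comma
      else
        a[$1] = $2               # first time for this key: start the list
      print "line " NR ": a[" $1 "] = " a[$1]
    }'
}

printf '%s\n' 13333--TEXT1 14444--TEXT2 13333--TEXT3 | trace_append
```

The third line is the interesting one: 13333 is already in the array, so TEXT3 gets appended to TEXT1 instead of replacing it.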

busyboy · August 30, 2018, 8:20am

@vgersh99

Thanks for the idea. It works for me as far as standard input is concerned...

The problem is that I'm building the data in this format in awk's action part and then want to do the many-to-many mapping (the requirement of this thread) in the END part.

so my code goes like below:

gawk 'BEGIN {n=0;}
/pmd/ {
split($1,IP," "); 
x[IP[2]]++; 
y[n++]=IP[2]"--"$2;
} END {for(i in x){ 
for(j in y){
if(k ~ i){print i,j}
}
}
}' input-file.

The trick here is that the part before the END section stores entries like the ones below in the array "y", with "n" as the index starting at 0. (x is the array containing 13333, 12222, 14444, 15555, just for reference.)

13333--TEXT1
12222--TEXT2
12222--TEXT34
13333--TEXT9
14444--TEXT12
15555--TEXT23
14444--TEXT234
13333--TEXT08
13333--TEXT34

What would be the best way to scan these stored values, in the format given above, and then do the many-to-many mapping as you did previously?

------ Post updated at 02:41 PM ------

Below is how your code is working for me, giving the required output,

but I need something in a single awk execution..

gawk '/pmd/ {split($1,IP," "); 
x[IP[2]]++; 
y[n++]=IP[2]"--"$2;} 
END {
for(j in y){ 
print y[j]
}
}'  inputfile |awk -F'--'  '
  {
    a[$1]=($1 in a)?a[$1] "," $2:$2
  }
  END {
     for(i in a)
       print i,a[i]
  }' OFS='--'

------ Post updated at 06:19 PM ------

I eventually managed to meet the requirement: instead of going into the END part, I simply joined the values in the action part.

gawk '/pmd/ {
split($1,IP," ");
 a[IP[2]]=(IP[2] in a)?a[IP[2]] "<br>" $2:$2
  }
  END {
     for(i in a)
       print i,a[i]
  }' OFS='--'  inputfile

------ Post updated at 06:20 PM ------

thanks @vgersh99