combining 2 files with more than one match in second file

Hello,

I am attempting to combine two files where the second file can have more than one match with the lookup field (node) in the first file, onto one line of the output file. Also alerting if a lookup was not found in file2

=-=-=-=-=-=-=
Example of file1
node,type
=-=-=-=-=-=-=
bob,232
fred,202
jim,202
pat,232
sam,232
ted,118
tim,118

......
(up to around 15,000 lines)

=-=-=-=-=-=-=-=
Example of file2
node,serialnumber
=-=-=-=-=-=-=-=

alan,TEW9873FG
bob,XVT81283WS
fred,YHW81342BB
fred,YHW83281BA
fred,YHW98746CR
jake,QWE9896AQ
pat,LKJ55432GH
sam,QER8984UH
sam,QER6536WE
sam,QER2343ET
sid,ASW9093TW
ted,OPW4324TY
val,RWT6492OK
.....
(up to around 20,000 lines)

=-=-=-=-=-=-=-=-=
Output file required
=-=-=-=-=-=-=-=-=
bob,232,XVT81283WS
fred,202,YHW81342BB;YHW83281BA;YHW98746CR
jim,202,NO_SERIAL_FOUND
pat,232,LKJ55432GH
sam,232,QER8984UH;QER6536WE;QER2343ET
ted,118,OPW4324TY
tim,118,NO_SERIAL_FOUND

I have managed to combine lines using two read file loops in ksh but this will only give me a single line match between file1 file2. Also tried a
join -t',' which will give me results of nodes with more than one serial on a newline per serial (and sometimes not able find some of the matches at all)

Any advice how to tackle this would much appreciated,
Kind rgds,
John.

Try....

$ head file?
==> file1 <==
bob,232
fred,202
jim,202
pat,232
sam,232
ted,118
tim,118

==> file2 <==
alan,TEW9873FG
bob,XVT81283WS
fred,YHW81342BB
fred,YHW83281BA
fred,YHW98746CR
jake,QWE9896AQ
pat,LKJ55432GH
sam,QER8984UH
sam,QER6536WE
sam,QER2343ET

$ awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=(a[$1]?a[$1] ";":"") $2;next}$3=(a[$1]?a[$1]:"NO_SERIAL_FOUND")' file2 file1
bob,232,XVT81283WS
fred,202,YHW81342BB;YHW83281BA;YHW98746CR
jim,202,NO_SERIAL_FOUND
pat,232,LKJ55432GH
sam,232,QER8984UH;QER6536WE;QER2343ET
ted,118,OPW4324TY
tim,118,NO_SERIAL_FOUND

hi,

Up to now, it seems the most difficult one for me. And it really took me much time to find out the solution. Hope this can help you.

code:

nawk 'BEGIN{
FS=","
n=1
}

function isExist(k)
{
	flag=0
	for ( i in name)
	{	if(name==k)
			flag=i
	}
	return flag
}

{
if (NR==FNR)
{
	name[n]=$1
	con[n]=$2
	n++
}
else
{
	if(isExist($1)!=0)
		con[isExist($1)]=sprintf("%s,%s",con[isExist($1)],$2)
}
}
END{
for (i=0;i<n;i++)
if (index(con,",")!=0)
	print name","con
else
	print name","con",NO_SERIAL_FOUND"
}' file1 file2

Here goes using awk:

File1:

File2:

Script:

Output:

HTH

Ygor and summer_cherry

Wow that works like a treat!

How does it work with such a condensed amount of awk? I was tying myself in knots with all those nested read file loops in ksh.

As I'm going to need to reuse this output file and combine it with a 3rd 4th ...6th file etc

ie.
Now I have

=-=-=-=-=
Output file
=-=-=-=-=
bob,232,XVT81283WS
fred,202,YHW81342BB;YHW83281BA;YHW98746CR
jim,202,NO_SERIAL_FOUND
pat,232,LKJ55432GH
sam,232,QER8984UH;QER6536WE;QER2343ET
ted,118,OPW4324TY
tim,118,NO_SERIAL_FOUND

=-=-=-=-=-=-=-=-=
and input file
node,mac_address
=-=-=-=-=-=-=-=-=
bob,0845.5632.7e7a
bob,0845.5632.7e85
jake,9853.bb43.c823
jim,no mac address
sam,ea4c.512b.3462
ted,2782.34ac.eb21
tim,904b.0042.ad04
tim,904b.0042.ada1
tim,904b.0042.adc3

=-=-=-=-=-=-=-=-=
Combined output file
=-=-=-=-=-=-=-=-=
bob,232,XVT81283WS,0845.5632.7e7a;0845.5632.7e85
fred,202,YHW81342BB;YHW83281BA;YHW98746CR,NO_MAC_FOUND
jim,202,NO_SERIAL_FOUND,no mac address
pat,232,LKJ55432GH,NO_MAC_FOUND
sam,232,QER8984UH;QER6536WE;QER2343ET,ea4c.512b.3462
ted,118,OPW4324TY,2782.34ac.eb21
tim,118,NO_SERIAL_FOUND,904b.0042.ad04;904b.0042.ada1;904b.0042.adc3

Many thanks for all your help on this, I will continue try and decipher the wizardry of the awk in your replies.
Kind rgds,
John.

Hello
Just thinking I may well be now making this unnecessarily complicated as I could combine.

file1 with file2 to output_file1

file1 with file3 to output_file2

file1 with file4 to output_file3

then join output_file1 to output_file2 etc....

will need to understand syntax in awk though

nawk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=(a[$1]?a[$1] ";":"") $2;next}$3=(a[$1]?a[$1]:"NO_SERIAL_FOUND")'

Again many thanks to all of you who have been looking into this :slight_smile:
Kind rgds,
John.