Merge two files based on matching criteria

Hi,
I am trying to merge two csv files based on matching criteria:
File description is as below :
Key_File :

000|��|Key_HF|��|Key_FName		
001|��|Key_11|��|Sort_Key22|��|Key_31
002|��|Key_12|��|Sort_Key23|��|Key_32
003|��|Key_13|��|Sort_Key24|��|Key_33
050|��|Key_15|��|Sort_Key25|��|Key_34
085|��|Key_15|��|Sort_Key26|��|Key_35|��|Key_41|��|Key_42
090|��|Key_19|��|Sort_Key27
095|��|Key_19|��|Sort_Key28
100|��|Key_11|��|Sort_Key29
300|��|Key_10|��|Sort_Key30
400|��|Key_14|��|Sort_Key31
500|��|Key_15|��|Sort_Key32
999|��|Key_SS|��|

Value_File:

000|��|Val_HF|��|Value_FName		
001|��|Val_11|��|Sort_Val22|��|Val_31
002|��|Val_12|��|Sort_Val23|��|Val_32
003|��|Val_13|��|Sort_Val24|��|Val_33
050|��|Val_15|��|Sort_Val25|��|Val_34
085|��|Val_15|��|Sort_Val26|��|Val_35|��||��|null
090|��|Val_19|��|Sort_Val27
095|��|Val_19|��|Sort_Val28
100|��|Val_11|��|Sort_Val29
300|��|Val_10|��|Sort_Val30
400|��|Val_14|��|Sort_Val31
500|��|Val_15|��|Sort_Val32
999|��|Val_SS|��|

Here |��| is the delimiter.
My output requirement is :

000|��|Key_HF=Val_HF|��|Key_FName=Value_FName		
001|��|Key_11=Val_11|��|Sort_Key22=Sort_Val22|��|Key_31=Val_31
002|��|Key_12=Val_12|��|Sort_Key23=Sort_Val23|��|Key_32=Val_32
003|��|Key_13=Val_13|��|Sort_Key24=Sort_Val24|��|Key_33=Val_33
050|��|Key_15=Val_15|��|Sort_Key25=Sort_Val25|��|Key_34=Val_34
085|��|Key_15=Val_15|��|Sort_Key26=Sort_Val26|��|Key_35=Val_35|��|Key_41=|��|Key_42=null
090|��|Key_19=Val_19|��|Sort_Key27=Sort_Val27
095|��|Key_19=Val_19|��|Sort_Key28=Sort_Val28
100|��|Key_11=Val_11|��|Sort_Key29=Sort_Val29
300|��|Key_10=Val_10|��|Sort_Key30=Sort_Val30
400|��|Key_14=Val_14|��|Sort_Key31=Sort_Val31
500|��|Key_15=Val_15|��|Sort_Key32=Sort_Val32
999|��|Key_SS=Val_SS|��|

ie. Compare two files if first column value(001) of Key_file matches with first column value of the Value_file then display the output as key value pair separated by = sign (eg Key_HF=Val_HF)

I tried using below code but not getting the expected output =>

awk 'NR==FNR{ A[$1]=$2; next } { print $FNR ":" $2 } 1' FS="[|]��[|]" Key_File Value_File

Any idea how it will work ??

You might try something like:

awk '
BEGIN {	FS = "[|]��[|]"
	OFS = "|��|"
}
NR == FNR {
	for(i = 2; i <= NF; i++)
		v[$1, i] = $i
	next
}
{	printf("%s%s", $1, (NF > 1) ? OFS : ORS)
	for(i = 2; i <= NF; i++)
		printf("%s%s", $i "=" v[$1, i], (NF > i) ? OFS : ORS)
}' Value_File Key_File

which with your sample data produces the output:

000|��|Key_HF=Val_HF|��|Key_FName		=Value_FName		
001|��|Key_11=Val_11|��|Sort_Key22=Sort_Val22|��|Key_31=Val_31
002|��|Key_12=Val_12|��|Sort_Key23=Sort_Val23|��|Key_32=Val_32
003|��|Key_13=Val_13|��|Sort_Key24=Sort_Val24|��|Key_33=Val_33
050|��|Key_15=Val_15|��|Sort_Key25=Sort_Val25|��|Key_34=Val_34
085|��|Key_15=Val_15|��|Sort_Key26=Sort_Val26|��|Key_35=Val_35|��|Key_41=|��|Key_42=null
090|��|Key_19=Val_19|��|Sort_Key27=Sort_Val27
095|��|Key_19=Val_19|��|Sort_Key28=Sort_Val28
100|��|Key_11=Val_11|��|Sort_Key29=Sort_Val29
300|��|Key_10=Val_10|��|Sort_Key30=Sort_Val30
400|��|Key_14=Val_14|��|Sort_Key31=Sort_Val31
500|��|Key_15=Val_15|��|Sort_Key32=Sort_Val32
999|��|Key_SS=Val_SS|��|=

which matches what you said you wanted except for the 1st and last lines of the output. If you don' want extraneous <space> characters in the first line of your output, don't include them in both of your input files. And, if you don't want an empty field in the last line in your output, don't include an empty field in the last line of both of your input files.

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .

1 Like

How we can handle below scenario in this case(i.e. single key multiple value)
Key_File

000|��|Key_HF|��|Key_FName        
001|��|Key_11|��|Sort_Key22|��|Key_31
085|��|Key_15|��|Sort_Key26|��|Key_35|��|Key_41|��|Key_42
100|��|Key_11|��|Sort_Key29|��|Sort_Key30|��|Sort_Key31|��|Sort_Key32
300|��|Key_10|��|Sort_Key30
400|��|Key_14|��|Sort_Key31
500|��|Key_15|��|Sort_Key32
999|��|Key_SS

Value_File

000|��|Val_HF|��|Value_FName        
001|��|Val_11|��|Sort_Val22|��|Val_31
085|��|Val_15|��|Sort_Val26|��|Val_35|��||��|null
100|��|Val_11|��|Sort_Val29|��|Sort_Val30
100|��|Val_11|��|Sort_Val29|��|Sort_Val30|��|Sort_Val31
100|��|Val_11|��|Sort_Val29|��|Sort_Val30|��|Sort_Val31|��|Sort_Val32
300|��|Val_10|��|Sort_Val30
400|��|Val_14|��|Sort_Val31
500|��|Val_15|��|Sort_Val32
999|��|Val_SS

Expected Output :

000|��|Key_HF=Val_HF|��|Key_FName=Value_FName        
001|��|Key_11=Val_11|��|Sort_Key22=Sort_Val22|��|Key_31=Val_31
085|��|Key_15=Val_15|��|Sort_Key26=Sort_Val26|��|Key_35=Val_35|��|Key_41=|��|Key_42=null
100|��|Key_11=Val_11|��|Sort_Key29=Sort_Val29|��|Sort_Key30=Sort_Val30|��|Sort_Key31=|��|Sort_Key32=
100|��|Key_11=Val_11|��|Sort_Key29=Sort_Val29|��|Sort_Key30=Sort_Val30|��|Sort_Key31=Sort_Val31|��|Sort_Key32=
100|��|Key_11=Val_11|��|Sort_Key29=Sort_Val29|��|Sort_Key30=Sort_Val30|��|Sort_Key31=Sort_Val31|��|Sort_Key32=Sort_Val32
300|��|Key_10=Val_10|��|Sort_Key30=Sort_Val30
400|��|Key_14=Val_14|��|Sort_Key31=Sort_Val31
500|��|Key_15=Val_15|��|Sort_Key32=Sort_Val32
999|��|Key_SS=Val_SS

Here the requirement is for one key row(100) in Key_File we have multiple values(three 100 rows) in Value_File.
Here Key_File is fixed and Value_File is changing(Can be multiple 100 row values) ..

So can we handle this scenario in this case ?

Try this adaption of Don Cragun's recent proposal:

awk '
BEGIN           {FS = "[|]��[|]"
                 OFS = "|��|"
                }
NR == FNR       {NFK[$1] = NF
                 for(i = 2; i <= NF; i++) v[$1, i] = $i
                 next
                }
                {printf("%s%s", $1, (NF > 1) ? OFS : ORS)
                 for(i = 2; i <= NFK[$1]; i++) printf("%s%s", v[$1, i] "=" $i, (NFK[$1] > i) ? OFS : ORS)
                }
' key_file value_file
000|��|Key_HF=Val_HF|��|Key_FName        =Value_FName        
001|��|Key_11=Val_11|��|Sort_Key22=Sort_Val22|��|Key_31=Val_31
085|��|Key_15=Val_15|��|Sort_Key26=Sort_Val26|��|Key_35=Val_35|��|Key_41=|��|Key_42=null
100|��|Key_11=Val_11|��|Sort_Key29=Sort_Val29|��|Sort_Key30=Sort_Val30|��|Sort_Key31=|��|Sort_Key32=
100|��|Key_11=Val_11|��|Sort_Key29=Sort_Val29|��|Sort_Key30=Sort_Val30|��|Sort_Key31=Sort_Val31|��|Sort_Key32=
100|��|Key_11=Val_11|��|Sort_Key29=Sort_Val29|��|Sort_Key30=Sort_Val30|��|Sort_Key31=Sort_Val31|��|Sort_Key32=Sort_Val32
300|��|Key_10=Val_10|��|Sort_Key30=Sort_Val30
400|��|Key_14=Val_14|��|Sort_Key31=Sort_Val31
500|��|Key_15=Val_15|��|Sort_Key32=Sort_Val32
999|��|Key_SS=Val_SS
1 Like