Join two files using awk

Hello All;

I have two files:

File1:

abc
def
pqr

File2:

abc,123
mno,456
def,989
pqr,787
ghj,678

Now, if a line in File1 matches the first field of a line in File2, I need to append that value to the end of the File2 line.

Expected Output:

abc,123,abc
mno,456
def,989,def
pqr,787,pqr
ghj,678

I am trying to use awk, but I am only able to print the lines which match the pattern. The lines which do not match the pattern are getting removed. Please help.

Hello mystition,

Could you please try the following and let me know if it helps.

awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print $0 FS A[$1]} !($1 in A){print $0}' file1 FS=, file2

Output will be as follows.

abc,123,abc
mno,456
def,989,def
pqr,787,pqr
ghj,678
 

Thanks,
R. Singh


I just moved the field separator and it worked. Many thanks dear!

awk -F"," 'FNR==NR{A[$1]=$1;next} ($1 in A){print $0 FS A[$1]} !($1 in A){print $0}' file1 file2
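For anyone wondering why both placements work: awk processes a var=value operand only when it reaches it in the argument list, so FS=, between file1 and file2 takes effect just for file2, while -F"," applies to both files. Since file1 has a single word per line, $1 is the same either way. A minimal sketch, assuming the sample data from the first post is recreated in a scratch directory (the directory and the variable names out_mid and out_flag are only for illustration):

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' abc def pqr > "$dir/file1"
printf '%s\n' abc,123 mno,456 def,989 pqr,787 ghj,678 > "$dir/file2"

# FS=, as an operand: assigned only when awk reaches it, just before file2 is opened.
out_mid=$(awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print $0 FS A[$1]} !($1 in A){print $0}' "$dir/file1" FS=, "$dir/file2")

# -F",": the separator is in effect for both files.
out_flag=$(awk -F"," 'FNR==NR{A[$1]=$1;next} ($1 in A){print $0 FS A[$1]} !($1 in A){print $0}' "$dir/file1" "$dir/file2")

echo "$out_mid"
[ "$out_mid" = "$out_flag" ] && echo "both placements give the same output"
```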

Somewhat simpler:

awk 'NR==FNR {T[$1]=FS $1; next} {print $0 T[$1]}' FS="," file1 file2
abc,123,abc
mno,456
def,989,def
pqr,787,pqr
ghj,678

Thanks RudiC;

Can you please explain the code:

I was able to understand the first part, where each field 1 from file1 is stored in an array with an FS in front of it.

awk 'NR==FNR {T[$1]=FS $1; next}

However, in the second part of the code, how are we checking that field 1 of File1 is equal to field 1 of File2?

{print $0 T[$1]}


Hello mystition,

The following explanation may help; let me know if you have any queries.

awk '
# NR==FNR is TRUE only while the first file (file1) is being read: FNR is reset to 1 for each new input file, while NR keeps counting across all files.
NR==FNR {T[$1]=FS $1; next}   # Store FS (the comma set below) followed by $1 in array T, indexed by $1; next skips the remaining rules for this line.
# The rule below runs only for file2 lines, because NR==FNR is no longer TRUE there. It prints the whole line ($0) followed by the value of T[$1].
{print $0 T[$1]}
' FS="," file1 file2   # Set the field separator to a comma, then name the input files file1 and file2.
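To see the NR==FNR test in action, here is a small sketch that prints both counters for every input line. It assumes file1 from the question and only the first two lines of file2, written to a scratch directory; the variable name counters is only for illustration.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' abc def pqr > "$dir/file1"
printf '%s\n' abc,123 mno,456 > "$dir/file2"

# Print NR, FNR and which branch the NR==FNR test selects for each line.
counters=$(awk '{print NR, FNR, (NR==FNR ? "first file" : "later file"), $0}' "$dir/file1" "$dir/file2")
echo "$counters"
```

NR and FNR agree for the three file1 lines. From the first line of file2 onwards FNR restarts at 1 while NR continues at 4, so the test is false.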
 

EDIT: To answer your question, FS is stored in front of the value in array T because the print statement that runs while file2 is being read does not add a field separator itself, and your output requires one. A further benefit of this approach: if $1 of file2 is not present in file1, T[$1] is empty, so no stray field separator is printed, which is what you want.
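The difference is easy to see by running the two variants side by side. A minimal sketch, assuming the sample data from the question in a scratch directory; the variable names with_fs_in_array and with_fs_at_print are only for illustration:

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' abc def pqr > "$dir/file1"
printf '%s\n' abc,123 mno,456 def,989 pqr,787 ghj,678 > "$dir/file2"

# Separator stored with the value: unmatched lines get nothing appended.
with_fs_in_array=$(awk 'NR==FNR {T[$1]=FS $1; next} {print $0 T[$1]}' FS="," "$dir/file1" "$dir/file2")

# Separator added at print time: unmatched lines end in a stray comma.
with_fs_at_print=$(awk 'NR==FNR {T[$1]=$1; next} {print $0 FS T[$1]}' FS="," "$dir/file1" "$dir/file2")

printf '%s\n\n%s\n' "$with_fs_in_array" "$with_fs_at_print"
```

The second command prints mno,456, and ghj,678, with a trailing comma, because an array element that was never set evaluates to the empty string.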

Thanks,
R. Singh


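awk -F"," 'BEGIN{OFS=","}
{
# The array name a is assumed here; the name is missing from the code as posted.
if(NR==FNR)
a[$1]=$0      # first file (file2): store each whole line, keyed by its first field
else
print $0,     # second file (file1): print the key, then the stored file2 line
a[$1]
}' file2 file1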
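
For comparison, here is a sketch in the same if/else style that reads file1 first and yields the expected output from the question. The array name seen, the variable name joined and the scratch directory are assumptions made for this example, not part of the code posted above.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' abc def pqr > "$dir/file1"
printf '%s\n' abc,123 mno,456 def,989 pqr,787 ghj,678 > "$dir/file2"

joined=$(awk -F"," 'BEGIN{OFS=","}
{
    if (NR==FNR)
        seen[$1] = 1      # file1: remember the key
    else if ($1 in seen)
        print $0, $1      # file2 line whose key is in file1: append the key
    else
        print $0          # file2 line without a match: print unchanged
}' "$dir/file1" "$dir/file2")
echo "$joined"
```

Because every file2 line goes through one of the two print branches, the unmatched lines (mno,456 and ghj,678) are kept instead of being dropped.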