awk help: Match data fields from 2 files & output results from both into 1 file

ambroze · December 10, 2012, 6:06pm

I need to take 2 input files and create 1 output file based on matches between them. I want to match field #1 (Userid) in both files and, if any of fields 2, 3, 4, 5, or 6 differ, write a line to the output file that combines fields from both file1 and file2.

Below is an example in which file2 has First Name=John, Email=john.doe@yahoo.com, and Phone Number=111-222-3333.
Since these fields differ from the values in file1, I need the output in file3 to look like the example below, produced with awk. Any help would be greatly appreciated.

file1:

jdoe|Doe|Johnny|111-222-9999|jdoe@gmail.com|Main Office|1020-771-AHSDEV100|512
1|Userid
2|Last Name
3|First Name
4|Phone Number
5|Email
6|Office
7|Description
8|Account

file2:

jdoe|Doe|John|111-222-3333|john.doe@yahoo.com|Main Office|0xF48F9F97AB1E9242A6CCB96BA1DB5C79|0x820A4CE019D51F45B7B911CDCDB5208D|Data1|Data2|Data3|Data4|Data5|Data6|0
1|Userid
2|Last Name
3|First Name
4|Phone Number
5|Email
6|Office
7|Office Location
8|Contact UUID
9|Data1
10|Data2
11|Data3
12|Data4
13|Data5
14|Data6
15|Active

file3:

{ "820A4CE019D51F45B7B911CDCDB5208D", "Doe", "Johnny", "111-222-9999", "jdoe@gmail.com", "jdoe", "F48F9F97AB1E9242A6CCB96BA1DB5C79", "Data1", "0" }

{
"file2 field #8 (Contact UUID) cut -c3-34"
"file1 field #2 (Last Name)"
"file1 field #3 (First Name)"
"file1 field #4 (Phone Number)"
"file1 field #5 (Email)"
"file1 field #1 (Userid)"
"file2 field #7 (Office Location) cut -c3-34"
"file2 field #9 (Data1)"
"file2 field #15 (Active)"
}
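The `cut -c3-34` notation in the field list above keeps characters 3 through 34, which drops the leading `0x` and leaves the 32 hex digits. As a quick check, assuming the sample value from field #8 of file2, the same slice can be taken inside awk with `substr(s, 3, 32)`. The variable names here are illustrative only:

```shell
# Sample value copied from field #8 of the file2 example above.
uuid='0x820A4CE019D51F45B7B911CDCDB5208D'

# cut -c3-34 keeps characters 3 through 34 (32 characters).
via_cut=$(printf '%s\n' "$uuid" | cut -c3-34)

# substr(s, 3, 32) takes 32 characters starting at position 3 (awk strings are 1-indexed).
via_awk=$(printf '%s\n' "$uuid" | awk '{ print substr($0, 3, 32) }')

echo "$via_cut"
echo "$via_awk"
```

Both commands should print the same 32-character string, so the slicing can be done in the awk script itself without a separate `cut` step.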

rdrtx1 · December 10, 2012, 7:13pm

try:

awk -F"|" '
NR==FNR {a[$1]=$0; for (i=2; i<=6; i++) b[$1,$i]=b[$1,i]=$i; next}
a[$1] {
  s=0;
  for (i=2; i<=6; i++) {if (!b[$1,$i]) s=1; continue;};
  if (s==1) {
    printf "{ \"";
    printf substr($8,3) "\", \"";
    printf b[$1,2]      "\", \"";
    printf b[$1,3]      "\", \"";
    printf b[$1,4]      "\", \"";
    printf b[$1,5]      "\", \"";
    printf $1           "\", \"";
    printf substr($7,3) "\", \"";
    printf $9           "\", \"";
    printf $15;
    print  "\" }";
  }
}
' file1 file2 > file3

This assumes only line 1 of each example is actual data in file1 and file2.
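For readers new to the two-file idiom used above: `NR==FNR` is true only while awk reads the first file named on the command line, because `NR` counts records across all inputs while `FNR` restarts at 1 for each file. A minimal sketch of the idiom on its own, using made-up two-field files (`users.txt` and `phones.txt` are hypothetical names, not from this thread):

```shell
# Hypothetical sample data in a scratch directory; the file names are illustrative only.
dir=$(mktemp -d)
printf 'jdoe|Doe\nasmith|Smith\n' > "$dir/users.txt"
printf 'jdoe|111-222-3333\nbjones|555-000-1111\n' > "$dir/phones.txt"

joined=$(awk -F'|' '
  NR == FNR { last[$1] = $2; next }        # first file: remember the last name per userid
  $1 in last { print $1, last[$1], $2 }    # second file: print only userids seen in the first
' "$dir/users.txt" "$dir/phones.txt")

echo "$joined"
rm -rf "$dir"
```

Only `jdoe` appears in both files, so only that userid is printed, with one field taken from each file.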

ambroze · December 11, 2012, 10:09am

rdrtx1 - This works perfectly and is exactly what I was looking for. I did find some things that I would like to see if they could be added, and it's my fault for thinking of them after the fact.

Is there a way to write only the records with differences to file3? If fields 2, 3, 4, 5, and 6 are all the same, the record does not need to be written to file3.

rdrtx1 · December 11, 2012, 10:44am

try fixed:

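awk -F"|" '
# file1: a[] holds the whole record; b[] is keyed both by field value and by field number
NR==FNR {a[$1]=$0; for (i=2; i<=6; i++) b[$1,$i]=b[$1,i]=$i; next}
# file2: act only on userids stored from file1
a[$1] {
  s=0;
  # flag the record if a file2 value in fields 2-6 was not stored for this userid
  for (i=2; i<=6; i++) {if (!b[$1,$i]) s=1};
  if (s==1) {
    printf "{ \"";
    printf substr($8,3) "\", \"";
    printf b[$1,2]      "\", \"";
    printf b[$1,3]      "\", \"";
    printf b[$1,4]      "\", \"";
    printf b[$1,5]      "\", \"";
    printf $1           "\", \"";
    printf substr($7,3) "\", \"";
    printf $9           "\", \"";
    printf $15;
    print  "\" }";
  }
}' file1 file2 > file3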
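One caveat with the script above: the test `!b[$1,$i]` asks whether a file2 value appears anywhere among the stored file1 values for that userid, and it treats an empty or `0` value as missing. The following is a variant sketch, not the code posted above, that compares fields 2-6 position by position and passes the data through an explicit `printf` format so that a `%` in the data is not read as a format directive. It assumes the same field layout as the examples and works on scratch copies of them. The array names `seen` and `f1` are illustrative:

```shell
# Scratch copies of the one-line examples from the first post.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
jdoe|Doe|Johnny|111-222-9999|jdoe@gmail.com|Main Office|1020-771-AHSDEV100|512
EOF
cat > "$dir/file2" <<'EOF'
jdoe|Doe|John|111-222-3333|john.doe@yahoo.com|Main Office|0xF48F9F97AB1E9242A6CCB96BA1DB5C79|0x820A4CE019D51F45B7B911CDCDB5208D|Data1|Data2|Data3|Data4|Data5|Data6|0
EOF

awk -F'|' '
NR == FNR {                        # file1: store fields 2-6 by userid and position
  seen[$1] = 1
  for (i = 2; i <= 6; i++) f1[$1, i] = $i
  next
}
$1 in seen {                       # file2: only userids present in file1
  differs = 0
  for (i = 2; i <= 6; i++)
    if ((f1[$1, i] "") != ($i "")) differs = 1   # appending "" forces a string comparison
  if (differs)
    printf "{ \"%s\", \"%s\", \"%s\", \"%s\", \"%s\", \"%s\", \"%s\", \"%s\", \"%s\" }\n",
      substr($8, 3), f1[$1, 2], f1[$1, 3], f1[$1, 4], f1[$1, 5],
      $1, substr($7, 3), $9, $15
}
' "$dir/file1" "$dir/file2" > "$dir/file3"

cat "$dir/file3"
```

With the sample records this should reproduce the file3 line requested in the first post. A record whose fields 2-6 are identical in both files produces no output.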

ambroze · December 11, 2012, 11:00am

WORKS PERFECTLY! I really appreciate your help. You and awk are amazing! I've been scripting with shell for 15+ years but never got to the advanced level of using awk or sed. I really need to invest more time into learning them. Do you do training classes? Thanks again.

Corona688 · December 11, 2012, 11:33am

There's really no substitute for just using it until you're used to it.