AWK: Pattern match between 2 files, then compare a field in file1 as > or < field in file2

First, thanks for the help in previous posts... couldn't have gotten where I am now without it!

So here is what I have, I use AWK to match $1 and $2 as 1 string in file1 to $1 and $2 as 1 string in file2. Now I'm wondering if I can extend this AWK command to incorporate the following:

If $1 and $2 in file1 equals $1 and $2 in file2, then return the the values of $1, $2 and $4 in file1 IF $5 in file1 is greater than $3 in file2

I could put a loop together but I think that would be very inefficient compared to "if" this AWK command could be extended.

file1

 
PRODCON1|SYSTEM.IOA|040720101327|USER1|201004071327
PRODCON2|SYSTEM.TEST|040720101327|USER1|201110010932
SA|SYSTEM.CCR|040720101327|USER1|201004071327
US|SYSTEM.PPD|040720101327|USER1|201004071327
VP|SYSTEM.IOA|040720101327|USER1|201110012346

file2

 
PRODCON1|SYSTEM.IOA|201109201128
PRODCON2|SYSTEM.TEST|201109201128
SA|SYSTEM|201109201128
US|SYSTEM|201109201128
VP|SYSTEM.IOA|201109201128

First I match $1 and $2 in both files with this:

awk -F\| 'NR == FNR {i[$1, $2]; next}(($1, $2) in i )' file2 file1

The result contains only the lines in file1 where $1 and $2 are the same in both files:

 
PRODCON1|SYSTEM.IOA|040720101327|USER1|201004071327
PRODCON2|SYSTEM.TEST|040720101327|USER1|201110010932
VP|SYSTEM.IOA|040720101327|USER1|201110012346

Now, from these results, I need to determine if $5 in the output above (which is date format YYYYMMDDHHmm) is greater than $3 in file2 and display $1, $2, and $4 from file1 where this conditions exists:

Required final output:

 
PRODCON2|SYSTEM.TEST|USER1
VP|SYSTEM.IOA|USER1  

Hi right_coaster,

Try:

$ cat file1
PRODCON1|SYSTEM.IOA|040720101327|USER1|201004071327
PRODCON2|SYSTEM.TEST|040720101327|USER1|201110010932
SA|SYSTEM.CCR|040720101327|USER1|201004071327
US|SYSTEM.PPD|040720101327|USER1|201004071327
VP|SYSTEM.IOA|040720101327|USER1|201110012346
$ cat file2
PRODCON1|SYSTEM.IOA|201109201128
PRODCON2|SYSTEM.TEST|201109201128
SA|SYSTEM|201109201128
US|SYSTEM|201109201128
VP|SYSTEM.IOA|201109201128
$ awk -F\| 'NR == FNR {i[$1, $2] = $3; next} { if ( i[$1, $2] && i[$1, $2] < $5 ) { printf( "%s|%s|%s\n", $1, $2, $4 ) } }' file2 file1
PRODCON2|SYSTEM.TEST|USER1
VP|SYSTEM.IOA|USER1

Regards,
Birei

1 Like

Awesome, works nicely... and I'll run it on my 12,000 line files and see how it goes. Thanks again.

a slight mod:

awk -F\| 'NR == FNR {i[$1, $2] = $3; next} i[$1, $2] && i[$1, $2] < $5 { print $1, $2, $4 }' OFS='|' file2 file1
1 Like

Using either of these methods, is it possible to print a field from file2?

NOTE: I added printing $5 below to get the date from file1 and if at all possible I'd like to print the date ($3) from file2 at the end. I tried to reference back to it with the i variable but it keeps bombing.

awk -F\| 'NR == FNR {i[$1, $2] = $3; next} i[$1, $2] && i[$1, $2] < $5 { print $1, $2, $4, $5 }' OFS='|' file2 file1
PRODCON2|SYSTEM.TEST|USER1|201110010932
VP|SYSTEM.IOA|USER1|201110012346

Desired: (data is from file1 except last entry is $3 from file2)

PRODCON2|SYSTEM.TEST|USER1|201110010932|201109201128
VP|SYSTEM.IOA|USER1|201110012346|201109201128

---------- Post updated at 06:07 PM ---------- Previous update was at 05:34 PM ----------

Aha! / Duh! moment... got it, thanks again for the helpful posts.

 
awk -F\| 'NR == FNR {i[$1, $2] = $3; next} i[$1, $2] && i[$1, $2] < $5 { print $1, $2, $4, $5, i[$1, $2] }' OFS='|' file2 file1
1 Like