Paste two files side by side based on a specific pattern match

patrick87 · December 17, 2009, 3:04am

Input file_1:

P78811
P40108
O17861
Q6NTW1

P40986
Q6PBK1

P38264

Q6PBK1

Q9CZ49

Q1GZI0

Input file_2:

P78811   From UK
Q6PBK1     From Australia    
O17861   From British

Desired output file:

P78811   P78811  From UK
P40108
O17861   O17861  From British
Q6NTW1

P40986
Q6PBK1     Q6PBK1     From Australia    

P38264

Q6PBK1     Q6PBK1     From Australia    

Q9CZ49

Q1GZI0

File_1 is the backbone of my desired output file. Every line of file_1 must appear in the output, whether it is an empty line or data that doesn't match anything in file_2.
In addition, the output must follow exactly the same order as file_1. Some entries in file_1 may appear more than once. My goal is to paste the file_2 data that matches file_1 entries into the output file.
Thanks for any suggestions.

ichigo · December 17, 2009, 3:16am

gawk 'FNR==NR{
    o=$1;$1=""
    a[o]=$0
    next
}
($1 in a) { print $1,a[$1] }
(!($1 in a)) {print} ' file2 file1

Scott · December 17, 2009, 4:05am

That doesn't seem to give the desired output.

awk '
 NR == FNR { A[$1] = $1 "\t" $0; next }
 { print ($1 in A)?A[$1]:$0 }
' file2 file1

P78811  P78811   From UK
P40108
O17861  O17861   From British
Q6NTW1

P40986
Q6PBK1  Q6PBK1     From Australia

P38264

Q6PBK1  Q6PBK1     From Australia

Q9CZ49

Q1GZI0
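
For anyone who wants to reproduce this, here is a minimal, commented sketch of the same two-file lookup. It assumes a POSIX shell with mktemp available, uses shortened copies of the sample data from the first post, and writes them to a scratch directory; the names tmp and result are only illustrative. The extra parentheses around the conditional are a defensive choice, not something the original command needed.

```shell
# Recreate small versions of the two inputs in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'P78811' 'P40108' 'O17861' '' 'Q6PBK1' 'Q6PBK1' > "$tmp/file1"
printf '%s\n' 'P78811   From UK' 'Q6PBK1     From Australia' 'O17861   From British' > "$tmp/file2"

result=$(awk '
  # First file on the command line (file2): store each line under its key,
  # already prefixed with the key and a tab.
  NR == FNR { A[$1] = $1 "\t" $0; next }
  # Second file (file1): print the stored line when the key is known,
  # otherwise print the line unchanged (this also keeps blank lines).
  { print (($1 in A) ? A[$1] : $0) }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that file2 is read first so that the lookup table is complete before file1 is processed. The NR == FNR test relies on file2 not being empty; with an empty file2, the first block would run on file1 instead.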

summer_cherry · December 17, 2009, 4:25am

nawk 'NR==FNR{_[$1]=$0}
        NR!=FNR{print $1" "_[$1]}' b a

patrick87 · December 18, 2009, 5:24am

Hi scottn,
Thanks for the reminder.
Can I ask what "?" and "A[$1]:$0" mean in your awk code?

awk '
 NR == FNR { A[$1] = $1 "\t" $0; next }
 { print ($1 in A)?A[$1]:$0 }
' file2 file1

Thanks.

---------- Post updated at 05:24 AM ---------- Previous update was at 05:17 AM ----------

Hi summer,
Can I ask how to decide whether to use awk, nawk, or gawk?
As far as I know, awk is the original awk, nawk is "new awk", and gawk is GNU awk. GNU awk can do the most, but it is not available everywhere. So it is best to use only features that nawk supports, because if nawk is not installed, the system is not in good shape anyway.
Does the "_" in your script represent everything inside the data?
Thanks for any suggestions.

Scott · December 18, 2009, 6:10am

Hi Patrick.

It's really just shorthand for an if-then-else statement. Search for "The Conditional Statement" here
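
As a small illustration (a sketch on made-up numeric input, not the data from this thread): condition ? x : y evaluates to x when the condition is true and to y otherwise. So ($1 in A)?A[$1]:$0 reads as "if $1 is a key in A, use A[$1], else use $0". The variable names short and long are just for the demonstration.

```shell
# Conditional expression: condition ? value_if_true : value_if_false
short=$(printf '%s\n' 5 12 | awk '{ print (($1 > 10) ? "big" : "small") }')

# The same logic written as an explicit if/else statement.
long=$(printf '%s\n' 5 12 | awk '{ if ($1 > 10) print "big"; else print "small" }')

# Both variables hold the same two lines.
printf '%s\n' "$short" "$long"
```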

I use awk because it is generally a copy of or link to nawk or gawk, so I don't need to know or think about which one is installed.

For example, on AIX it's a link to nawk; on Linux, awk and nawk are probably both linked to gawk.

The only exception is on Solaris, where you should use nawk, /usr/xpg4/bin/awk or gawk (if it's available) instead of /usr/bin/awk.
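
One way to check what awk points to on a given machine is sketched below. It assumes a POSIX shell where awk is found on the PATH as a regular command (not an alias or function); awk_path is just an illustrative variable name.

```shell
# Ask the shell where it finds awk.
awk_path=$(command -v awk)
printf 'awk resolves to: %s\n' "$awk_path"

# If that path is a symbolic link, ls -l shows its target
# (which may itself be another link).
ls -l "$awk_path"
```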

_ is just a cool name to use for a variable
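
To make that concrete, here is a small sketch on made-up input: an array called _ behaves exactly like one with a descriptive name, since awk identifiers may consist of letters, digits and underscores. The name count is only an illustrative alternative.

```shell
# Count occurrences using an array named "_".
with_underscore=$(printf '%s\n' a b a | awk '{ _[$1]++ } END { print _["a"] }')

# The same program with a descriptive array name.
with_name=$(printf '%s\n' a b a | awk '{ count[$1]++ } END { print count["a"] }')

# Both print the number of times "a" appeared.
printf '%s %s\n' "$with_underscore" "$with_name"
```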

patrick87 · December 18, 2009, 6:15am

Thanks a lot, scottn.
I understand the reasoning behind your code now.
Thanks ^^