Awk: How to merge duplicate lines and print in a single

winter9 · March 14, 2011, 1:35pm

The input file:

>cat module1
200611051053                     95
200523457498                     35
200617890187                     57
200726098123                     66
200645676712                     71
200744556590                     68

>cat module2
200645676712                     56
200617890187                     50
200523457498                     29
200726098123                     62
200744556590                     69
200611051053                     90

Expected output:

Reg No                      module1       module2
200611051053                     95        90
200523457498                     35        29
200617890187                     57        50
200726098123                     66        62
200645676712                     71        56
200744556590                     68        69

Many Thanks

116 · March 14, 2011, 1:37pm

Use this to collect the required info in a hash (an awk associative array), with a space between the values:

{ reg[$1] = reg[$1] " " $2 }

Then, in the END block, print everything using a loop.

sk1418 · March 14, 2011, 1:41pm

awk 'FNR==NR{a[$1]=$0;} NR>FNR{print a[$1]? a[$1]" "$2 : $0}' t1.txt t2.txt
200645676712 71 56
200617890187 57 50
200523457498 35 29
200726098123 66 62
200744556590 68 69
200611051053 95 90

winter9 · March 14, 2011, 1:45pm

Thanks for the replies.

Can you explain a little bit more? I'm new to awk.

Many thanks!

---------- Post updated at 12:45 PM ---------- Previous update was at 12:41 PM ----------

How can I read module1 and module2 as two separate parts with a command?

116 · March 14, 2011, 1:46pm

I was referring to something similar to this:

cat file1 file2 | awk '{ arr[$1] = arr[$1] " " $2 } END { for (i in arr) print i arr[i] }'

sk1418 · March 14, 2011, 1:47pm

For the first file, build an array whose index is column 1 and whose value is the whole line. Once file1 has been completely processed, awk starts on file2: it reads column 1 as the key and, if that key is already in the array (meaning the same key was found in file1), appends column 2 to the stored line and prints it. Otherwise it prints the file2 line unchanged.

I think you'll want to work out the relationship between NR and FNR when awk processes two files. Try searching Google for it.
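
To make the NR/FNR relationship concrete, here is a minimal sketch. The two tiny files f1 and f2 and the variable nr_fnr are made up for this demonstration and are not from the thread. NR counts records across all input files, while FNR restarts at 1 for each file, so FNR==NR holds only while the first file is being read.

```shell
# Hypothetical sample files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'a 1\nb 2\n' > "$tmp/f1"
printf 'b 3\na 4\n' > "$tmp/f2"

# Print NR, FNR and whether the FNR==NR test is true for each record.
nr_fnr=$(awk '{ print NR, FNR, (FNR == NR ? "first" : "later") }' "$tmp/f1" "$tmp/f2")
printf '%s\n' "$nr_fnr"
# 1 1 first
# 2 2 first
# 3 1 later
# 4 2 later

rm -rf "$tmp"
```

One caveat with this idiom: if the first file is empty, FNR==NR is also true for every line of the second file.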

116 · March 14, 2011, 1:47pm

Or even this way:

awk '{ arr[$1] = arr[$1] " " $2 } END { for (i in arr) print i arr[i] }' module1 module2
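
Note that a for (i in arr) loop visits keys in an unspecified order, and the expected output also has a header row. A minimal sketch of one way to get both, assuming every Reg No in module2 also appears in module1; the array names order, m1 and m2, the variable merged, and the column widths are choices made for this example, not something from the posts above:

```shell
# Recreate the two sample files from the first post in a temp directory.
tmp=$(mktemp -d)
cat > "$tmp/module1" <<'EOF'
200611051053                     95
200523457498                     35
200617890187                     57
200726098123                     66
200645676712                     71
200744556590                     68
EOF
cat > "$tmp/module2" <<'EOF'
200645676712                     56
200617890187                     50
200523457498                     29
200726098123                     62
200744556590                     69
200611051053                     90
EOF

merged=$(awk '
    # First file: remember the order of the keys and the module1 score.
    FNR == NR { order[++n] = $1; m1[$1] = $2; next }
    # Second file: remember the module2 score.
    { m2[$1] = $2 }
    END {
        printf "%-12s %8s %8s\n", "Reg No", "module1", "module2"
        # Walk the keys in the order they appeared in module1.
        for (i = 1; i <= n; i++)
            printf "%-12s %8s %8s\n", order[i], m1[order[i]], m2[order[i]]
    }
' "$tmp/module1" "$tmp/module2")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

A Reg No that is missing from module2 gets an empty last column here, and one that appears only in module2 is dropped.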

winter9 · March 14, 2011, 2:07pm

The single input file:

>cat module1
200611051053                     95
200523457498                     35
200617890187                     57
200726098123                     66
200645676712                     71
200744556590                     68
 
>cat module2
200645676712                     56
200617890187                     50
200523457498                     29
200726098123                     62
200744556590                     69
200611051053                     90

and the expected output:

Reg No                      module1       module2
200611051053                     95        90
200523457498                     35        29
200617890187                     57        50
200726098123                     66        62
200645676712                     71        56
200744556590                     68        69

Sorry again: the sample is a single file, but I need to read it as two separate parts and then create a table that contains the results of the first and second parts together.

Sorry, that was my mistake in the explanation.

vgersh99 · March 14, 2011, 2:10pm

Please post the exact content of a file using code tags to avoid further confusion.

sk1418 · March 14, 2011, 2:24pm

Is this what you want?

kent$ cat t1.txt 
>cat module1
200611051053                     95
200523457498                     35
200617890187                     57
200726098123                     66
200645676712                     71
200744556590                     68
 
>cat module2
200645676712                     56
200617890187                     50
200523457498                     29
200726098123                     62
200744556590                     69
200611051053                     90

kent$ awk '$0 !~ "^>.*"{ a[$1]= a[$1]? a[$1]" "$2 : $0} END{for(k in a) print a[k]} ' t1.txt 
 
200645676712                     71 56
200726098123                     66 62
200523457498                     35 29
200611051053                     95 90
200744556590                     68 69
200617890187                     57 50
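
The blank line at the top of that output comes from the empty lines in the file, and the rows come out in whatever order for (k in a) happens to use. If the header and the module1 order from the expected output matter, a sketch along these lines might help. It assumes each part of the file starts with a line beginning with ">" as in t1.txt; the names section, order, m1, m2 and table are invented for this example:

```shell
# Recreate the single sample file, including the ">cat ..." marker lines.
tmp=$(mktemp -d)
cat > "$tmp/t1.txt" <<'EOF'
>cat module1
200611051053                     95
200523457498                     35
200617890187                     57
200726098123                     66
200645676712                     71
200744556590                     68

>cat module2
200645676712                     56
200617890187                     50
200523457498                     29
200726098123                     62
200744556590                     69
200611051053                     90
EOF

table=$(awk '
    /^>/ { section++; next }     # a marker line starts the next part
    !NF  { next }                # skip blank or whitespace-only lines
    section == 1 { order[++n] = $1; m1[$1] = $2 }
    section == 2 { m2[$1] = $2 }
    END {
        printf "%-12s %8s %8s\n", "Reg No", "module1", "module2"
        # Print rows in the order the keys appeared in the first part.
        for (i = 1; i <= n; i++)
            printf "%-12s %8s %8s\n", order[i], m1[order[i]], m2[order[i]]
    }
' "$tmp/t1.txt")
printf '%s\n' "$table"

rm -rf "$tmp"
```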

winter9 · March 14, 2011, 2:33pm

Woot Thanks!