Merge columns from multiple files

ali.seifaddini · May 26, 2014, 4:27am

Hello and Good day

I have a lot of files with same number of rows and columns.$2 and $3 are the same in all files .
I need to merge $2,$3,$6 from first file and $6 from another files.

File1:

$1      $2            $3           $4            $5         $6
75	3.66192	101.54361	2.000		5.5	0.0804649
79	3.63195	101.54361	2.000		7.5	0.107302
79	3.63195	101.55112	2.000		7.5	0.107302
78	3.62445	101.54361	2.000		7	0.0998519
78	3.62445	101.55112	2.000		7	0.0998519
0	3.61696	101.54361	2.000		-32	0.000364633
0	3.61696	101.55112	2.000		-32	0.000364633

File2:

$1      $2            $3           $4            $5         $6
0	3.66192	101.54361	2.000		-32	0.000364633
0	3.63195	101.54361	2.000		-32	0.000364633
0	3.63195	101.55112	2.000		-32	0.000364633
0	3.62445	101.54361	2.000		-32	0.000364633
0	3.62445	101.55112	2.000		-32	0.000364633
0	3.61696	101.54361	2.000		-32	0.000364633
0	3.61696	101.55112	2.000		-32	0.000364633

File3:

$1      $2            $3           $4            $5         $6
80	3.66192	101.54361	2.000		8	0.115307
73	3.63195	101.54361	2.000		4.5	0.0696797
73	3.63195	101.55112	2.000		4.5	0.0696797
0	3.62445	101.54361	2.000		-32	0.000364633
0	3.62445	101.55112	2.000		-32	0.000364633
0	3.61696	101.54361	2.000		-32	0.000364633
0	3.61696	101.55112	2.000		-32	0.000364633

and so on....

Expected Result:

$2         $3           $6 File1       $6 File2     $6File3 .....
3.66192	101.54361   0.0804649       0.000364633   0.115307
3.63195	101.54361   0.107302	    0.000364633   0.0696797
3.63195	101.55112   0.107302	    0.000364633   0.000364633
3.62445	101.54361   0.0998519       0.000364633   0.000364633
3.62445	101.55112   0.0998519       0.000364633   0.000364633
3.61696	101.54361   0.000364633    0.000364633   0.000364633
3.61696	101.55112   0.000364633    0.000364633   0.000364633

Thanks in advance

SriniShoo · May 26, 2014, 4:51am

awk 'BEGIN{OFS = "\t"}
  NR == FNR {a[$2 " " $3] = $6; next}
  {a[$2 " " $3] = (a[$2 " " $3] OFS $6)}
  END {for(x in a) print (x OFS a[x])}' file1 file2 file3...

ali.seifaddini · May 26, 2014, 5:18am

Hi
Thanks for your response.
I have 144 files. Is it needed to put all files one by one in the script? (File1 file2...). How can be used loop for this script?
And also it is important assign name of each file to related column. Thanks

---------- Post updated at 12:48 PM ---------- Previous update was at 12:46 PM ----------

Hi
Thanks for your response.
I have 144 files. Is it needed to put all files one by one in the script? (File1 file2...). How can be used loop for this script?
And also it is important assign name of each file to related column. Thanks

RavinderSingh13 · May 26, 2014, 5:41am

Hello,

you can use following, just adding a * to SriniShoo's code.

 awk 'NR==FNR{a[$2 " " $3]=$6;next} {a[$2 " " $3]=a[$2 " "  $3] OFS $6} END{for(i in a){print i OFS a}}' File*

Output will be as follows.

3.61696 101.55112 0.000364633 0.000364633 0.000364633
3.62445 101.55112 0.0998519 0.000364633 0.000364633
3.63195 101.54361 0.107302 0.000364633 0.0696797
3.61696 101.54361 0.000364633 0.000364633 0.000364633
3.66192 101.54361 0.0804649 0.000364633 0.115307
3.62445 101.54361 0.0998519 0.000364633 0.000364633
3.63195 101.55112 0.107302 0.000364633 0.0696797

Thanks,
R. Singh

ali.seifaddini · May 26, 2014, 11:33am

ravindersingh13:

Hello,

you can use following, just adding a * to SriniShoo's code.

 awk 'NR==FNR{a[$2 " " $3]=$6;next} {a[$2 " " $3]=a[$2 " "  $3] OFS $6} END{for(i in a){print i OFS a}}' File*

Output will be as follows.

3.61696 101.55112 0.000364633 0.000364633 0.000364633
3.62445 101.55112 0.0998519 0.000364633 0.000364633
3.63195 101.54361 0.107302 0.000364633 0.0696797
3.61696 101.54361 0.000364633 0.000364633 0.000364633
3.66192 101.54361 0.0804649 0.000364633 0.115307
3.62445 101.54361 0.0998519 0.000364633 0.000364633
3.63195 101.55112 0.107302 0.000364633 0.0696797

Thanks,
R. Singh

Hello sir

Something is wrong with the script because there are some extra columns and also rows are miss ordered.
I attached sample data for your try.

Thank you so much.

ali.seifaddini · May 26, 2014, 11:52am

Hello sir
Is it possible to run the script with sample files which I attached on the above message.
There are some extra columns and the rows disarranged.
Regards.

alister · May 26, 2014, 12:34pm

The awk for-in loop has no defined ordering.

Regards,
Alister

ali.seifaddini · May 26, 2014, 12:41pm

What about extra columns?

Don_Cragun · May 26, 2014, 4:23pm

What operating system are you using?

Do all of your input files have all of the same values in columns 2 and 3 and do they all have those values in the same order? (If not, how should the output denote no data for a given $2,$3 pair for a given input file?)

Do all 144 files have a common naming scheme? (Are the names really File1 through File144? If so, is it important for the output columns to have the files listed in numeric sort order rather than alphanumeric sort order?)

Your input files seem to have tab separated fields (except the header line), should the output be tab separated or use sequences of spaces for separators? Is it important for the output columns to be aligned and adjust to the maximum input field width for each column (including filenames in the headers)? Why do some of the header fields have $6 filename (with a space) and some have $6filename (without a space)? How is your script supposed to know which output field headers are supposed to have a space?

Are there other files in the directory where you are going to run this script other than the 144 input files you're trying merge? (I assume there are and that is why you're seeing extra columns.) Show us the output you're getting from SriniShoo's script and RavinderSingh13's script for your three sample files. What "extra columns" are you getting?