Combining certain columns of multiple files into one file

Hello Unix gurus,

I have a large number of files (say X), each containing two columns of data and the same number of rows.

I would like to combine these files into a single merged file containing X columns, one for the second column of each file (with a bonus of having the first column of the first file as the first column of the merged file).

I have only recently started using bash scripting to try and do this, so I am quite new to it, but I have been trying "paste", or even "pr", as below:

Not the actual code (the "[...]" stands for a long list of file names or columns):

pr -m -t -s\  file1 file2 file3 [...] fileX | gawk '{print $1,$2,$4,$6,$8,[...],$X }' > merged.file

I don't really know what I am doing wrong, but when I don't get an error message ("pr: page width too narrow"), it is insanely slow, even on a machine with loads of RAM. Would there be a better way to do this basic transformation?

Many thanks for your help and your time!

KS

Read the man pages for the cut and paste commands.

Use the cut command to retrieve only the second field from all but the first file, then paste all the resultant files onto the first.
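Something like this untested sketch is one way to put that together; the file names are just placeholders for your X inputs, and it assumes the columns are tab-separated (pass -d to cut and paste if they are not):

cp file1 merged.file
for f in file2 file3 fileX; do                         # placeholder file names
    cut -f2 "$f" | paste merged.file - > merged.tmp    # append each second column
    mv merged.tmp merged.file
done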

With a recent bash that provides "process substitution", try

join <(join file[12]) <(join file[34])
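join prints the shared key once, followed by the remaining fields of both inputs, so the result is the key column plus each file's second column; note that it expects every file to be sorted on that first column. Purely as an untested sketch, the same idea extends to more files by nesting further, e.g. for eight files (placeholder names):

join <(join <(join file[12]) <(join file[34])) \
     <(join <(join file[56]) <(join file[78])) > merged.file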

Thank you very much to both of you for the helpful replies!