Script to place selected columns from a group of files side by side in a new file

ks_reddy · February 11, 2009, 9:01am

Hi Everyone,

I need a shell/perl script to bring selected columns from all the files located in a directory and place them in a new file side by side.

File1:
a b c d
2 3 4 5
f g h i
..........
File2:

I II III IV
w x y z
..............
and so on; there are many files...

My Output for selected columns(example 1 and 2) from source files should be:

a b I II..........................so on(here last file contents)
2 3 w x..........................so on(here last file contents)
f g
............ So on

Thanks in advance ........

quirkasaurus · February 11, 2009, 9:29am

you can grab the columns using:

awk '{ print $1, $2 }' file_nm > file_nm_out

and combine the 2-column output files with paste (up to 12 at a time,
though some characters might get swallowed up):

paste file1 file2 > file_comb.1
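
As a rough sketch, the two steps above could be chained in a loop over any number of files. The helper name cut_and_paste is hypothetical, mktemp -d is assumed to be available (it is common but not part of POSIX), and the result is still subject to whatever limits the local paste has:

```shell
# cut_and_paste (hypothetical helper): write columns 1 and 2 of each input
# file to a temporary file, then join the temporaries side by side.
cut_and_paste() {
    tmpdir=$(mktemp -d) || return 1
    n=0
    for f in "$@"; do
        n=$((n + 1))
        # zero-padded names keep the glob below in the original input order
        awk '{ print $1, $2 }' "$f" > "$tmpdir/$(printf '%06d' "$n")"
    done
    paste "$tmpdir"/*       # tab-separated, one input file per output column
    rm -rf "$tmpdir"
}

# Usage (assumed file names): cut_and_paste file1 file2 > file_comb.1
```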

However, with lots of files... this will become a little bit of a challenge.

I'm willing to bet there's an easier approach to your actual problem.

meaning -- why do the first 2 columns need to appear in a new file?
and how many files are we actually talking about?
what is the final destination for this data file we're building?
how were the initial files created in the first place?

it seems if we redesign any one of these steps, we might be able to design a better end-to-end process.

Perhaps you could restate the actual issue?

danmero · February 11, 2009, 9:48am

awk 'NR==FNR{_[NR]=$1 OFS $2;next}{print $1,$2,_[FNR]}' file2 file1

ks_reddy · February 12, 2009, 1:53am

I want to place the selected column(s) from all the files (e.g. col 2 from all files together) from a directory, not just two files....

I tried this command: paste | awk '{print $2}' *
But I got the output one after the other in a new file.

$2 from 1st file
$2 from 2nd file
.............. so on..

But what I need is $2 from 1st file <tab> $2 from 2nd file<tab> ..............so on....

I have thousands of files, all of them similar. Each contains 34 columns and 1000 rows. Please help me...

Finally I need to plot one column from one file against the other columns from other files depending on the column header.

ce9888 · February 12, 2009, 2:01am

awk '{printf "\t",$2}' *

ks_reddy · February 12, 2009, 3:00am

This command awk '{printf "\t", $2 }' did not work for me. I tried already. Can anybody explain the reason??
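
A likely reason, shown here on made-up two-line input: the format string "\t" has no %s conversion, so awk never prints $2. Adding %s prints the values, but still as one long line, because awk reads the input files one after another:

```shell
# No %s in the format string: the $2 argument is ignored, only tabs come out.
printf 'a b\nc d\n' | awk '{ printf "\t", $2 }'

# With %s the field is printed, but every value lands on the same line.
printf 'a b\nc d\n' | awk '{ printf "%s\t", $2 }'
```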

rakeshou · February 12, 2009, 3:13am

Have you tried the paste command?

ce9888 · February 12, 2009, 10:59am

Try this one :

awk '{printf "%s\t",$2}' $(find . -type f)

quirkasaurus · February 12, 2009, 11:09am

He wants the 2nd column of 1000 files printed side by side.

i maintain this can't be done without a heroic bunch of scripts
and quite possibly a custom version of 'paste' written in either
C or perl.

I built something to do this, but paste complains of "line too long"
after some threshold.

You will also run into the "too many files open" error if you try
to do them all at once.

this problem is the proverbial "bigger than a bread box" one.
that is --- sure i could solve it - - - but i'd expect remuneration.
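
For what it's worth, one possible way around the paste limits mentioned above is to let a single awk process collect the column in memory and print the rows at the end. awk reads its input files one at a time, so the "too many files open" problem should not arise. This is only a sketch: side_by_side is a hypothetical helper name, every file is assumed to have the same number of rows (as described earlier in the thread), and some awk implementations may have their own limits on very long output lines:

```shell
# side_by_side (hypothetical helper): print column COL of every file given,
# side by side and tab-separated. Assumes all files have equal row counts.
side_by_side() {
    col=$1; shift
    awk -v col="$col" '
        FNR == 1 { nfiles++ }          # first line of the next input file
        {
            # append this value to the row built up from earlier files
            row[FNR] = (nfiles > 1 ? row[FNR] "\t" : "") $col
            if (FNR > max) max = FNR
        }
        END { for (i = 1; i <= max; i++) print row[i] }
    ' "$@"
}

# Usage: column 2 of every file in the current directory
# side_by_side 2 * > combined.txt
```

With 1000 rows per file and a few thousand files, the whole table is held in memory until the END block runs, which should be manageable for short fields.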