Multiple files in the same directory

epi8 · May 12, 2008, 2:30pm

I have over 900 files that have the same name except for a unique numeric assignment. For all files I would like to cut the 2nd column and paste all into one new file. All in bash.
sample input format for each file:
1 2 3
1 2 3
1 2 3

sample command for what I want to do:
cut -d' ' -f2 file_in >> file_out

Ultimately I would like to process the files through a loop.

Thanks in advance!!!

jim_mcnamara · May 12, 2008, 2:44pm

With that many files, * might overflow the exec limit on command-line length...

cd /path/to/files
find . -type f -name 'common_name_fragment*' | \
while IFS= read -r file
do
     cut -d' ' -f2 "$file"
done > ./tmp
mv ./tmp file_out

era · May 12, 2008, 3:15pm

What's the temporary file for? Couldn't you just as well redirect to file_out directly?

And of course, if your shell can expand all files, the simple thing might work just fine:

cut -d ' ' -f2 * >file_out

If you get "argument list too long", you need a workaround like the find command which jim posted.
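For what it's worth, here is a minimal sketch of one more way around "argument list too long". It assumes bash (where printf is a builtin, so the expanded glob never goes through exec) and an xargs that supports -0, which is not in POSIX but is present in the GNU and BSD versions. The sample.* file names and the scratch directory are made up for the demo.

```shell
#!/bin/bash
# Hypothetical demo data in a scratch directory (names are assumptions).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1 2 3\n1 2 3\n' > sample.1
printf '4 5 6\n4 5 6\n' > sample.2
printf '7 8 9\n' > sample.3

# printf is a shell builtin, so the long list of names is written to the
# pipe instead of being passed on a command line. xargs then runs cut in
# as many batches as needed, each within the system's argument limit.
printf '%s\0' sample.* | xargs -0 cut -d' ' -f2 > file_out

cat file_out
```

Because the names are NUL-separated, file names containing spaces or newlines should survive the trip as well.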

epi8 · May 12, 2008, 3:19pm

#!/bin/bash
for file in /home/epi/tmurray/reich_stuff/21
do cut -d' ' -f2 *.21 >> out.txt ; done

The above loop does the extraction but appends the column from each file vertically. How can I get it to append horizontally? e.g:
input:
file 1
1 2
1 2
1 2

file 2
1 2
1 2
1 2

current output
2
2
2
2
2
2

desired output:
2 2
2 2
2 2

thanks.

And Jim, that code you gave me doesn't work. I tweaked it and it still doesn't work. I keep getting errors about syntax.

Thanks:)

era · May 12, 2008, 3:43pm

Your "loop" only loops over the directory, once, then ignores the directory when actually doing the cut, but never mind.

cut has a friend, paste, which places stuff next to each other. For a large number of files, it might not be workable, though; you'd need a temporary file, or a temporary stream, for each file in the set.
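As a rough sketch of the temporary-file route, the loop below pastes one file's column at a time onto a growing result, so only one temporary file is needed at any moment. The demo.* names, combined.txt and combined.tmp are assumptions made up for the example, and the result is rewritten once per input file, so this is probably slow for a very large set.

```shell
#!/bin/bash
# Hypothetical demo data in a scratch directory (names are assumptions).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1 2\n1 2\n1 2\n' > demo.1
printf '3 4\n3 4\n3 4\n' > demo.2
printf '5 6\n5 6\n5 6\n' > demo.3

out=combined.txt
tmp=combined.tmp
first=1
for f in demo.*; do
    if [ "$first" -eq 1 ]; then
        # The first file just seeds the result with its second column.
        cut -d' ' -f2 "$f" > "$out"
        first=0
    else
        # paste reads the new column from stdin ("-") and puts it to the
        # right of what has been collected so far, separated by a space.
        cut -d' ' -f2 "$f" | paste -d' ' "$out" - > "$tmp"
        mv "$tmp" "$out"
    fi
done

cat "$out"
```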

Sounds to me like at this point you would be best served by a simple awk or perl script.

I don't see anything wrong with the syntax in jim's command; can you post the actual error message?

era · May 12, 2008, 4:00pm

For a small number of files, this works, at least roughly, but it causes my Bash to dump core with a malloc error when I try it on a large number of files. I'm posting it mainly for its curiosity value.

eval paste "<("cut -d\" \" -f2 `echo *  | sed 's/ /) <(cut -d " " -f2 /g'`")"

This constructs a command line consisting of paste <(cut file1) <(cut file2) <(cut file3) via some rather black magic. The key is really the use of eval and the quoting and backslashes required to pull it off.
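To make the generated command less mysterious, here is what it amounts to when written out by hand for two files. This is only a sketch: file1, file2 and pasted.txt are made-up names, process substitution needs bash rather than plain sh, and -d' ' is added so the columns are joined with a space (paste uses a tab by default).

```shell
#!/bin/bash
# Hypothetical demo data in a scratch directory (names are assumptions).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1 2\n1 2\n1 2\n' > file1
printf '3 4\n3 4\n3 4\n' > file2

# Each <( ... ) runs a cut in the background and hands paste a file name
# it can read that command's output from; paste then lines them up.
paste -d' ' <(cut -d' ' -f2 file1) <(cut -d' ' -f2 file2) > pasted.txt

cat pasted.txt
```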

era · May 12, 2008, 4:22pm

And here finally is a simple Perl one-liner (or to be really honest, two-liner) which collects an array for each input line number, and at the end prints out each array in line number order.

perl -ane 'push @{$L->[$.]}, $F[1]; close ARGV if eof;
END { shift @{$L}; for $l (@{$L}) { print join (" ", @{$l}), "\n"; } }' *

It's regrettable that you can't do this with just the basic Unix tools.

The close stuff is to reset the line numbering for each file (see perldoc -f eof) and the shift is to get rid of line number zero, which should be empty anyway. $F[1] is the second whitespace-separated input field, in case you want to change it to something else (Perl arrays are zero-based, so the first field is $F[0]). It's easy enough to make it split on a different separator if you like; see the -a and -F options. This uses references for efficiency (or simply because I'm past bedtime), so it's not particularly elegant or readable.
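The same collect-by-line-number idea can also be sketched in awk, which was mentioned earlier in the thread as an option. This is an illustration rather than a drop-in replacement: the demo.* names and columns.txt are assumptions, and awk splits on runs of whitespace by default, which differs slightly from cut -d' '.

```shell
#!/bin/bash
# Hypothetical demo data in a scratch directory (names are assumptions).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1 2\n1 2\n1 2\n' > demo.1
printf '3 4\n3 4\n3 4\n' > demo.2

awk '
    {
        # FNR is the line number within the current file, so it restarts
        # at 1 for every input file without any explicit reset.
        if (FNR in row) row[FNR] = row[FNR] " " $2
        else            row[FNR] = $2
        if (FNR > max) max = FNR
    }
    # After the last file, print the collected rows in line order.
    END { for (i = 1; i <= max; i++) print row[i] }
' demo.* > columns.txt

cat columns.txt
```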

epi8 · May 12, 2008, 5:07pm

Thanks, this works! How can I modify it so that it prints to a file rather than the screen?

thanks again

era · May 13, 2008, 2:04am

You can redirect the output from any command to a file with command > file
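A small sketch of the difference between the two redirection operators, using cut as a stand-in for whatever command you are running; demo.txt and result.txt are made-up names.

```shell
#!/bin/bash
# Hypothetical demo data in a scratch directory (names are assumptions).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a b\nc d\n' > demo.txt

# ">" creates the file, or empties it first if it already exists.
cut -d' ' -f2 demo.txt > result.txt

# ">>" keeps what is already there and adds the new output at the end.
cut -d' ' -f2 demo.txt >> result.txt

cat result.txt
```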