Applying the same awk over a directory of files with individual file output

owwow14 · November 9, 2015, 9:39am

I am trying to apply an awk action over multiple files in a directory. It is a simple action: I want to print the first two columns (i.e. $1 and $2) of each tab-separated document and write the result to a new file, *.pp.

This is the awk that I have come up with so far, which is not giving me a result. Can someone help me identify the error?

awk FNR == 1 {if (o)close(o) o = FILENAME sub(/\*/, ".pp", o)} NR % $2,$1 {print > }

RavinderSingh13 · November 9, 2015, 9:50am

Hello owwow14,

Could you please try the following and let me know if it helps?

for i in *.pp
do
   awk '{print $1 OFS $2 >> "new_input_file"}' OFS="\t" "$i"
done
  
OR
 
awk '{print $1 OFS $2 >> "new_output_file.txt";close(FILENAME)}' OFS="\t" *.pp

I haven't tested these, though. Let me know if you have any questions about them.

Thanks,
R. Singh

owwow14 · November 9, 2015, 10:10am

Dear R. Singh,

I have tried it and unfortunately it does not help: it seems to output all of the files into one file called *.pp, rather than into individual files with the suffix .pp.
Let me describe the data a bit more. The files are tab-separated, but each column holds a line of text rather than a single word.
For instance, the information in the files would look something like this:

File1:

I love you man    THIS IS GREAT NEWS    5    www.url.com

File2:

I love you girl    THIS IS AWESOME NEWS    6    www.url.org

File3:

I love you son    THIS IS BAD NEWS    7    www.url.co.uk

I need to print out in individual output files just the first two columns, so the output would be.

File1.pp

I love you man    THIS IS GREAT NEWS

File2.pp

I love you girl    THIS IS AWESOME NEWS

File3.pp

I love you son    THIS IS BAD NEWS

When I need to quickly extract information from a column, I usually query the document by defining the separator as well:

awk -F'\t' '{print $1,$2}' input > output
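As a side note on that command: with only -F'\t' set, the comma in print $1,$2 joins the fields with awk's default output separator, a single space, so the result is no longer tab-separated. A minimal sketch of the difference, using a one-line sample modelled on File1 above (the temporary file and the variable names are assumptions for illustration):

```shell
# Hypothetical sample line modelled on File1 above; fields are tab-separated.
tmp=$(mktemp)
printf 'I love you man\tTHIS IS GREAT NEWS\t5\twww.url.com\n' > "$tmp"

# Only the input separator is set: the comma in print inserts a space.
space_joined=$(awk -F'\t' '{print $1,$2}' "$tmp")

# Setting OFS as well keeps a tab between the two printed fields.
tab_joined=$(awk -F'\t' -v OFS='\t' '{print $1,$2}' "$tmp")

printf '%s\n' "$space_joined" "$tab_joined"
rm -f "$tmp"
```

Whether the space-joined form is acceptable depends on what reads the output afterwards; if the .pp files need to stay tab-separated, setting OFS seems the safer choice.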

RavinderSingh13 · November 9, 2015, 10:25am

Hello owwow14,

You could try the following, but I haven't tested these either. Let me know if you have any questions.

for i in file*
do
   awk '{print $1 OFS $2 >> "file"++i".pp"}' FS="\t" OFS="\t" $i
done

OR

awk '{print $1 OFS $2 >> "file"++i".pp";close(FILENAME)}' FS="\t" OFS="\t" file*

Thanks,
R. Singh

owwow14 · November 9, 2015, 10:32am

Hi again, R. Singh,

It seems to be working better, except I keep getting the error

awk: cannot open "file1021.pp" for output (Too many open files)

I tried to modify the one-liner as follows to make sure that the files were closed:

awk '{print $1 OFS $2 >> "file"++i".pp";close("file"++i".pp")}' FS="\t" OFS="\t" *

However, I am still getting the same error, just with a file with a higher N, i.e.

awk: cannot open "file2041.pp" for output (Too many open files)

Any ideas where the leak is coming from?

RavinderSingh13 · November 9, 2015, 10:44am

Hello owwow14,

Could you please give this a try? I haven't tested this one either.

awk 'FNR==1{if(f)close(f); f="file" (++n) ".pp"} {print $1 OFS $2 >> f}' FS="\t" OFS="\t" file*

Also, how about the for loop solution in my previous post? I think that would have worked.

Thanks,
R. Singh
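
On the "Too many open files" error reported above, one plausible reading is that ++i sits in the main action, so it is evaluated once per input line rather than once per input file. Every line then gets its own output file, and in the variant that calls close("file"++i".pp") the second ++i builds yet another name, so the file that was just written is never the one being closed. That would match the reported numbers: the first error appears at file1021.pp, and once every second name is skipped it appears at file2041.pp, which is again the 1021st file opened. The sketch below only prints the names that would be generated, without opening any output file; the three-line input and the variable names w, c and names are assumptions for illustration:

```shell
# Hypothetical three-line input; the generated names are only printed, nothing is opened.
names=$(printf 'a\nb\nc\n' | awk '{
    w = "file" (++i) ".pp"   # name the print redirection would write to
    c = "file" (++i) ".pp"   # name the close() call would receive
    print w, c
}')
printf '%s\n' "$names"
```

Each input line yields two different names, so the counter has to be advanced once per file (for example under an FNR==1 condition) and the same stored name reused for both the redirection and close().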

Scrutinizer · November 9, 2015, 2:40pm

Try:

awk 'FNR==1{close(f); f=FILENAME ".pp"} {print $1,$2>f}' FS='\t' OFS='\t' File*

or

for f in File*
do
  cut -f1,2 "$f" > "$f.pp"
done
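
To check the awk variant above against the sample data from earlier in the thread, something like the following can be run in a scratch directory. The directory and the single-line contents of File1 to File3 are assumptions reconstructed from the examples; the awk program is the one shown above, only with the input files listed explicitly instead of the File* glob.

```shell
# Scratch directory so the demo does not touch real data.
dir=$(mktemp -d)

# Sample inputs modelled on the thread: four tab-separated fields each.
printf 'I love you man\tTHIS IS GREAT NEWS\t5\twww.url.com\n'    > "$dir/File1"
printf 'I love you girl\tTHIS IS AWESOME NEWS\t6\twww.url.org\n' > "$dir/File2"
printf 'I love you son\tTHIS IS BAD NEWS\t7\twww.url.co.uk\n'    > "$dir/File3"

# On the first line of each input, close the previous output and derive
# the new output name; then write the first two fields to that file.
(
  cd "$dir" &&
  awk 'FNR==1{close(f); f=FILENAME ".pp"} {print $1,$2>f}' FS='\t' OFS='\t' File1 File2 File3
)

ls "$dir"
```

Only one output file is open at any time, which is what avoids the open-file limit. Note that on a second run the File* glob would also match File1.pp and friends, so the outputs are best removed or written to another directory before rerunning.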