[Solved] awk manipulation of sequentially named files

jaldo0805 · June 4, 2013, 4:24pm

Hello, I am a very novice user of awk. I have a set of files named file001, file002, file003, file004, etc., each containing four fields (columns of data) separated by an uneven number of spaces. I want to replace those spaces with a TAB, so I am using this line of awk script:

awk -v OFS="\t" '$1=$1' file00X > filex00X

I've tried several ways of looping, but all of them were unsuccessful. Although running this on each file is still way faster than inserting the TABs by hand, it still takes a lot of time to go through all of them.
Thanks,

Don_Cragun · June 4, 2013, 4:37pm

Close. Try:

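for i in file???
do      # {$1=$1} rebuilds each line with OFS (a TAB) between fields; the
        # trailing 1 prints every line, even one whose first field is 0 or empty.
        awk -v OFS="\t" '{$1=$1} 1' "$i" > tmp$$ &&
        cp tmp$$ "$i"   # copy back only if awk succeeded
done
rm -f tmp$$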
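To try the loop safely before running it on real data, here is a minimal sketch that builds three made-up files in a scratch directory, runs the loop and prints the result with each TAB shown as |. It writes the awk program as '{$1=$1} 1' rather than '$1=$1': used as a bare pattern, '$1=$1' is false when the first field is 0 or empty, so such lines would be silently dropped (the second line of file001 below checks exactly that). mktemp -d is an assumption; it is available on GNU/Linux, the BSDs and macOS, but it is not POSIX.

```shell
#!/bin/sh
# Hypothetical demo: sample files in a scratch directory
# (mktemp -d is widespread but not POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf '1   2  3     4\n0  a b  c\n' > file001
printf 'x y   z  w\n'              > file002
printf '10  20 30   40\n'          > file003

# The loop from the post, written with '{$1=$1} 1' so the "0 ..." line is kept.
for i in file???
do      awk -v OFS="\t" '{$1=$1} 1' "$i" > tmp$$ &&
        cp tmp$$ "$i"
done
rm -f tmp$$

# Print every file with each TAB shown as "|".
for i in file???
do      printf '%s:\n' "$i"
        tr '\t' '|' < "$i"
done
```

The scratch directory is left in place so the results can be inspected.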

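Almost any command that reads and writes the same file using command line syntax such as:

command option file > file
    or
command option < file > file

will have file truncated to size 0 by the shell before command starts running, so command reads an empty file and the original data is lost. That is why the script above writes to a temporary file (tmp$$) and then copies it back.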
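A quick way to see the truncation for yourself, as a sketch using a made-up file named sample in a scratch directory (mktemp -d is assumed to be available):

```shell
#!/bin/sh
# Hypothetical demo of the "command file > file" trap.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'a   b  c    d\n' > sample

# WRONG: the shell opens (and truncates) sample for output before awk
# starts, so awk reads an empty file and writes nothing.
awk -v OFS="\t" '{$1=$1} 1' sample > sample

wc -c < sample     # reports 0 bytes
```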

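The cp in the script could be changed to mv (which would be more efficient) if and only if each of the files you want to update has only one link. mv replaces the directory entry with the new file, so any other hard link would still point to the old contents, whereas cp writes the new data into the existing file. (If you know that your input files all have only one link and you change the cp to mv, you can also get rid of the rm at the end of the script.)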
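A minimal sketch of the mv variant, with a hypothetical second hard link (other_name) added to show why the single-link condition matters. It assumes mktemp -d is available and uses made-up sample data.

```shell
#!/bin/sh
# Hypothetical demo: mv variant of the loop plus a second hard link.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'a   b  c    d\n' > file001
ln file001 other_name            # hypothetical second link to the same data

# mv variant: only safe when every file has a single link.
for i in file???
do      awk -v OFS="\t" '{$1=$1} 1' "$i" > tmp$$ &&
        mv tmp$$ "$i"            # replaces the directory entry for "$i"
done

tr '\t' '|' < file001            # prints a|b|c|d (the new data)
tr '\t' '|' < other_name         # prints a   b  c    d (the old data)
```

With cp instead of mv, both names would show the converted data, because cp writes into the file that both links share.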

jaldo0805 · June 4, 2013, 4:53pm

Thanks a lot, problem solved

Jotne · June 4, 2013, 4:56pm

Another solution

awk '{$1=$1; print > (FILENAME "_2")}' OFS="\t" file*

This creates new files such as file001_2, file002_2, etc., and leaves the original files unchanged.
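A small sketch of this approach on two made-up files in a scratch directory (mktemp -d assumed available). It writes the output name as (FILENAME "_2"): some awk implementations may reject or misparse an unparenthesized concatenation after >, so parenthesizing the target is the safer, portable habit.

```shell
#!/bin/sh
# Hypothetical demo: the originals stay untouched and new *_2 files appear.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf '1   2  3     4\n' > file001
printf 'x y   z  w\n'     > file002

awk '{$1=$1; print > (FILENAME "_2")}' OFS="\t" file*

ls    # expect file001, file001_2, file002 and file002_2
```

Running the command a second time in the same directory would also match file001_2 and so on through file*, producing file001_2_2; a narrower pattern such as file??? avoids that.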

Don_Cragun · June 4, 2013, 5:19pm

Note that this form never closes its output files, so it uses up a file descriptor for every input file processed. Many awk implementations will run out of file descriptors if this script is given more than about 18 files.

If you're going to use this form (to keep your original files intact) when there is a large number of input files, you'll probably need to close the output files when you're done with them using something like:

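awk '
FNR == 1 {                      # first line of each new input file:
        if(of) close(of)        # close the previous output file, if any
        of=FILENAME"_2"         # and name the next one after this input file
}
{       $1=$1                   # rebuild the record with OFS (a TAB)
        print > of
}' OFS="\t" file*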
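To exercise the close() version on more files than the roughly 18 mentioned above, here is a sketch that generates 30 made-up files named file001 through file030 in a scratch directory (mktemp -d assumed available) and counts the outputs. A successful run on your system does not by itself show that the unclosed version would fail there.

```shell
#!/bin/sh
# Hypothetical stress demo: 30 input files, one output file open at a time.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
n=1
while [ "$n" -le 30 ]
do      printf '%s   a  b    c\n' "$n" > "$(printf 'file%03d' "$n")"
        n=$((n + 1))
done

awk '
FNR == 1 {
        if(of) close(of)        # close the previous output file, if any
        of=FILENAME"_2"
}
{       $1=$1
        print > of
}' OFS="\t" file*

ls file???_2 | wc -l    # counts the output files: 30
```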