Hi,
Could somebody help me with code that passes each column through a command or set of commands, one column at a time? Like this:
cat data:
4 89 87
5 3 89
10 82 4
39 10 39
100 98 9
1 4 3
Code:
awk '{print $1}' data | sort | gstat > group1
awk '{print $2}' data | sort | gstat > group2
awk '{print $3}' data | sort | gstat > group3
Of course, the right code will not be three separate commands like these; it will be a single piece of code that goes through each column and sends each result to a different file.
Your code is essentially the same as mine. The problem is that if I have 100 columns, I do not want to write out each column by hand. I am looking for code that goes through each column one after the other, without me having to do it myself.
Thanks
#!/bin/sh
# Tmp files are created below for further use, and the awk script
# appends to them, so start by deleting any left over from a previous
# run in the same directory.
F=tmp
rm -f "$F"*
# awk script to separate the columns into one tmp file each.
# It also prints the number of columns, stored in n for later use.
n=$(awk -v f="$F" '
{x=0; while(x++<NF)A[x,FNR]=$x}
END{
	for(i=1;i<x;i++){
		for(j=1;j<=FNR;j++)print A[i,j] >> (f i)
	}
	print i-1
}
' data)
# Sort each tmp file into its group file, using n to count the tmp files
i=0
while [ "$((i+=1))" -le "$n" ]; do
	sort -n "$F$i" > "group$i"
done
exit 0
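If every row of the data file has the same number of fields, the same per-column split can be sketched without any temp files at all. This loop is only an illustration of the idea, not the script above: it reruns the question's awk | sort pipeline once per column, with the column count read from the first row.

```shell
#!/bin/sh
# Sketch: one pass per column, no temp files.
# Assumes every row of "data" has the same number of fields.
n=$(awk 'NR == 1 { print NF; exit }' data)  # column count from row 1
i=0
while [ "$((i += 1))" -le "$n" ]; do
	awk -v c="$i" '{ print $c }' data | sort -n > "group$i"
done
```

To match the original pipeline exactly, insert gstat (or whatever filter you need) between sort and the redirection.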
# Awk code for transposing and creating group files
awk 'BEGIN{t=1}
{
	for(i=1;i<=NF;i++)
		a[NR,i] = $i
}
NF>nf { nf = NF }
END {
	for (i=1;i<=nf;i++)
		for (j=1;j<=NR;j++) {
			file="group"t
			print a[j,i] > file
			t=(j==NR)?++t:t
		}
}' file
# Sorting the data for each group file
for file in group*
do
	sort -n "$file" > tmp; mv tmp "$file"
done
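As a quick sanity check of what a correctly built group file should contain, here is the expected result for column 2 of the sample data from the question (the printf line just recreates that file):

```shell
#!/bin/sh
# Recreate the sample "data" file from the question, then show what
# group2 should hold: column 2 in numeric order.
printf '%s\n' '4 89 87' '5 3 89' '10 82 4' '39 10 39' '100 98 9' '1 4 3' > data
awk '{ print $2 }' data | sort -n
# prints: 3 4 10 82 89 98 (one value per line)
```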
The following is similar to bipinajith's shell script, but it doesn't create the temp files, doesn't include empty lines in the data to be sorted if some rows in the data file have more fields than others, feeds the sorted output into your gstat command, and uses 3 digits in the group file names instead of a variable number of digits. If you have 100 columns in your data file, there is also a chance that bipinajith's script will run out of file descriptors on many implementations of awk.
Try:
awk '
{ for(i = 1; i <= NF; i++) a[NR,i] = $i
if(NF > nf) nf = NF
}
END { for(f = 1; f <= nf; f++) {
cmd = sprintf("sort -n | gstat > group%03d", f)
for(i = 1; i <= NR; i++)
if((i,f) in a)
printf("%s\n", a[i,f]) | cmd
close(cmd)
}
}' data
As always, if you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
Note that I've never heard of the gstat command, but as long as it is on your command search path, this should work. (When I tested it, I just used a shell script named gstat that reported that it had been called and listed the contents of the data it found on its standard input.)
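For reference, a test stub along those lines is only a few lines of shell. This is just a guess at the shape of such a stand-in, not the real gstat:

```shell
#!/bin/sh
# Hypothetical gstat stand-in for testing the pipeline above:
# announce that it was called, then echo back whatever arrives
# on standard input.
echo "gstat was called with: $*"
sed 's/^/  stdin: /'
```

Make it executable (chmod +x gstat) and put its directory first in PATH so the awk pipeline picks it up instead of any real command.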