Print arguments using array elements not working?

shaner · May 3, 2013, 5:30pm

Hey guys,

I'm new to shell scripting and I'm trying to write a script that takes user input and copies the specified columns from a data file to a new one. Because the number of columns to copy can vary, I wrote a loop that stores the user's choices in an array, and I want to use that array in an awk print statement so that the array gives the columns to print. However, it keeps telling me I have incorrect syntax and it's not working. Here is my user input loop, which works fine:

for (( i = 0 ;  i < $num;  i++ ))
do
 read choices
done

where num is already properly defined. Then I use this:

for (( i = 0 ;  i < $num;  i++ ))
do
 awk -v f2=temp1 ' { c = ${choices[$i]}; getline < f2; print $0, c, " "; } ' infilemod >temp2
 mv temp2 temp1
done

If I use something like c=$10 then it prints out the 10th column however many times without any issues, but it doesn't like when I try to loop over the columns of choice using this method. Also just assume that all of my files are properly defined, because like I said it works fine with c=$1 or whatever. Any suggestions?

P.S. there is a fair amount more to this code but everything else is working fine, it's just this issue with defining c as above. Thanks!

Don_Cragun · May 3, 2013, 7:57pm

Variable expansions do not occur inside single quotes. Try:

for (( i = 0 ;  i < $num;  i++ ))
do
 awk -v f2=temp1 -v c="${choices[$i]}" ' { getline < f2; print $0, c, " "; } ' infilemod >temp2
 mv temp2 temp1
done
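The quoting point can be checked in isolation. This is a minimal sketch, assuming bash; the array contents and the variable names `literal`, `expanded`, and `from_awk` are made up for illustration and are not part of the original script:

```shell
#!/bin/bash
# Hypothetical column choices, standing in for the user's input.
choices=(3 5)
i=1

# Single quotes: the shell passes the text through untouched.
literal='${choices[$i]}'
# Double quotes: the shell expands the array element first.
expanded="${choices[$i]}"

printf 'single-quoted: %s\n' "$literal"
printf 'double-quoted: %s\n' "$expanded"

# Passing the expanded value with -v makes it an ordinary awk variable.
from_awk=$(awk -v c="${choices[$i]}" 'BEGIN { print c }')
printf 'awk sees c = %s\n' "$from_awk"
```

Inside the single-quoted awk program the shell never touches `${choices[$i]}`, so awk receives that text literally and reports a syntax error; with `-v`, awk only ever sees the already expanded value.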

shaner · May 6, 2013, 10:20am

I tried this and it just prints whatever value c is through the loop, so the columns end up as 1 2 3 etc. So I just need to make it recognize that I wanted the cth column of the data file that I'm working with...
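For what it's worth, this matches how awk treats variables: `c` is the number itself, while `$c` is the field whose position is stored in `c`. A small sketch of the difference, using one line of the sample data; the names `line`, `value`, and `field` are illustrative only:

```shell
#!/bin/bash
# One data line; column 4 is 51.42.
line='3701 09 39 51.42 85 22 48.02 15.544'

# Printing c gives the column number that was passed in.
value=$(printf '%s\n' "$line" | awk -v c=4 '{ print c }')
# Printing $c gives the contents of that column.
field=$(printf '%s\n' "$line" | awk -v c=4 '{ print $c }')

printf 'c  -> %s\n' "$value"
printf '$c -> %s\n' "$field"
```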

Corona688 · May 6, 2013, 10:51am

You don't need awk's help to do that.

while read COL1 COL2 COL3
do
...
done < inputfile

shaner · May 6, 2013, 10:58am

Could you elaborate on this? As I mentioned I'm new to shell scripting so I'm not exactly sure what this accomplishes. I did something like

while read COL1 COL2 COL3
do
echo "cool"
done < NGC188.master.forBayesian.table.full.UBVRI.SM

and it just prints cool however many times, and I get no user input. Also I need to have a variable number of columns so I'm unsure how to accomplish this.

Don_Cragun · May 6, 2013, 11:22am

We obviously don't understand what you're trying to do.

Please give us a real specification that shows us what the elements of choice are set to, what your input file(s) look like, and what the output is that you want to be produced (using code tags for all of these) with an English description of how you expect to convert your input file(s) to your desired output.

Showing us two small fragments of a shell script that is not working lets us make lots of wild (and obviously incorrect) guesses at what you're trying to do.

shaner · May 6, 2013, 11:29am

Ok my apologies for not being clear enough. I was trying to get right at it but here is the whole code.

# Get input file from user
echo "Provide name of master file for use"
read infile
sed '1,3d;$d' $infile > infilemod

# Get output file name from user
echo "Provide desired output file name"
read outfile

# Get number of columns to be included
echo "Enter how many filters to be used"
read num

# Get which columns to use
echo "Provide the desired columns to use. Press [enter] after each"
for (( i = 0 ;  i < $num;  i++ ))
do
 read choices
done

# Make file
# Print first column
awk ' { print $1, " " } ' infilemod >temp1

# Selected columns
for (( i = 0 ;  i < $num;  i++ ))
do
 awk -v f2=temp1 -v c=${choices[$i]} ' { getline < f2; print $0, c, " "; } ' infilemod >temp2
 mv temp2 temp1
done

# Add header
cat base9header.txt temp1 > tmp; mv tmp $outfile

# Remove extra files
rm infilemod
rm temp1

exit 0

So what it does is gets the input file from the user, gets an output file name, gets the number of columns you want to take from the input, then gets WHICH of those columns to take, then prints them after printing the first column. I'm sure there is a much more streamlined way to do this but so far this is working except for the printing of the choice columns which I can't seem to make work.

The input file is something like 46 columns of data with 3 lines of header that I cut out. I want the code to be able to streamline the choosing of whatever columns I need to use for my work and put those choice columns into a new file with a header that I add from base9header.txt

Thanks for the help!

Corona688 · May 6, 2013, 11:39am

Again you have shown a broken shell script and not the data you want to read. Show that.

shaner · May 6, 2013, 12:01pm

Ok data to be read is as follows:

3701 09 39 51.42 85 22 48.02  15.544
3800 09 35 45.07 85 14 19.24  15.908
3825 09 44 39.06 85 12 04.66  16.113
3829 09 39 52.06 85 11 44.99  16.252
3935 10 16 09.40 85 22 22.69  16.246
3942 10 02 09.58 85 21 58.22  14.464
3977 10 08 41.28 85 19 20.60  16.165
3978 09 54 39.79 85 19 16.53  15.980
3979 10 19 42.93 85 19 12.80  16.170
4006 09 52 01.37 85 18 18.69  14.936
4014 09 57 00.09 85 17 49.63  16.231
4063 10 14 22.03 85 14 37.34  15.716
4065 10 11 55.20 85 14 32.15  15.470
4075 09 52 56.26 85 13 41.39  15.698
4119 10 20 50.27 85 09 55.13  15.772
4163 10 51 58.58 85 26 10.99  15.572
4177 10 53 05.07 85 25 04.96  15.344
4200 10 49 28.47 85 23 02.22  14.927
4228 10 35 44.83 85 20 33.71  13.942
4240 11 00 20.23 85 19 56.52  15.988
4248 10 27 30.88 85 19 14.33  15.848
4250 10 44 38.15 85 19 06.81  15.830
4254 10 37 33.60 85 18 38.93  15.816
4257 10 58 11.40 85 18 35.19  14.914
4267 10 58 58.07 85 18 07.43  15.335
4268 10 52 53.96 85 17 57.84  14.916
4275 10 36 13.78 85 17 37.21  15.784
4280 10 49 24.59 85 17 25.13  14.922
4290 10 31 37.94 85 16 47.25  14.174
4292 10 30 12.11 85 16 25.06  14.368
4294 10 36 23.12 85 16 22.04  12.945
4304 10 27 38.45 85 15 46.72  16.144
4306 10 35 08.85 85 15 39.47  13.347
4317 10 40 42.49 85 14 55.52  15.492
4318 10 59 32.23 85 14 55.08  15.861
4322 10 47 52.54 85 14 44.51  15.998
4328 11 00 27.06 85 14 22.76  15.728
4331 10 59 16.44 85 14 18.55  15.603
4336 10 59 31.97 85 13 50.87  16.098
4341 10 43 18.75 85 13 38.87  15.256
4343 10 58 19.09 85 13 30.98  15.727
4346 10 38 48.96 85 13 25.63  13.621
4364 10 47 29.13 85 12 24.19  15.029
4372 10 32 03.89 85 12 00.92  15.278
4373 10 42 09.04 85 11 46.83  15.508
4375 11 00 43.06 85 11 42.25  15.139
4376 10 46 19.58 85 11 36.34  16.491
4379 10 59 28.37 85 11 29.69  15.888
4380 10 46 13.56 85 11 24.23  16.097
4451 11 34 01.78 85 26 05.47  15.762

I have only included 8 columns of 46 but it's just numerical data organized into columns, so the choices[] can be an array of length 1 to 8 depending on the chosen value of num.

---------- Post updated at 12:01 PM ---------- Previous update was at 11:43 AM ----------

Also you might mention that I don't need to define the array for the column choices since I can just put it in the for loop that writes out the columns and re-define it each time, but I would like to keep it the way it is in case I need to do more things with this script later on that might involve the definition of the column numbers.

Corona688 · May 6, 2013, 12:06pm

If you have BASH:

while read -a array
do
...
done < inputfile

If you don't:

while read -r LINE
do
        set -A array $LINE
done

shaner · May 6, 2013, 12:09pm

You mentioned something like this before but I'm not sure where in my code this belongs. What is the "..." in the while? Sorry for apparently being an idiot.

Corona688 · May 6, 2013, 12:11pm

I mentioned something like it before, but bash has an -a option to read whole arrays instead of a list of variables.

In KSH, you can split one big array instead.

Whatever you want.

while read -a array
do
        echo "Column 3 is ${array[2]}"
done < inputfile
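To connect this to the column-picking question: once each line is in an array, the chosen columns can be pulled out by index. A minimal sketch, assuming bash; the `choices` values and the two data lines are examples, and the names `out` and `result` are made up here:

```shell
#!/bin/bash
# Hypothetical 1-based column choices (columns 4 and 8).
choices=(4 8)

result=$(
while read -r -a array
do
    # Always keep the first column, then append each chosen one.
    out=${array[0]}
    for c in "${choices[@]}"
    do
        # Bash arrays are 0-based, so column c lives at index c-1.
        out+=" ${array[c-1]}"
    done
    printf '%s\n' "$out"
done <<'EOF'
3701 09 39 51.42 85 22 48.02  15.544
3800 09 35 45.07 85 14 19.24  15.908
EOF
)

printf '%s\n' "$result"
```

This handles a variable number of columns because the inner loop runs once per element of `choices`, however many there are.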

shaner · May 6, 2013, 12:16pm

Can you indicate how this solves my problem? Everything seems to be working fine with my script except for the print command that doesn't want to print the columns that I give it...

Corona688 · May 6, 2013, 12:24pm

Hard to say when I can't even tell what you're trying to do... Why are you using getline for one thing? awk reads lines by itself...

You want a complete script? Fine.

printf "Enter a list of columns separated by spaces: "
read COLS

awk -v COLS="$COLS" 'BEGIN { L=split(COLS,C); }

# Only run this for lines 4 and up
NR>3 {
        STR=""
        for(N=1; N<=L; N++) STR=STR " " $(C[N]+0)
        $0=STR
} 1' inputfile
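A non-interactive way to try the same `split` idea is sketched below. It is a variation, not the script above: it drops the three header lines instead of passing them through, and it avoids the leading space. The header text and the `COLS` value are made up for the example:

```shell
#!/bin/bash
# Hypothetical list of columns, as it might be typed at the prompt.
COLS="1 4 8"

result=$(awk -v COLS="$COLS" '
BEGIN { L = split(COLS, C) }      # C[1..L] holds the column numbers
NR > 3 {                          # skip the three header lines
    STR = $(C[1] + 0)
    for (N = 2; N <= L; N++) STR = STR " " $(C[N] + 0)
    print STR
}' <<'EOF'
header 1
header 2
header 3
3701 09 39 51.42 85 22 48.02  15.544
3800 09 35 45.07 85 14 19.24  15.908
EOF
)

printf '%s\n' "$result"
```

The `+ 0` forces each entry of `C` to be treated as a number before it is used as a field position.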

shaner · May 6, 2013, 12:42pm

Like I said I'm new to this and I tried to explain it the best I could. I'll try to give the full run down. I have a 46 column data input file that looks like this:

3701 09 39 51.42 85 22 48.02  15.544   0.661  16.349  0.0368  16.193  0.0122  15.535  0.0117
3800 09 35 45.07 85 14 19.24  15.908   0.728  16.808  0.0320  16.615  0.0124  15.906  0.0117
3825 09 44 39.06 85 12 04.66  16.113   0.734  17.046  0.0319  16.823  0.0123  16.107  0.0117
3829 09 39 52.06 85 11 44.99  16.252   0.787  17.295  0.0334  17.015  0.0125  16.246  0.0117
3935 10 16 09.40 85 22 22.69  16.246   0.744  17.229  0.0314  16.968  0.0120  16.237  0.0117
3942 10 02 09.58 85 21 58.22  14.464   1.043  16.348  0.0306  15.484  0.0123  14.453  0.0116
3977 10 08 41.28 85 19 20.60  16.165   0.733  17.149  0.0358  16.874  0.0121  16.153  0.0117
3978 09 54 39.79 85 19 16.53  15.980   0.804  17.183  0.0307  16.754  0.0121  15.963  0.0117
3979 10 19 42.93 85 19 12.80  16.170   0.828  17.424  0.0360  16.977  0.0104  16.159  0.0113
4006 09 52 01.37 85 18 18.69  14.936   1.012  16.688  0.0335  15.919  0.0122  14.928  0.0117
4014 09 57 00.09 85 17 49.63  16.231   0.730  17.208  0.0361  16.932  0.0122  16.223  0.0117
4063 10 14 22.03 85 14 37.34  15.716   0.707  16.614  0.0263  16.400  0.0103  15.711  0.0112
4065 10 11 55.20 85 14 32.15  15.470   0.679  16.314  0.0262  16.127  0.0104  15.464  0.0113
4075 09 52 56.26 85 13 41.39  15.698   0.660  16.498  0.0256  16.339  0.0120  15.690  0.0115
4119 10 20 50.27 85 09 55.13  15.772   0.713  16.645  0.0285  16.463  0.0119  15.759  0.0117
4163 10 51 58.58 85 26 10.99  15.572   0.672  16.408  0.0262  16.228  0.0129  15.575  0.0125
4177 10 53 05.07 85 25 04.96  15.344   0.670  16.165  0.0253  16.009  0.0098  15.344  0.0096
4200 10 49 28.47 85 23 02.22  14.927   0.763  16.024  0.0102  15.651  0.0071  14.903  0.0084
4228 10 35 44.83 85 20 33.71  13.942   1.047  15.784  0.0217  14.963  0.0091  13.927  0.0110
4240 11 00 20.23 85 19 56.52  15.988   0.698  16.890  0.0299  16.659  0.0086  15.975  0.0092
4248 10 27 30.88 85 19 14.33  15.848   0.695  16.719  0.0287  16.518  0.0104  15.836  0.0113
4250 10 44 38.15 85 19 06.81  15.830   0.676  16.658  0.0276  16.472  0.0086  15.813  0.0092
4254 10 37 33.60 85 18 38.93  15.816   0.684  16.667  0.0275  16.480  0.0103  15.805  0.0113
4257 10 58 11.40 85 18 35.19  14.914   0.713  15.861  0.0159  15.599  0.0079  14.897  0.0089
4267 10 58 58.07 85 18 07.43  15.335   0.693  16.209  0.0242  15.999  0.0085  15.323  0.0090
4268 10 52 53.96 85 17 57.84  14.916   0.742  15.922  0.0215  15.621  0.0085  14.900  0.0090

Obviously I didn't include all 46 columns and 200-some rows but you get the idea. What I want to be able to do is write a script that asks for which columns to take from the input and write to the output. The problem is I could want 4 columns or I could want 10 columns to be written out. I have the code first prompt for the input file name so I don't need to hard-code that, and for the user to give whatever output file name you want.

# Get input file from user
echo "Provide name of master file for use"
read infile
sed '1,3d;$d' $infile > infilemod

# Get output file name from user
echo "Provide desired output file name"
read outfile

Notice that I remove the first 3 lines and the last line from the input file just because those are some header lines that I don't need. Next I have the code ask for how many columns to be copied to the output file, and I store that in the variable num:

# Get number of columns to be included
echo "Enter how many columns to be used"
read num

Now that the script knows how many columns I want to take from the input file, I use a for loop to ask which columns are to be copied:

echo "Provide the desired columns to be included Press [enter] after each"
for (( i = 0 ;  i < $num;  i++ ))
do
 read choices
done

No matter which columns I choose I always want to print the first column of the input file to the first column of the output file so I use this:

awk ' { print $1, " " } ' infilemod >temp1

So at this point I have a file called temp1 that has just the first column printed with a space after. Now I use another loop to try to loop over the choices of columns to be printed:

for (( i = 0 ;  i < $num;  i++ ))
do
 awk -v f2=temp1 -v c=${choices[$i]} ' { getline < f2; print $0, c, " "; } ' infilemod >temp2
 mv temp2 temp1
done

The way I've written it is just so that awk adds columns to the file temp1 rather than writing over top of them. You can think of it as put in first column of the input file, then put in each column from the input file that you specified. This is where the problem comes in though. It does not print the column c as I have it written. I just get the blank spaces. Very frustrating!
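One likely contributor that the replies do not spell out: `read choices` assigns to the same variable on every pass, so only the last answer survives and `${choices[1]}` onward are empty. A minimal sketch of filling the array element by element, assuming bash; the here-document stands in for the user typing 8, 9, and 10:

```shell
#!/bin/bash
num=3
choices=()

for (( i = 0; i < num; i++ ))
do
    # Store each answer in its own element instead of overwriting.
    read -r "choices[i]"
done <<'EOF'
8
9
10
EOF

printf 'stored %d choices: %s\n' "${#choices[@]}" "${choices[*]}"
```

With the array filled this way, `${choices[$i]}` yields a different column number on each pass of the second loop.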

Finally I have the code put in a header line to the output file that I have stored in another file called header.txt and remove the temp files that I used.

# Add header
cat header.txt temp1 > tmp; mv tmp $outfile

# Remove extra files
rm infilemod
rm temp1
exit 0

Hopefully this is clearer now. Sorry about all the confusion. I really appreciate the help!

Don_Cragun · May 7, 2013, 12:40am

Hi shaner,
You still have not provided any sample output, but I think the following bash/awk script will do what you want. Note that this shell script gets its parameters from the command line instead of carrying on an interactive dialog with the user. This script does a lot more user input validation than the snippets of your code did.

I normally use ksh rather than bash, but there is a bug in the way ksh version sh (AT&T Research) 1993-12-28 s+ handles ${var#*[!0-9]} if the expansion of $var contains an exclamation point character ( ! ) at least on OS X. If your users will never type in an exclamation point when entering a field number to be copied from the input file to the output file, both a current bash and a ksh93 or later version of the Korn shell will produce the same results with this script. (As always, if you want to run this on a Solaris/SunOS system, use /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk rather than awk .)

This is a big script for this forum, but the vast majority of the code is comments. The comments explain the assumptions I made (many of which were not mentioned in your requirements).

#!/bin/bash
# USAGE:
#       maybe input_file output_file new_headers_file field_number...
#
# DESCRIPTION:
#       The maybe utility copies fields specified by the "field_number"
#       operands from the input file specified by the "input_file" operand to
#       the output file named by the "output_file" operand.  The first three
#       lines and the last line of the input file shall be skipped.  The
#       contents of the file named by the "new_headers_file" operand shall be
#       copied to the output file before any other data is copied from the
#       input file to the output file.  Fields in the input file are assumed to
#       be separated by one or more spaces and tabs.  Leading and trailing
#       spaces and tabs in the input file shall be ignored.  Fields written to
#       the output file shall be separated by a single tab character.  The
#       output fields shall be written to the output file in the order
#       specified by the "field_number" operands in order from beginning to end.
#
#       The results are unspecified if the input file is not a text file.
#
#       If a "field_number" operand specifies a field that does not exist in
#       the input file, an empty field will be written to the output file in
#       the corresponding output field.  Input fields are numbered starting
#       with 1.  If a "field_number" operand is 0, the entire input line will
#       be inserted at that point in the output.
#
# RATIONALE:
#       The name of this utility is chosen because this script may be doing
#       what the original poster in this thread wants.  The description of what
#       is supposed to happen has only been described by brief text and samples
#       of code that doesn't work.  No sample output has been provided.  No
#       input with headers has been provided.

# Initialize global variables:
IAm=${0##*/}    # Final component of pathname used to invoke this script.
Usage="Usage: %s input_file output_file new_headers_file field_number..."

# Process command line arguments:
if [ $# -lt 4 ]
then    printf "%s: Not enough operands.\n$Usage\n" "$IAm" "$IAm" >&2
        exit 1
fi
if [ ! -r "$1" ] || [ ! -r "$3" ]
then    printf "%s: input_file or new_headers_file not readable.\n" "$IAm" >&2
        printf "$Usage\n" "$IAm" >&2
        exit 2
fi
inf="$1"
outf="$2"
nhf="$3"
shift 3

# Capture the remaining operands as an array of field_numbers
fn=( "$@" )

# Verify that each field number is non-null and contains only digits.
i=0
while [ $i -lt ${#fn[@]} ]
do      if [ ${#fn[$i]} -lt 1 ] || [[ "${fn[$i]}" != "${fn[$i]#*[!0-9]}" ]]
        then    printf "%s: field_number is not numeric.\n$Usage\n" "$IAm" \
                        "$IAm" >&2
                exit 3
        else    i=$((i + 1))
        fi
done

# Copy desired headers to output file.
if ! cat "$nhf" > "$outf"
then    printf "$Usage\n" "$IAm" >&2
        exit 4
fi

# Invoke awk to gather and print the requested fields after throwing away the
# 1st three header lines and the last trailer line.
printf "%d\n" "${fn[@]}" | awk '
FNR == NR {
        # Read list of fields to be printed from standard input...
        fn[++nf] = $1
        next
}
FNR <= 3 {
        # Skip 3 header lines from 2nd input file.
        next
}
out != "" {
        # Print the results from the previous input line.  (Note that we also
        # have to skip the last line read for some unspecified reason.  The
        # delayed print allows us to do that since we do not print the last
        # line when we hit EOF.)
        print out
}
{       # Save requested fields (to be printed when we read the next line).
        for(i = 1; i <= nf; i++)
                out = (i == 1) ? $(fn[1]) : out "\t" $(fn[i])
}' - "$inf" >> "$outf"
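The delayed-print idea in the awk program above (hold each line's output until the next line arrives, so the final line is never printed) can be tried in isolation. This is a minimal sketch with made-up input; the names `prev` and `have` are illustrative and not from the script, and the fields are hard-coded to 1 and 3:

```shell
#!/bin/bash
# Three header lines, two data lines, and one trailer line.
result=$(printf '%s\n' h1 h2 h3 'a b c' 'd e f' trailer | awk '
FNR <= 3 { next }                 # skip the header lines
have     { print prev }           # emit what the previous line saved
{ prev = $1 "\t" $3; have = 1 }   # save this line; printed only if another follows
')

printf '%s\n' "$result"
```

The trailer line is read and saved like any other, but because no line follows it, its saved output is never printed.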

shaner · May 7, 2013, 12:59pm

Dear Don Cragun,

Thanks so much! This works great and is exactly what I was trying to do. Sorry again for all the miscommunication on my part, I really appreciate the help!