Need to pass parameters to a working and tested awk script

script_op2a · November 2, 2010, 8:36pm

I have a working and tested AWK script that removes duplicates from an input file and generates an output file without the duplicates.

I had help from my other post to develop it:

I want to now make this working AWK script more dynamic and execute with parameters from a shell script (a .sh file).

My UNIX Directory

/home/usr/script

This directory contains:

rem_dups.awk (this is the functional awk script)
dups_file.txt (this file contains duplicate records)
rem_dups.sh (the shell script file that I want to use to execute the awk script.)

CONTENTS of rem_dups.awk (this is the functional AWK script)

#!/bin/sh

awk '{split($NF,a,"_"); key=$1;site=a[3];keysite=key "_" site;
if (b[keysite]<=a[4]a[5]) {b[keysite]=a[4]a[5];c[keysite]=$0;}}
END{for (i in b) print c[i]}' dups_file.txt

CONTENTS of dups_file.txt (this is the file to be input into rem_dups.awk, it contains duplicates)

1238646 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101024_065520.txt
1239560 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101024_065520.txt
1240650 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101024_065520.txt

1238646 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101025_054320.txt
1239560 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101025_054320.txt
1240650 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101025_054320.txt

CONTENTS of rem_dups.sh (this is the file that I want to use to pass parameters to the AWK script)(this file DOES NOT WORK)

#!/usr/bin/sh
#--------------------------------------------------------------------- 
# Program ....... rem_dups.sh
# Function ...... removes duplicates from EDI files 
# Developer ..... script_op2a 
# Date .......... November 2 2010 
# Parameters .... $1 = Position of key column in input file (Required)
#                 $2 = Unix script directory (Required)
#                 $3 = Name of the input file to remove duplicates from (Required)
#
# Dependencies .. None
# 
# Notes: This program was built on awk code.
#
# 1) Valid values for the key column position are $1 and greater (here $1 means the AWK field $1, $2, etc.)
# 2) The name of the UNIX directory where the input file is located
# 3) Input file name must be the name of the file containing duplicate records

pos=$1
filedir=$2
filename=$3

awk '{FS="";split($NF,a,"_"); key=pos;site=a[3];keysite=key "_" site;
if (b[keysite]<=a[4]a[5]) {b[keysite]=a[4]a[5];c[keysite]=$0;}}
END{for (i in b) print c[i]}' filedir "/" filename

UNIX COMMAND LINE

From the UNIX command line I want to execute the shell like this

./rem_dups.sh $2 /home/usr/script/ dups_file.txt > no_dups.txt

This should remove the duplicates in the file based on the second field (AWK $2) and the last field ($NF, as shown in the AWK code). If the file layout changes, I want to be able to specify AWK $1 or AWK $3 as the key column for the AWK script from the shell script.

It's confusing because the shell accepts parameters like $1, $2, etc., and AWK uses $1, $2, etc. for the fields of the input file.

With the shell parameters I want to be able to say: now I choose AWK field $1, or now I choose AWK field $2, as the value of the variable "key" in the awk script.

I also want to specify the directory and filename of the input file that contains the duplicates to be removed.
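A small sketch may help separate the two meanings of $1. Inside a single-quoted awk program the shell substitutes nothing, so a name such as pos is an empty awk variable there, and words placed after the closing quote (filedir "/" filename) reach awk as literal file names rather than as the shell variables. The names line and n below are made up for the illustration, and -v assumes a POSIX-style awk (nawk on older Sun systems).

```shell
#!/bin/sh
# Illustration only: pos mirrors the variable in rem_dups.sh,
# line and n are hypothetical names.
pos=2
line="alpha beta gamma"

# Inside single quotes, pos is an unset awk variable, not the shell variable.
without_v=$(echo "$line" | awk '{ print "[" pos "]" }')

# -v copies the shell value into the awk variable n; $(n) then selects field n.
with_v=$(echo "$line" | awk -v n="$pos" '{ print $(n) }')

echo "$without_v"
echo "$with_v"
```

The first echo shows that awk never saw the shell's value of pos; the second shows the field chosen through -v.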

agama · November 2, 2010, 9:18pm

Assuming you have a modern awk (nawk on Sun) this should work. I've added some 'fluff' to illustrate how to capture the command line parameters and use them in your script (don't mean to imply that you don't know how, but thought it made for a more complete example).

#!/usr/bin/env ksh

pos=$1
infile="$2/$3"
outfile=$infile.new

if [[ ! -r $infile ]]
then
        echo "file is not readable: $infile"
        exit 1
fi

# pass the key position using -v
awk -v key_col="$pos" '
        {
                split($NF,a,"_"); 
                key=$(key_col);   # this should be the only internal change (not needed -- see 2nd example)
                site=a[3];
                keysite=key "_" site;
                if (b[keysite]<=a[4]a[5]) 
                {
                        b[keysite]=a[4]a[5];
                        c[keysite]=$0;
                }
        }
        END{
                for( i in b ) 
                        print c[i];
        }' <$infile >$outfile

CAUTION: I've not tested this.

Since you only use the variable key to construct keysite, you could eliminate the first assignment and just code this:

keysite=$(key_col) "_" site;

Hope this helps get you started.

script_op2a · November 3, 2010, 12:23pm

Hello,

Thank you, I am now testing your file.

I'm trying to run it from the UNIX command line in both of the following ways:

./rem_dups.sh "$1" /home/usr/script dups_file.txt

./rem_dups.sh $1 /home/usr/script dups_file.txt

Command Line Error

ksh: 1: parameter not set

Is it the way I am passing the parameters?

I tried changing the 1st line in the .sh file to both:

#!/usr/bin/sh

#!/usr/bin/ksh

with the same result,

ksh: 1: parameter not set

---------- Post updated at 11:41 AM ---------- Previous update was at 11:34 AM ----------

I need to use the sh environment.

So can we continue to make it work using

#!/usr/bin/sh

as the first line?

---------- Post updated at 12:23 PM ---------- Previous update was at 11:41 AM ----------

The most progress I have made so far is changing what I enter on the UNIX Command Line to:

./rem_dups.sh '$1' /home/usr/script dups_file.txt

I added single quotes because I want the AWK script to literally receive the value $1 (that is, the $ sign and the number 1), so that it looks at the 1st field of the input file.

I get this error on the command line:

awk: Field $$1) is not correct
 The input line number is 1.
 The source line number is 4.
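Both errors look consistent with how the calling shell treats $1 before rem_dups.sh ever starts. Unquoted or double-quoted, $1 is expanded by the login shell itself, and "parameter not set" is what a shell typically reports when nounset (set -u) is on and no such parameter exists, which would also explain why changing the first line of the script made no difference. Single-quoted, the literal text $1 reaches the script and ends up inside $(key_col), which is not a field number. The sketch below assumes that reading; show_arg is a hypothetical stand-in for the script.

```shell
#!/bin/sh
# show_arg is a hypothetical stand-in for rem_dups.sh: it reports its first argument.
show_arg() { printf '[%s]' "${1-}"; }

# With nounset (set -u) active and no positional parameters, expanding $1
# fails in the calling shell, so show_arg is never run.
if ( set -u; set --; show_arg "$1" ) 2>/dev/null; then
    nounset_result="expanded"
else
    nounset_result="parameter not set"
fi

# Single quotes stop the expansion, so the two characters $1 are passed on.
quoted=$(show_arg '$1')

# A bare field number needs no quoting at all.
bare=$(show_arg 1)

echo "$nounset_result"
echo "$quoted"
echo "$bare"
```

Passing the bare field number, without the dollar sign, avoids both problems, because the awk code already adds the $ through $(key_col).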

vgersh99 · November 3, 2010, 12:28pm

./rem_dups.sh 1 /home/usr/script dups_file.txt
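For the plain sh requirement, the script could be sketched roughly as follows. This is an untested-on-your-system outline, not a definitive version: the function name rem_dups, the variable stamp, the numeric check on the key column and the skipping of blank lines are additions for illustration, while the awk logic itself is kept from the thread. It still assumes that the last field splits into five underscore-separated parts (a[3] as site, a[4] and a[5] as date and time), and that awk supports -v (nawk on older Sun systems). The thread's /usr/bin/sh can be substituted in the first line.

```shell
#!/bin/sh
# Sketch of rem_dups.sh for plain sh; argument order follows the thread:
#   $1 = key column number (1, 2, ...), $2 = directory, $3 = input file name
rem_dups() {
    pos=$1
    infile="$2/$3"

    # Added check: the key column must be a bare number such as 1, not '$1'.
    case $pos in
        ''|*[!0-9]*)
            echo "key column must be a number: $pos" >&2
            return 2
            ;;
    esac

    if [ ! -r "$infile" ]; then
        echo "file is not readable: $infile" >&2
        return 1
    fi

    # NF skips blank lines (an addition); the rest is the thread's logic.
    awk -v key_col="$pos" '
        NF {
            split($NF, a, "_")
            keysite = $(key_col) "_" a[3]
            stamp = a[4] a[5]
            if (b[keysite] <= stamp) {
                b[keysite] = stamp
                c[keysite] = $0
            }
        }
        END { for (i in b) print c[i] }
    ' "$infile"
}

# Run only when arguments were supplied on the command line.
if [ $# -gt 0 ]; then
    rem_dups "$@"
fi
```

It writes to standard output, so it can be called the way the original post wanted:

./rem_dups.sh 1 /home/usr/script dups_file.txt > no_dups.txt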