I have a working and tested AWK script that removes duplicates from an input file and generates an output file without the duplicates.
I developed it with help from my other post:
I now want to make this working AWK script more flexible by running it with parameters passed from a shell script (a .sh file).
My UNIX Directory
/home/usr/script
This directory contains:
- rem_dups.awk (this is the functional awk script)
- dups_file.txt (this file contains duplicate records)
- rem_dups.sh (the shell script that I want to use to run the AWK script)
CONTENTS of rem_dups.awk (this is the functional AWK script)
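#!/bin/sh
# Key = field $1 + "_" + 3rd "_"-separated piece of the last field ($NF).
# For each key, keep the line with the greatest a[4]a[5] (string compare).
awk '{split($NF,a,"_"); key=$1;site=a[3];keysite=key "_" site;
if (b[keysite]<=a[4]a[5]) {b[keysite]=a[4]a[5];c[keysite]=$0;}}
END{for (i in b) print c[i]}' dups_file.txt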
CONTENTS of dups_file.txt (the input file for rem_dups.awk; it contains duplicates)
1238646 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101024_065520.txt
1239560 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101024_065520.txt
1240650 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101024_065520.txt
1238646 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101025_054320.txt
1239560 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101025_054320.txt
1240650 ,QO,SO,IN,PA,PO,SH,BL,DO,IS file_937_20101025_054320.txt
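To check which element of the array a each part of the file name lands in, a small diagnostic like the one below may help. It is only a sketch: show_name_parts is a hypothetical helper name, and it inspects just the first record of the file it is given.

```shell
# Hypothetical diagnostic: print each "_"-separated piece of the last field
# ($NF) of the first record, together with its index in the array a.
show_name_parts() {
    awk '{
        n = split($NF, a, "_")
        for (j = 1; j <= n; j++) printf "a[%d]=%s\n", j, a[j]
        exit                       # only the first record is inspected
    }' "$1"
}
```

For the sample records above, show_name_parts dups_file.txt prints a[1]=file, a[2]=937, a[3]=20101024 and a[4]=065520.txt, one per line. With names shaped like these samples, a[3] holds the date and a[5] is empty, so if the real file names have a different number of "_"-separated parts, the a[3] and a[4]a[5] indices in the script are worth checking against them.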
CONTENTS of rem_dups.sh (the file I want to use to pass parameters to the AWK script; this file DOES NOT WORK)
#!/usr/bin/sh
#---------------------------------------------------------------------
# Program ....... rem_dups.sh
# Function ...... removes duplicates from EDI files
# Developer ..... script_op2a
# Date .......... November 2 2010
# Parameters .... $1 = Position of key column in input file (Required)
#                 $2 = UNIX directory containing the input file (Required)
#                 $3 = Input file name of file to remove duplicates from (Required)
#
# Dependencies .. None
#
# Notes: This program was built on awk code.
#
# 1) Position of the key column; valid values are $1 and greater (meaning AWK field $1, $2, etc.)
# 2) The name of the UNIX directory where the input file is located
# 3) Input file name must be the name of the file containing duplicate records
pos=$1
filedir=$2
filename=$3
awk '{FS="";split($NF,a,"_"); key=pos;site=a[3];keysite=key "_" site;
if (b[keysite]<=a[4]a[5]) {b[keysite]=a[4]a[5];c[keysite]=$0;}}
END{for (i in b) print c}' filedir "/" filename
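One possible fix, sketched under two assumptions: the key column is passed as a plain number (2 rather than $2), and the split indices a[3], a[4] and a[5] are kept exactly as in the working script. It is written as a hypothetical function rem_dups so it can be sourced and tried out; as rem_dups.sh, the same body would be followed by the line rem_dups "$@". The changes relative to the script above: the shell value reaches AWK through -v pos="$pos" and is used as $pos (inside single quotes AWK never sees shell variables, so pos was empty there); FS="" is dropped (an empty FS is unspecified by POSIX, and in gawk it makes every character a separate field); the path is built by the shell as "$filedir/$filename" (unquoted, filedir "/" filename reaches awk as three separate file names); and the END block prints c[i] rather than the whole array c.

```shell
# Hypothetical function form of rem_dups.sh (a sketch, not a drop-in).
#   $1 = key column as a plain number: 1, 2, 3, ...
#   $2 = directory that contains the input file
#   $3 = input file name
rem_dups() {
    if [ "$#" -ne 3 ]; then
        echo "usage: rem_dups.sh key_column directory file" >&2
        return 1
    fi
    pos=$1
    filedir=$2
    filename=$3

    # -v copies the shell value into the AWK variable pos; the program stays
    # in single quotes, so $pos and $NF below are AWK field references.
    awk -v pos="$pos" '{
        split($NF, a, "_")
        key = $pos                 # field number pos, e.g. $2 when pos=2
        site = a[3]                # split indices unchanged from the original
        keysite = key "_" site
        if (b[keysite] <= a[4] a[5]) { b[keysite] = a[4] a[5]; c[keysite] = $0 }
    }
    END { for (i in b) print c[i] }' "$filedir/$filename"
}
```

Called as ./rem_dups.sh 2 /home/usr/script dups_file.txt > no_dups.txt. Because for (i in b) visits keys in an unspecified order, appending | sort is one way to get stable output if the order matters.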
UNIX COMMAND LINE
From the UNIX command line, I want to run the shell script like this:
./rem_dups.sh $2 /home/usr/script/ dups_file.txt > no_dups.txt
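A note on the $2 in that command line: the interactive shell expands an unquoted $2 before rem_dups.sh starts, and because it is normally unset, it disappears entirely, so the script receives only two arguments and every parameter shifts by one. Passing a plain number (2) avoids this; if the $ form is preferred, it has to be quoted ('$2'). A hypothetical helper along these lines could accept either form and reject anything that is not a field number of 1 or greater:

```shell
# Hypothetical helper: normalise the key-column argument to a field number.
# Accepts "2" or the literal text "$2" (which must be quoted when typed).
key_column() {
    col=${1#\$}                    # strip one leading "$", if present
    case $col in
        '' | *[!0-9]*)
            echo "key column must be a number, got: $1" >&2
            return 1 ;;
    esac
    if [ "$col" -lt 1 ]; then
        echo "key column must be 1 or greater, got: $1" >&2
        return 1
    fi
    printf '%s\n' "$col"
}
```

Inside the script it might be used as pos=$(key_column "$1") || exit 1, so that both ./rem_dups.sh 2 ... and ./rem_dups.sh '$2' ... select AWK field $2.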
This should remove the duplicates based on AWK field $2 (the second field) and $NF (the last field, as used in the AWK code). However, if the file layout changes, I want to be able to pass AWK $1 or AWK $3 to the shell script as the key column instead.
It's confusing because the shell refers to its parameters as $1, $2, etc., and AWK uses $1, $2, etc. for the fields of the input file.
In other words, I want a shell parameter that says "use AWK field $1" or "use AWK field $2" as the value of the variable "key" in the AWK script.
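The two kinds of $1 do not collide as long as the AWK program stays in single quotes: the shell expands nothing inside them, so $1 there is AWK's first field, and a shell value only reaches AWK when it is passed explicitly, for example with -v. A minimal sketch, where print_column is a hypothetical name:

```shell
# The AWK program is in single quotes, so the shell leaves $col alone;
# -v col="$1" hands over the shell's first argument, and $col is that field.
print_column() {
    awk -v col="$1" '{ print $col }'
}
```

For example, printf 'a b c\n' | print_column 3 prints c: the shell's $1 is 3, and AWK's $col is the third field.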
I also want to specify the directory and filename of the input file that contains the duplicates to be removed.
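For the directory and file name, one option is to let the shell join them and check that the result is a readable file before calling AWK, so a typo produces a clear message instead of an awk error. A sketch with a hypothetical helper input_path; ${1%/} removes one trailing slash, so /home/usr/script/ and /home/usr/script give the same path:

```shell
# Hypothetical helper: join directory ($1) and file name ($2), then verify
# that the result is a readable regular file before AWK is run on it.
input_path() {
    path="${1%/}/$2"
    if [ ! -f "$path" ] || [ ! -r "$path" ]; then
        echo "cannot read input file: $path" >&2
        return 1
    fi
    printf '%s\n' "$path"
}
```

In rem_dups.sh this could look like file=$(input_path "$2" "$3") || exit 1, with "$file" then passed as the last argument to awk.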