Hello,
I have a shell script that runs awk code and takes its settings as parameters. It no longer works when I pass the parameters in, and I don't know why.
It removes duplicates from an input file based on a key column and part of the last field. Of two duplicate records, it drops the one with the older datetime in the last field.
In this case the key column of the input file is the first column (so 1).
Input file:
1238646010001529 A cmt_det3_937_20101024_065520.txt
1239560010002084 A cmt_det3_937_20101024_065520.txt
1240650010013664 A cmt_det3_937_20101024_065520.txt
1238646010001529 B cmt_det3_937_20101025_065520.txt
1239560010002084 B cmt_det3_937_20101025_065520.txt
1240650010013664 B cmt_det3_937_20101025_065520.txt
The shell script:
#!/usr/bin/sh
pos=$1
infile="$2/$3"
outfile="$4/$5"
if [[ ! -r $infile ]]
then
echo "file is not readable: $infile"
exit 1
fi
# pass the key position using -v
awk -v key_col="$pos"
'{
FS="";
split($NF,a,"_");
site=a[3];
keysite=$(key_col) "_" site;
if (b[keysite]<=a[4]a[5])
{
b[keysite]=a[4]a[5];
c[keysite]=$0;
}
}
END{
for( i in b )
print c;
}' < $infile > $outfile
I run it on the Unix command line like this:
./remove_dups.sh 1 . input.txt . out.txt
I get this error message:
./remove_dups.sh[32]: {^I^J^I^IFS="^_";^J^I^Isplit($NF,a,"_"); ^J
site=a[3];^J keysite=$(key_col) "_" site;^J if (b
[keysite]<=a[4]a[5]) ^J {^J b[keysite]=a[4
]a[5];^J c[keysite]=$0; ^J }^J^I} ^J^J
END{^J for( i in b ) ^J ^I^J^I^Iprint c;^J
^J }: The specified path name is too long.
---------- Post updated at 07:03 PM ---------- Previous update was at 06:39 PM ----------
Problem solved.
I had to remove the spaces in the redirections:
< $infile > $outfile
It works like this:
<$infile >$outfile
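A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: the error text is the awk program itself, which suggests the shell tried to run the quoted program as a command name. That is what happens when the program starts on the line after awk -v key_col="$pos", because a newline ends the command. Whitespace around < and > is not significant to the shell. The version below keeps the opening quote on the awk line and indexes the array in the END block (print c[i] rather than print c). It assumes whitespace-separated fields as in the sample input, whereas the original sets FS to a control character. The names remove_dups, best, rec and stamp are illustrative and not from the original script.

```shell
#!/bin/sh
# Sketch: per key column and site, keep the record with the newest datetime.
# remove_dups is a hypothetical helper; arguments: key position, input file.
remove_dups() {
    # The program text must begin on the same line as the awk command
    # (or after a trailing backslash); otherwise the shell treats the
    # quoted text as a separate command.
    awk -v key_col="$1" '
    {
        # Last field looks like cmt_det3_937_20101024_065520.txt
        split($NF, a, "_")
        site = a[3]
        keysite = $(key_col) "_" site
        stamp = a[4] a[5]          # date and time joined, compared as strings
        if (!(keysite in best) || best[keysite] <= stamp) {
            best[keysite] = stamp
            rec[keysite] = $0
        }
    }
    END {
        for (k in best)
            print rec[k]           # index the array with the loop variable
    }' "$2"
}

# Demo with the sample input from the post (assumed to be space-separated).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1238646010001529 A cmt_det3_937_20101024_065520.txt
1239560010002084 A cmt_det3_937_20101024_065520.txt
1240650010013664 A cmt_det3_937_20101024_065520.txt
1238646010001529 B cmt_det3_937_20101025_065520.txt
1239560010002084 B cmt_det3_937_20101025_065520.txt
1240650010013664 B cmt_det3_937_20101025_065520.txt
EOF
# for (k in best) visits keys in no particular order, so sort for stable output.
remove_dups 1 "$tmp" | sort
rm -f "$tmp"
```

With the sample input, only the three B records (dated 20101025) should remain. Passing the input file as an argument to awk instead of redirecting stdin is a style choice; the redirection form works as well, with or without spaces.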