I need to check and compare 10,000 pnt files, each containing a single record, in the /$ROOTDIR/scp/inbox/string1 directory against 39 bad pnt files in the /$ROOTDIR/output/tma/pnt/bad/string1 directory, based on the fam_id column value at positions 38 to 47 of the record. Here is an example of the record found in the files of both directories:
PNT0220060503081122003700100000091049000005629001005146417001407712SFirstname Lastname
If the fam_id matches, move the current file from the /$ROOTDIR/scp/inbox/string1 directory into the /$ROOTDIR/output/tma/pnt/bad/string1 directory.
If not, continue the normal process.
The code below works, but it takes over 2 hours to complete the comparison. Please advise if there is a better way to rewrite or improve the comparison so that it runs faster. Thanks
pntcnt1=`ls -l /$ROOTDIR/scp/inbox/string1 | grep 'PNT.*' | wc -l`
if [[ $pntcnt1 -gt 0 ]] then
for gfile in `ls -1 /$ROOTDIR/scp/inbox/string1/PNT.2*`
do
gline=`sed '1q' $gfile`
x=`echo "$gline" | awk '{ print substr( $0, 38, 9 ) }'`
for bfile in `ls -1 /$ROOTDIR/output/tma/pnt/bad/string1/PNT.2*`
do
bline=`sed '1q' $bfile`
y=`echo "$bline" | awk '{ print substr( $0, 38, 9 ) }'`
if [ "$x" -eq "$y" ]
then
echo "file moved $gfile"
mv -f $gfile /$ROOTDIR/output/tma/pnt/bad/string1
break
fi
done
done
fi
There is room for improvement, but I'm not sure how much you will gain. With this approach you still need a double loop. Another possibility is described further below.
# pntcnt1=`ls -l /$ROOTDIR/scp/inbox/string1 | grep 'PNT.*' | wc -l`
## replaced with:
find /$ROOTDIR/scp/inbox/string1/ -name "*PNT.2*" -print |
# if [[ $pntcnt1 -gt 0 ]] then
## replaced with a while-pipe:
while read gfile
do
# gline=`sed '1q' $gfile` # no longer needed here; awk does it all
x=`awk 'FNR==1 { print substr( $0, 38, 9 ); exit }' $gfile`
# for bfile in `ls -1 /$ROOTDIR/output/tma/pnt/bad/string1/PNT.2*`
find /$ROOTDIR/output/tma/pnt/bad/string1/ -name "*PNT.2*" -print |
while read bfile
do
# let awk do the string comparison.
if awk -v x="$x" 'FNR==1 { if (x == substr( $0, 38, 9 )) exit(0); exit(1); }' "$bfile"
then
echo "file moved $gfile"
mv -f $gfile /$ROOTDIR/output/tma/pnt/bad/string1
break
fi
done
done
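Since the bad directory holds only a few dozen files, a variation on the loop above is to extract their fam_id values once, up front, and then do a single lookup per inbox file. The following is only a sketch under assumptions: the function name move_known_bad and its two arguments are made up for illustration, the fam_id is taken to be the 9 characters selected by substr($0, 38, 9) in the original code (columns 38-46), and a temporary file holds the keys.

```shell
#!/bin/sh
# Sketch only: move_known_bad, inbox and bad are hypothetical names.
# Usage: move_known_bad INBOX_DIR BAD_DIR
move_known_bad() {
    inbox=$1
    bad=$2
    keys=$(mktemp) || return 1

    # Extract the fam_id (columns 38-46 of line 1) from every bad file, once.
    for bfile in "$bad"/PNT.2*; do
        [ -f "$bfile" ] || continue
        head -n 1 "$bfile" | cut -c38-46
    done | sort -u > "$keys"

    # One lookup per inbox file instead of one awk per (inbox, bad) pair.
    for gfile in "$inbox"/PNT.2*; do
        [ -f "$gfile" ] || continue
        x=$(head -n 1 "$gfile" | cut -c38-46)
        [ -n "$x" ] || continue          # skip empty or short records
        if grep -qxF -- "$x" "$keys"; then
            echo "file moved $gfile"
            mv -f "$gfile" "$bad"
        fi
    done
    rm -f "$keys"
}
```

It would be called as move_known_bad /$ROOTDIR/scp/inbox/string1 /$ROOTDIR/output/tma/pnt/bad/string1. This still starts a few processes per inbox file, but no longer reads the bad files once for every inbox file.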
The other method uses more memory: you go through the first directory and build up a table (a hash) of string-filename pairs; then you go through the second directory and compare each file's first line to your entries. It can be done in awk, but here's how to do it in perl:
#!/usr/bin/perl -w
$dir1= ''; # put the first dir name here (the inbox directory the files are moved from)
$dir2= ''; # put the second dir name here (the bad directory the files are moved to)
opendir(D1,$dir1) || die "Cannot open $dir1: $!";
opendir(D2,$dir2) || die "Cannot open $dir2: $!";
# read record snippets from dir1
while ( $file1=readdir(D1) ) {
next unless $file1 =~ /PNT\.2/;
open(FILE,$dir1."/".$file1) || do { warn "Could not open $dir1/$file1, skipping: $!"; next; };
$line=<FILE>;
$X{ substr($line,37,9) } = $file1;
}
close FILE;
# compare to files in dir2
while ( $file2=readdir(D2) ) {
next unless $file2 =~ /PNT\.2/;
open(FILE,$dir2."/".$file2) || do { warn "Could not open $dir2/$file2, skipping: $!"; next; };
$line=<FILE>;
$y=substr($line,37,9);
if (exists $X{ $y }) {
print "mv -f $dir1/$X{$y} $dir2\n";
delete $X{$y};
}
}
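That perl code is untested. It prints out the mv commands rather than executing them. Once you have checked that the output is right, you can replace the last "print" with "system". File names with spaces or other funny characters in them might not work in this case. Also note that the hash keeps only one file name per fam_id, so if several files in the first directory share a fam_id, only the last one read is reported. The substr...37 isn't a mistake: Perl counts string positions from 0, while awk counts from 1.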
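For comparison, the same lookup-table idea can be sketched with awk and find. This is an assumption-laden sketch, not a tested drop-in: the function name list_bad_matches is hypothetical, the fam_id is again taken as substr($0, 38, 9), and the table is built from the (small) bad directory rather than from the inbox, so that inbox files sharing a fam_id are all reported. Like the perl version, it only prints the matching inbox files and does not move anything itself.

```shell
#!/bin/sh
# Sketch only: list_bad_matches is a hypothetical helper name.
# Prints every inbox file whose fam_id (columns 38-46 of line 1) also
# appears in some file of the bad directory.
# Usage: list_bad_matches INBOX_DIR BAD_DIR
list_bad_matches() {
    inbox=$1
    bad=$2
    keys=$(mktemp) || return 1

    # Pass 1: collect the fam_id of every bad file (records too short to
    # hold a fam_id are ignored).
    find "$bad" -type f -name 'PNT.2*' -exec \
        awk 'FNR == 1 && length($0) >= 46 { print substr($0, 38, 9) }' {} + > "$keys"

    # Pass 2: load the keys into an awk array, then test the first line of
    # each inbox file. find batches the file names, so awk is started only
    # a handful of times even for thousands of files.
    find "$inbox" -type f -name 'PNT.2*' -exec \
        awk -v keys="$keys" '
            BEGIN { while ((getline k < keys) > 0) bad[k] }
            FNR == 1 && (substr($0, 38, 9) in bad) { print FILENAME }
        ' {} +

    rm -f "$keys"
}
```

After checking the printed list, the output can be fed to a loop such as: list_bad_matches INBOX BAD | while IFS= read -r f; do mv -f "$f" BAD; done (with the two directory names filled in). File names containing newlines would still break this.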