grep or awk?

system · March 2, 2001, 3:42pm

Here's my problem:

I have a file that contains say for this example, three records, each twenty bytes long:
CustNum CustName
1111111111abcdefghij
2222222222abcdefghij
3333333333abcdefghij
I have two other very large files (over 500,000 records) one is 500 bytes, the other is 200 bytes long. These two files contain the CustNum from the first file as well as MANY more that I don't want.
I want to extract out the CustNum from the first file, then do a loop thru the other two files, matching the CustNum and only writting out those matching records all 500 or 200 bytes.
Essentially, I want to reduce the 100,000 record file to a managable amount.

I have tried cut to extract the records out to a variable, then loop with grep, but the results produce a file of one continuous record. No newline? Should I use awk instead?

Any help would be appreciated......
Here is my code:

#! /bin/ksh
# get_bids_autorenew.sh

#== Local Variables ==#
datadir="/ias/users/app4dxh/data"
driver="$datadir/driver.dat"
file_1="tcpcsm2.data"
file_2="tcpvrm2.data"
num=0
stat=0

Get_Bids()
{
cd $biddir

if [ -s $driver ]
then
bids=`cut -c1-10 $driver`
stat=$?
else
stat=$?
echo "Function: $0 - No data found in $driver or file does not exist."
echo "Aborting script with a status of $stat"
exit 1
}

Match_Files()
{
#== For each bid picked up, check each of the CP and ==#
#== SM files for a match and just write those records. ==#

for i in $bids
do
match=`grep -s $i $file`
stat=$?
while :
do
case "$stat" in
0) echo $match >> file_$num.new
echo Status is $stat
break;;
1) break;;
2) echo "Function: $0 - The file $file is not accessible - grep status is $stat"
break;;
esac
done
done

}

#==Main==#
Get_Bids

#== For each File, execute the Match_Files function ==#
cd $datadir
for data in "$file_1" "$file_2"
do
set $data
file=$1
let num="$num + 1"
Match_Files
done
exit 0

lsquillacioti · March 2, 2001, 5:03pm

Hi, did yo use the comm command?
Try this with the sort command first to order the files.
I hope to help you.
See you!!

mib · March 3, 2001, 8:28am

Here is one Perl Script that could solve Your problem (if i am not mistaken your problem). use it as "scriptname Custnofile serachinfile outputfile".

#!/bin/perl

unless ($ARGV[2]) { print "Usage scriptname: <inputfile1-CustNo> <inputfile2:Search in> <outputfile> \n"; exit;}

$CustNoFile=$ARGV[0];
$SearchIn=$ARGV[1];
$NewFile=$ARGV[2];

$DataDir="/ias/users/app4dxh/data";

open (DATA, "$DataDir/$CustNoFile") || die ("Can't open $CustNoFile Reson $!");
@DataArray = <DATA>;
close (DATA);

foreach $Record(@DataArray){
$L= substr($Record, 0, 10);
$Key .= "$L\n";
}
@Keys = split(/\n/, $Key);

open (FILE, "$DataDir/$SearchIn") || die ("Can't open $SearchIn Reason $!");
@Check = <FILE>;
close (FILE);

open (NEW, ">>$DataDir/$NewFile") || die ("can't open $NewFile Reson $!");

foreach $Line(@Check){
chomp ($Line);
$count=0;
foreach $key(@Keys) {
chomp ($key);
if ($Line =~ $key ) { print NEW "$Line\n"; $Found++; next;}
}
}

close (NEW);
print "$Found Keys Found and copied to $NewFile\n";

[Edited by mib on 03-03-2001 at 08:34 AM]