improving my script

bcheaib · July 14, 2004, 6:19am

Hi;

I want to query our customer database to retrieve all clients whose language index is 2 or 3 and record their client numbers.
My input is a file containing all client numbers.
I access the database using a command called "scpshow". The total number of clients I want to scan is 400,000.

I tried this script on 100 subscribers; it took 40 seconds and around 7% CPU time.

How can I improve my script to make it faster (if that can be done at all)?

#! /usr/bin
date
for sub in `cat /tmp/sublist `
do
lang=0
lang=`/IN/scp/test/scpshow/scpshow ul50 $sub | grep 'usr_int\[23\]' | awk '{print $2}'`

if [ $lang -eq 2 ] || [ $lang -eq 3 ];
then echo $sub >> /tmp/target
fi
done
date
cut -c 16-26 /tmp/target >> /tmp/final

thanks for your fast reply.

zazzybob · July 14, 2004, 9:36am

Replacing

for sub in `cat /tmp/sublist `
do
  ...
done

with

while read sub
do
  ...
done < /tmp/sublist

may well speed up the processing of the loop and avoid a useless use of cat....

If you're processing 400,000 records it's going to be slow whatever you do - you'd be better off writing it in C if you'll be running it often.
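
For a rough sense of scale, the timing quoted in the first post (100 subscribers in 40 seconds) can be extrapolated to the full list. This is only a back-of-the-envelope sketch: it assumes the run time grows linearly with the number of subscribers, and the variable names are made up for illustration.

```shell
# Figures taken from the first post; linear scaling is an assumption.
subs_total=400000
sample_subs=100
sample_secs=40

# Integer arithmetic: total seconds, then whole hours.
est_secs=$(( subs_total * sample_secs / sample_subs ))
est_hours=$(( est_secs / 3600 ))

echo "estimated: ${est_secs}s (~${est_hours}h)"
```

At roughly 0.4 seconds per subscriber, the unmodified loop would need well over a day and a half, so any per-iteration saving is multiplied 400,000 times.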

Cheers
ZB

Perderabo · July 14, 2004, 9:42am

What does the output from scpshow look like? And what shell are you really using? #!/usr/bin won't work.

bcheaib · July 14, 2004, 9:59am

The output is C300090B901900096393111222

I am calling the script with #> ksh scriptname and it works...

Perderabo · July 14, 2004, 12:03pm

You need to make that first line:
#! /usr/bin/ksh

If the output of "scpshow ul50 $sub" is:
C300090B901900096393111222

then I'm confused how the pipeline works. That grep is not going to match anything. And there is no 2nd field for awk to print.
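
To illustrate, here is a minimal sketch that feeds that single line through the same grep and awk stages as the original script. Nothing here calls the real scpshow; the line is simply the sample quoted above, and the `${lang:-0}` default is an added safeguard rather than part of the original script.

```shell
# Sample line as reported in the thread.
line='C300090B901900096393111222'

# Same filter as the original script, fed with that line instead of real scpshow output.
lang=$(printf '%s\n' "$line" | grep 'usr_int\[23\]' | awk '{print $2}')

echo "lang=[${lang}]"

# Defaulting an empty value to 0 keeps the numeric test from seeing a missing operand.
if [ "${lang:-0}" -eq 2 ] || [ "${lang:-0}" -eq 3 ]; then
    verdict=match
else
    verdict=skip
fi
echo "$verdict"
```

With that input grep passes nothing on, so lang ends up empty, and an unquoted `[ $lang -eq 2 ]` would fail with a syntax error instead of quietly skipping the subscriber.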

bcheaib · July 15, 2004, 2:10am

The output of scpshow is
.
.
.
usr_int[0] 0
usr_int[1] 9
.
.
usr_int[23] 1
usr_int[24] 3
.
.
.
I need to check the value of usr_int[23]; if it is 2 or 3, I have to store the sub. number.

I call scpshow in the following way
>scpshow C300090B901900096393123345 <--"sub. number"
and the output is the full profile of this specific sub (shown above). I need to get all subs whose usr_int[23] is 2 or 3.
The list of all sub. numbers is stored in the file "sublist", around 400,000 subs.
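
For reference, the lookup described above can be sketched with a single awk process doing both the match and the field extraction. The function `scpshow_stub` is a hypothetical stand-in that prints a few lines in the format shown; it is not the real scpshow, and the values in it are made up.

```shell
# Hypothetical stand-in for scpshow, printing lines in the format shown above.
scpshow_stub() {
    printf '%s\n' 'usr_int[0] 0' 'usr_int[1] 9' 'usr_int[23] 3' 'usr_int[24] 3'
}

# One awk process matches the field name exactly, prints its value, and stops reading.
lang=$(scpshow_stub | awk '$1 == "usr_int[23]" { print $2; exit }')

# A case pattern avoids numeric-test errors when lang is empty.
case $lang in
    2|3) echo "match: language index $lang" ;;
    *)   echo "no match" ;;
esac
```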

Perderabo · July 15, 2004, 10:59am

The fastest script would be:

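#! /usr/bin/ksh
# Read the subscriber list on stdin for the whole script.
exec < sublist
while read sub ; do
        # Scan the profile with shell built-ins only; stop at usr_int[23].
        /path/to/scpshow  ul50 $sub | while read string lang ; do
                  [[ $string = 'usr_int[23]' ]]  && break
        done
        # ksh runs the last stage of a pipeline in the current shell, so $lang is still set here.
        [[ $lang = 2 || $lang = 3 ]] && echo $sub >> /tmp/target
done
exit 0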

This saves an awk and a grep for each iteration of the loop. You are still launching a scpshow process for each iteration of the loop. Unless scpshow can work with multiple subs at once, this can't be avoided.
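
One caveat: the inner loop relies on ksh running the last stage of a pipeline in the current shell, so that $lang is still visible after the loop. bash and many sh implementations run that stage in a subshell by default, and the value is lost. Below is a portable sketch of the same idea that captures the value through command substitution instead. It costs a subshell but still no awk or grep. `scpshow_stub` and `get_lang` are hypothetical names used only for this illustration.

```shell
# Hypothetical stand-in for scpshow; prints a few profile lines in the thread's format.
scpshow_stub() {
    printf '%s\n' 'usr_int[0] 0' 'usr_int[1] 9' 'usr_int[23] 2' 'usr_int[24] 3'
}

# Print the value of usr_int[23] for one subscriber, using shell built-ins only.
get_lang() {
    scpshow_stub "$1" | while read -r name value; do
        if [ "$name" = 'usr_int[23]' ]; then
            printf '%s\n' "$value"
            break
        fi
    done
}

sub='C300090B901900096393123345'
lang=$(get_lang "$sub")

# Keep the subscriber only when the language index is 2 or 3.
case $lang in
    2|3) result=$sub ;;
    *)   result='' ;;
esac
echo "${result:-no match}"
```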