Using arrays inside awk

Hi All,

I have the following code sequence for reading some bulk file and moving the content to two different arrays.

while read data
do
THREEG_PATTERN=`echo $data | egrep "3G"`
if [ "$data" == "$THREEG_PATTERN" ]
then
NEW_THREEG_PATTERN=`echo $THREEG_PATTERN | cut -d " " -f2`
THREEG_ARRAY[$IDX]=$NEW_THREEG_PATTERN
IDX=$(($IDX+1))
else
SP_ARRAY[$IDX_SP]=$data
IDX_SP=$(($IDX_SP+1))
fi

done < $_pat

This code works correctly, but it takes a long time because the file is large. How can I improve the performance of this process with awk or some other mechanism? Or how can I convert the entire code to awk? Looking forward to your reply.

Thanks in advance
Subin

If you post an example of your input and the desired output (or whatever you want to do with it), it would be easier.

Yes, the old

while read rec; do 
    # process stuff
done < file

construct is notoriously slow and should only be used
on small to moderately sized files.
I would go for Perl.
I have not looked too carefully at your processing,
nor tested this for correct syntax,
but a Perl statement like this might do:

$ perl -ane 'push @ary,$F[1] if /3G/;END{$"="\n";print"@ary"}' file_to_read_from
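
If you want to stay in the shell, much of the cost in the original loop probably comes from starting egrep and cut for every line rather than from the loop itself. As a rough sketch only, assuming bash and that 3G entries always start with the literal prefix "3G " (the sample file created here is a hypothetical stand-in for $_pat), the same two arrays could be filled with builtins alone:

```shell
#!/bin/bash
# Hypothetical sample file standing in for "$_pat".
_pat=$(mktemp)
printf '%s\n' '3G ^Alcatel-OT-S920\/' '^Sagem-my401C' '3G SPV.*C200' > "$_pat"

THREEG_ARRAY=()
SP_ARRAY=()

# IFS= and -r keep each line exactly as written (backslashes included).
while IFS= read -r line
do
    if [[ $line == "3G "* ]]
    then
        rest=${line#3G }                 # drop the "3G " prefix
        THREEG_ARRAY+=("${rest%% *}")    # keep the second field only, like cut -f2
    else
        SP_ARRAY+=("$line")
    fi
done < "$_pat"

printf '3G: %s\n' "${THREEG_ARRAY[@]}"
printf 'SP: %s\n' "${SP_ARRAY[@]}"
rm -f "$_pat"
```

No external command runs inside the loop, so nothing is forked per line. On very large files awk or Perl will most likely still be faster.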

Hi,
Thanks for the reply.

My input file looks like this:

3G ^Alcatel-OT-S920\/
3G ^Amoi.*A500
3G SPV.*C200
3G ^Levis_Original_3G
3G ASUS-P735
3G ^SIE-E71F\/
3G ^SIE-E71\/
3G ^SIE-ELF1
^Sagem-my401C
^SAGEM-my411C-orange
^SAGEM-my511X-orange
^SEC-SGHC520
^SEC-SGHD520\/
^SEC-SGHE570\/
^SEC-SGHF210
^SEC-SGHF300\/
^SAMSUNG-SGH-F490
^SEC-SGHP310
^SEC-SGHP520
^SAMSUNG-SGH-D500\/
^SAMSUNG-SGH-D600\/
^SAMSUNG-SGH-D900
3G SPV.*M650
3G ^HTC-P4550
3G ^HTC-P4550-orange\/
3G ^HTC-P5500-orange\/
3G SPV.*C100
3G SPV.*C600
3G SPV.*C700
3G SPV.*E65

And I need to store these entries in two different arrays.

OK,
and once you have built the arrays? What's the next task?

After that, some other functions will read the values from these two arrays and compare them with the values they have.

Thanks and Regards,
Subin

It's quite easy to build those arrays with Awk and Perl, but you should continue to use those languages for the next task:

mint% awk 'END {
  while (++n <= i)
    printf "tg_a, element %d: %s\n", n, tg_a[n]
  while (++m <= j)
    printf "sp_a, element %d: %s\n", m, sp_a[m]
    }
{
  if (/^3G/)
    tg_a[++i] = $2
  else
    sp_a[++j] = $0
  }' file
tg_a, element 1: ^Alcatel-OT-S920\/
tg_a, element 2: ^Amoi.*A500
tg_a, element 3: SPV.*C200
tg_a, element 4: ^Levis_Original_3G
tg_a, element 5: ASUS-P735
tg_a, element 6: ^SIE-E71F\/
tg_a, element 7: ^SIE-E71\/
tg_a, element 8: ^SIE-ELF1
tg_a, element 9: SPV.*M650
tg_a, element 10: ^HTC-P4550
tg_a, element 11: ^HTC-P4550-orange\/
tg_a, element 12: ^HTC-P5500-orange\/
tg_a, element 13: SPV.*C100
tg_a, element 14: SPV.*C600
tg_a, element 15: SPV.*C700
tg_a, element 16: SPV.*E65
sp_a, element 1: ^Sagem-my401C
sp_a, element 2: ^SAGEM-my411C-orange
sp_a, element 3: ^SAGEM-my511X-orange
sp_a, element 4: ^SEC-SGHC520
sp_a, element 5: ^SEC-SGHD520\/
sp_a, element 6: ^SEC-SGHE570\/
sp_a, element 7: ^SEC-SGHF210
sp_a, element 8: ^SEC-SGHF300\/
sp_a, element 9: ^SAMSUNG-SGH-F490
sp_a, element 10: ^SEC-SGHP310
sp_a, element 11: ^SEC-SGHP520
sp_a, element 12: ^SAMSUNG-SGH-D500\/
sp_a, element 13: ^SAMSUNG-SGH-D600\/
sp_a, element 14: ^SAMSUNG-SGH-D900

It's working fast for me, thanks.
But I am not getting the values of these arrays outside awk.
How can I get these values?

Thanks in advance

That's why I was asking what is the next task: you can get the values outside,
but I'm sure you can continue with the next task using Awk. Could you describe the next task?
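
For reference, one possible way to get the values back into the shell is to let awk print one value per line and read those lines into bash arrays. This is only a sketch: it assumes bash 4 or later (for the mapfile builtin), and the sample file is a hypothetical stand-in for $_pat.

```shell
#!/bin/bash
# Hypothetical sample file standing in for "$_pat".
_pat=$(mktemp)
printf '%s\n' '3G ^Alcatel-OT-S920\/' '^Sagem-my401C' '3G SPV.*C200' > "$_pat"

# mapfile -t stores each line of awk's output as one array element.
# The awk variables themselves never leave awk; only its output does.
mapfile -t THREEG_ARRAY < <(awk '/^3G/ { print $2 }' "$_pat")
mapfile -t SP_ARRAY     < <(awk '!/^3G/' "$_pat")

echo "3G patterns: ${#THREEG_ARRAY[@]}, smartphone patterns: ${#SP_ARRAY[@]}"
rm -f "$_pat"
```

An array assigned inside the awk program exists only in the awk process, which is why the shell variables of the same name stay empty.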

Hi Radoluv,

Thanks for your valuable replies.

My next tasks are:

Get the IP and USER_AGENT values from the command line argument.
Call the function isInFua (which internally calls some other functions).
Finally, I need the return value (RET_VALUE), such as 3G, SmartPhone, WAP or WEB.

I am attaching the code with your changes:

#!/bin/sh

IPWeb=( 10.66.24.1 10.66.24.3 10.66.24.4 10.66.24.23 10.66.24.35 10.66.24.34 10.66.24.36 10.66.24.38 10.66.17.151 10.66.17.152 10.66.17.153 10.66.17.154 10.66.17.155 10.66.17.156 10.66.17.147 10.66.17.148 10.66.17.149 10.66.17.150 )

IPMobile=( 10.163.118.159 10.163.118.160 10.163.118.161 10.163.118.162 10.163.118.163 10.163.118.164 10.163.118.165 10.163.118.166 10.163.118.167 10.163.118.168 10.163.118.169 10.163.118.170 10.163.118.171 10.163.118.172 10.66.24.40 10.66.24.17 10.66.24.19 10.66.24.37 10.66.17.131 10.66.17.132 10.66.17.133 10.66.17.134 10.66.17.135 10.66.17.136 10.66.17.137 10.66.17.138 10.66.17.139 10.66.17.144 10.66.17.145 10.66.17.146 )

if [ ! -e $_ficParam ] # e means file exists
then
echo $_ficParam file is missing
exit
fi

if [ "$_pat" == "" ]
then
_pat=intermediatePattern.txt
fi

if [ ! -e $_pat ] # e means file exists
then
echo "intermediatePattern.txt file is not accessible"
fi

echo _pat : $_pat
IDX=0
IDX_SP=0

awk 'END {
while (++n <= i)
printf "tg_a, element %d: %s\n", n, THREEG_ARRAY[n]
while (++m <= j)
printf "sp_a, element %d: %s\n", m, SP_ARRAY[m]
}
{
if (/^3G/)
THREEG_ARRAY[++i] = $1
else
SP_ARRAY[++j] = $0
}' $_pat

SIZE_THREEG_ARRAY=`echo ${#THREEG_ARRAY[*]}`
SIZE_SP_ARRAY=`echo ${#SP_ARRAY[*]}`

ALL_ARGS=$* # argument from command line

IP=`echo $ALL_ARGS | awk '{print $1}'` # getting value for IP address
USER_AGENT=$(echo $ALL_ARGS | cut -d'"' -f6) # getting value for USER AGENT

var=0
RET_VALUE=""

function isInFua
{
echo VAR --- $var

    isSmartPhone    # calling function for Smart phone
    if [ $var == 1 ]
    then
            RET_VALUE=SmartPhone
            return
    fi

    isThreeG        # calling function for 3G
    if [ $var == 1 ]
    then
            RET_VALUE=3G
            return
    fi

    isFromWAP    # calling function for WAP
    if [ $var == 1 ]
    then
            RET_VALUE=WAP
            return
    fi

    isFromWEB    # calling function for WEB
    if [ $var == 1 ]
    then
            RET_VALUE=WEB
            return
    fi

}

#------------------------------------------------------------------------------#
# Test 3G and 2.5G on a user-agent functions #
#------------------------------------------------------------------------------#

function isSmartPhone
{
for ((i = 0 ; i < $SIZE_SP_ARRAY ; i++))
do
USER_AGENT_VALUE=`echo $USER_AGENT | grep "${SP_ARRAY[$i]}"`
if [ $? = 0 ]
then
var=1
return
fi
done

}

function isThreeG
{
for ((i = 0 ; i < $SIZE_THREEG_ARRAY ; i++))
do
USER_AGENT_VALUE=`echo $USER_AGENT | grep "${THREEG_ARRAY[$i]}"`
if [ $? = 0 ]
then
var=1
return
fi
done

}

function isFromWAP
{
for ((i = 0 ; i < ${#IPMobile[*]} ; i++))
do
IP_MOBILE_VALUE=`echo $IP | grep "${IPMobile[i]}"`
if [ $? = 0 ]
then
var=1
return
fi
done

}

function isFromWEB
{
for ((i = 0 ; i < ${#IPWeb[*]} ; i++))
do
IP_MOBILE_VALUE=`echo $IP | grep "${IPWeb[i]}"`
if [ $? = 0 ]
then
var=1
return
fi
done

}
isInFua
echo RET_VALUE = $RET_VALUE

Can you please go through this?
Thanks for the help.

Could you post a sample invocation of the script (parameters included)?

My script (analyseTrafic.sh) is called by another script like this:

temp="10.48.81.109 - - [18/Jul/2008:15:20:12 +0200] \"GET /cacti/settings.php?tab=general HTTP/1.1\" 200 14457 \"http://10.58.198.153/cacti/settings.php?tab=path\" \"SPV.*E65 Profile/MIDP-2.0 Configuration/CLDC-1.0 UP.Link/6.2.3.15.0\""

./analyseTrafic.sh $temp

Inside analyseTrafic.sh, the IP value will be 10.48.81.109 and USER_AGENT will be "SPV.*E65 Profile/MIDP-2.0 Configuration/CLDC-1.0 UP.Link/6.2.3.15.0".

A few suggestions:
I see you're using bash syntax while invoking the script as /bin/sh,
so I assume you're on Linux and probably have a recent bash version.
If that's the case, you could use a different approach for the lookup:

  1. Extract the USER_AGENT from the argument (or the first argument, if you
    invoke the script like this: ./analyseTrafic.sh "$temp"):

[assuming GNU sed, GNU grep and a recent bash on Linux]

extracting from all arguments:

USER_AGENT="$(sed -r 's|([^"]*"){4} "([^/ ]+).*|\2|'<<<"$@")"

extracting from the first one:

USER_AGENT="$(sed -r 's|([^"]*"){4} "([^/ ]+).*|\2|'<<<"$1")"

  2. Look it up in the file ("$_pat") with grep:

_user_agent="$(fgrep "$USER_AGENT" "$_pat")"

And then use the [[ ]] bash operator to test for the 3G string at the beginning.
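
A minimal sketch of that last step, under a few assumptions: the sample file and the hard-coded USER_AGENT are hypothetical stand-ins for "$_pat" and the value extracted above, and only the first matching line is considered (grep -F is the same as fgrep, and -m1 stops at the first match).

```shell
#!/bin/bash
# Hypothetical stand-ins for "$_pat" and the extracted USER_AGENT.
_pat=$(mktemp)
printf '%s\n' '3G SPV.*E65' '^SEC-SGHP310' > "$_pat"
USER_AGENT='SPV.*E65'

# Fixed-string lookup: the pattern file entry is matched literally.
_user_agent=$(grep -F -m1 -- "$USER_AGENT" "$_pat")

if [[ -z $_user_agent ]]
then
    RET_VALUE=""              # nothing matched; the IP checks would come next
elif [[ $_user_agent == "3G "* ]]
then
    RET_VALUE=3G              # matched line starts with the 3G marker
else
    RET_VALUE=SmartPhone      # matched line has no 3G marker
fi

echo "RET_VALUE = $RET_VALUE"
rm -f "$_pat"
```

This replaces both array loops with a single grep call. Whether a literal match against the pattern file is what you need depends on how the user agents really look.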