Help with Efficient Looping

joshiamit · December 26, 2011, 5:43am

Hello guys

I need to read a file of parent-child relationships and,
for each row, find the latest child at the end of its chain.
For example:
parent child
ABC PQR
PQR DEF
DEF XYZ

Expected Output
ABC XYZ
PQR XYZ
DEF XYZ

Script logic:
read the parent and child from a row of the file
search the file for a row whose parent equals this row's child; if a match is found, replace this row's child with that row's child
else
go to the next line
I have created a bash script to achieve this and it works fine.
My issue is that I need to process a file with more than 2 million records.
My script takes one and a half hours for 25,000 records.
Can anyone suggest a more efficient approach?

vivek_d_r · December 26, 2011, 6:14am

here is your code

# Pass 1: walk the file; tmp2 ends up as the child of the last row
# whose parent matches the previous row's child.
while read line
do
        tmp1=$( echo "$line" | awk '{print $1}' )
        if [[ "$second" == "$tmp1" ]] ; then
                tmp2=$( echo "$line" | awk '{print $2}' )
        fi
        second=$( echo "$line" | awk '{print $2}' )
done < tmp.sql
# Pass 2: print every parent next to that final child.
while read line
do
        first=$( echo "$line" | awk '{print $1}' )
        echo "$first    $tmp2"
done < tmp.sql

This is actually time-consuming. Wait until someone replies with a one-line solution; the code I pasted above is inefficient.

Scrutinizer · December 26, 2011, 6:55am

See if this works faster:

awk '{for(i in A)if($1==A[i])A[i]=$2; A[$1]=$2} END{for(i in A) print i, A[i]}' infile
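
A possible alternative, in case the one-liner above is still slow on the full file: it rescans the whole array for every input row, so the work grows roughly with the square of the number of rows. The sketch below stores each link once and then follows every chain, caching the results. The names `resolve_latest_child`, `child`, `order` and `last` are made up for illustration. It assumes two whitespace-separated columns, one child per parent, and no cycles (a cycle would make it loop forever).

```shell
resolve_latest_child() {
    awk '
    # Follow the links from p until reaching a name that is never a parent
    # or one whose answer is already cached in last[].
    function resolve(p,    c, q, nxt) {
        c = p
        while ((c in child) && !(c in last)) c = child[c]
        if (c in last) c = last[c]
        # Cache the answer for every name visited on the way.
        q = p
        while ((q in child) && !(q in last)) {
            nxt = child[q]
            last[q] = c
            q = nxt
        }
        return c
    }

    # Remember each parent -> child link and the order the parents appeared in.
    NF >= 2 { child[$1] = $2; order[++n] = $1 }

    END { for (k = 1; k <= n; k++) print order[k], resolve(order[k]) }
    ' "$@"
}

# Sample data from the first post, fed on standard input
printf 'ABC PQR\nPQR DEF\nDEF XYZ\n' | resolve_latest_child
```

Unlike `for(i in A)`, this prints the rows in input order, and it does not depend on parents appearing before their children in the file.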

joshiamit · December 26, 2011, 7:53am

OK, I will try this one and let you know. The only thing is that I am reading from a fixed-width file, and I'm not sure how this will work with awk ...

Or I could try one more option, something like:

awk '{ while read in A ...and similar logic }'infile

Please let me know if I am on the right track.
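
On the fixed-width question, one way that should work in any awk is to cut the columns out of `$0` with `substr` and print them space-separated, so the result can be piped into a field-based awk command. This is only a sketch: the layout (parent in columns 1-10, child in columns 11-20) and the name `fixed_to_fields` are assumptions, since the real record layout is not given in the thread.

```shell
fixed_to_fields() {
    awk '{
        # Hypothetical layout: parent = columns 1-10, child = columns 11-20
        parent = substr($0, 1, 10)
        child  = substr($0, 11, 10)
        # Strip the space padding around each value
        gsub(/^ +| +$/, "", parent)
        gsub(/^ +| +$/, "", child)
        if (parent != "" && child != "") print parent, child
    }' "$@"
}

# Two padded sample records
printf '%-10s%-10s\n' ABC PQR PQR DEF | fixed_to_fields
```

Adjust the start positions and lengths in the two `substr` calls to match the actual file.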

jgt · December 26, 2011, 8:34am

If you sort the file into reverse order then the latest child is located first.

nl input |sort -r -n
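
To sketch how the reversed order could be used: if every child row comes after its parent row in the original file, then reading the file bottom-up means each child has already been resolved by the time its parent is seen, so a single pass is enough. The function name `resolve_in_reverse` and the array `last` are made up for illustration, and the ordering assumption is not confirmed anywhere in the thread.

```shell
resolve_in_reverse() {
    nl -ba |          # number every line so the original order can be restored
    sort -rn |        # last row first
    awk 'NF >= 3 {
        parent = $2; child = $3
        # Rows below this one were handled already, so if the child is
        # itself a parent, its latest child is already known.
        if (child in last) child = last[child]
        last[parent] = child
        print $1, parent, child
    }' |
    sort -n |         # back to input order
    cut -d" " -f2-    # drop the line number
}

# Sample data from the first post, fed on standard input
printf 'ABC PQR\nPQR DEF\nDEF XYZ\n' | resolve_in_reverse
```

If the rows are not ordered parent-before-child, this will stop short on some chains and a lookup-based approach would be needed instead.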