Efficient population of array from text file

carlr · June 27, 2012, 4:25pm

Hi,

I am trying to populate an array with data from a text file. I have a working method using awk, but it is too slow and inefficient. See below.

The text file has 70,000 lines. Each iteration of the loop starts a new awk process, which reads the file from the first line until it gets to the required line and processes it (and, because the script has no exit statement, keeps reading to the end of the file). The next iteration then starts again at the first line. With a file this size, the whole file is scanned 70,000 times, which is very inefficient.

I want to efficiently load the 70,000 lines of a text file into an array, where each line of the file contains 5 comma-separated numbers. The array index should be equal to the first number. The value stored at each index can be either the whole line or the last four numbers of the line.

The method does not have to use awk, as long as it runs in a bash script.
I can't figure out how to use awk to load the data into the array in a single pass through the file.

# Slow: starts a new awk process (which rescans the whole file) for every line
for ((i=1; i<=70000; i++)); do
  ARRAY_3D_ELSET[i]=$(awk -v var="$i" 'NR==var {print $0}' 3D_ELSET.tmp)
done

Chubler_XL · June 27, 2012, 5:13pm

How about this

eval ARRAY_3D_ELSET=( $(awk -F, '{print "["$1"]=\""$0"\" "}' 3D_ELSET.tmp) )
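For comparison, a single-pass version in pure bash is sketched below. It is an illustration, not part of the reply above: the function name load_elset and the second array ARRAY_3D_REST are hypothetical, and it assumes comma-separated integer lines with no header. It reads the file once without eval, so word splitting of command output is not involved, and it fills both variants the question asks for (whole line, or last four numbers).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the file once and key each line on its first number.
# Assumes comma-separated integer fields and no header line.
declare -a ARRAY_3D_ELSET   # index = first number, value = whole line
declare -a ARRAY_3D_REST    # hypothetical: index = first number, value = last four numbers

load_elset() {
  local file=$1 line idx
  # "|| [[ -n $line ]]" also keeps a final line that has no trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    idx=${line%%,*}                     # text before the first comma
    idx=${idx//[[:space:]]/}            # drop any padding around the number
    [[ -z $idx ]] && continue           # skip blank lines
    ARRAY_3D_ELSET[10#$idx]=$line       # 10# forces base 10 (no octal for leading zeros)
    ARRAY_3D_REST[10#$idx]=${line#*,}   # text after the first comma
  done < "$file"
}

# Usage: load_elset 3D_ELSET.tmp; echo "${ARRAY_3D_ELSET[5414]}"
```

A bash read loop is generally slower per line than awk, but it only touches the file once, so it avoids the 70,000 separate awk processes of the original loop.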

carlr · June 27, 2012, 7:37pm

Thanks, the above works very nicely. But I now have another issue.

I need to efficiently scan through a text file of 70,000 lines and populate an array. This time it is more complex. Each line has 5 numbers, and is comma separated; EL_Num, Node_A, Node_B, Node_C, Node_D. The values for these are all integers.

I want to populate the array with the index equal to the node number, and the value equal to a list of all the EL_Num values whose lines contain that node.

For example, given the following lines from the file:

EL_Num, Node_A, Node_B, Node_C, Node_D
  5414,   5249,   5018,   5217,   5113
  5415,   5018,   5035,   5300,   5201
  5416,   5345,   5013,   5018,   5245

It can be seen that all three elements (5414, 5415 and 5416) contain node 5018.

Therefore ARRAY[5018] should equal "5414 5415 5416".

Likewise, ARRAY[5249] should equal "5414".

I have a method below, but it is very slow. How can this be done efficiently?

# Count the lines, and the fields on the first line
Last_line_3D=$(awk -F, 'END {print NR}' 3D_ELSET.tmp)
Num_Fields=$(awk -F, 'NR==1 {print NF}' 3D_ELSET.tmp)

for ((i=1; i<=Last_line_3D; i++)); do        # every line of the file
  for ((j=2; j<=Num_Fields; j++)); do        # every node column on that line

    # Two awk processes, each rescanning the whole file, for every (line, column) pair
    EL_num=$(awk -F, -v variable="$i" 'NR==variable {print $1}' 3D_ELSET.tmp)
    ND_num=$(awk -F, -v vari="$i" -v field="$j" 'NR==vari {print $field}' 3D_ELSET.tmp)

    Orig_string=${ARRAY[ND_num]}             # Current string stored for this node
    unset 'ARRAY[ND_num]'                    # Remove current string in the array
    New_string="$Orig_string $EL_num"        # Original string plus the new element number
    ARRAY[ND_num]=$New_string                # Place the new string into the array

  done
done
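The same node-to-elements grouping can be done in one pass without starting any awk processes inside the loop. The sketch below is an illustration under assumptions, not the poster's code: build_node_index is a hypothetical name, lines are assumed to be EL_Num followed by four node numbers, separated by commas (spaces allowed), with no header line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build node -> "el el el" in a single pass, pure bash.
# Assumes lines "EL_Num,Node_A,Node_B,Node_C,Node_D" (integers, spaces allowed), no header.
declare -a ARRAY

build_node_index() {
  local file=$1 line el node
  local -a fields
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line//[[:space:]]/}             # remove padding such as "  5414,   5249"
    [[ -z $line ]] && continue             # skip blank lines
    IFS=, read -r -a fields <<< "$line"    # split on commas: fields[0]..fields[4]
    el=${fields[0]}
    for node in "${fields[@]:1}"; do       # every node column after EL_Num
      # append this element number, adding a space only if the entry is non-empty
      ARRAY[10#$node]+="${ARRAY[10#$node]:+ }$el"
    done
  done < "$file"
}

# Usage: build_node_index 3D_ELSET.tmp; echo "${ARRAY[5018]}"
```

The ${ARRAY[...]:+ } expansion replaces the original unset/reassign steps: it yields a space only when the node already has an entry, so values never start with a leading space.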

Chubler_XL · June 27, 2012, 9:44pm

Try:

eval ARRAY=( $(awk -F, '{for(i=2;i<=NF;i++)v[$i]=(($i in v)?v[$i]" ":"")$1}
             END { for(i in v) print "["i"]=\""v[i]"\""}' 3D_ELSET.tmp) )
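The reply above relies on eval and on unquoted word splitting of the awk output. With -F, alone, if the real file is padded with spaces like the sample lines in the question, those spaces end up inside the generated [index]=value words and can break them. A variant that strips whitespace in awk and reads the grouped output with a while-read loop instead of eval is sketched below; load_node_index is a hypothetical name, and the input is assumed to have no header line.

```shell
#!/usr/bin/env bash
# Hypothetical variant: awk groups element numbers by node in one pass,
# then bash reads "node el el el" lines into ARRAY without eval.
# Assumes comma-separated integers (spaces allowed) and no header line.
declare -a ARRAY

load_node_index() {
  local node els
  while read -r node els; do
    ARRAY[10#$node]=$els                  # els keeps file order, e.g. "5414 5415 5416"
  done < <(awk -F, '
    { gsub(/[ \t\r]/, "") }               # strip padding and CRs; awk re-splits the fields
    NF {                                  # skip blank lines
      for (i = 2; i <= NF; i++)
        v[$i] = (($i in v) ? v[$i] " " : "") $1
    }
    END { for (n in v) print n, v[n] }    # one line per node; order is unspecified
  ' "$1")
}

# Usage: load_node_index 3D_ELSET.tmp; echo "${ARRAY[5018]}"
```

The order in which awk's for (n in v) visits the nodes is unspecified, but that does not matter here because each output line carries its own index.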