Saving file content in arrays using AWK

atikan · February 9, 2011, 3:25am

Hi, I'm new to shell scripting. I have a question; I searched your forums but couldn't find what I need.

I have a file whose records all have exactly the same length and format: two comma-separated fields. I need to save the first and the second column of the input file into 2 different array variables, using AWK.
I am using Solaris 10. The input file looks like this:

001,a
002,b
003,c
004,d
005,e
006,f
007,g

I know it's kind of simple, I just couldn't find a way to do it.

Franklin52 · February 9, 2011, 3:30am

Something like this?

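awk -F, '{arr1[++c]=$1; arr2[c]=$2}          # arr1 = 1st column, arr2 = 2nd column
END {for(i=1;i<=c;i++) print arr1[i], arr2[i]}' file   # replace END with your other code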
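A self-contained version of this idea, using the sample data from the question, may help to see what ends up in the arrays. It is a sketch under a few assumptions: it writes the sample to a scratch file whose name is made up for the demo, it needs a POSIX shell such as bash or ksh, and on Solaris 10 it should be run with AWK=nawk because /usr/bin/awk there is the old awk.

```shell
# Minimal sketch: load column 1 and column 2 of a comma-separated file
# into two awk arrays that share the same index.
# Needs a POSIX shell (bash or ksh); on Solaris 10 run it with AWK=nawk.
AWK=${AWK:-awk}
file=${TMPDIR:-/tmp}/cols.$$    # hypothetical scratch file for the demo

cat > "$file" <<'EOF'
001,a
002,b
003,c
004,d
005,e
006,f
007,g
EOF

result=$("$AWK" -F, '
{ arr1[++c] = $1; arr2[c] = $2 }   # one array per column, same index
END {
  # the arrays are still available here, after all lines are read
  for (i = 1; i <= c; i++) print i, arr1[i], arr2[i]
}' "$file")

printf '%s\n' "$result"
rm -f "$file"
```

Each output line shows the index followed by the two stored fields, so the third line reads `3 003 c`. Any further processing of the arrays belongs in the END block (or in a later rule), because only there have all lines been read.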

atikan · February 9, 2011, 3:51am

Thanks Franklin. Actually, what I'm working on is this: first, read the input from a comma-separated file (say inp.txt) in which each line holds the first and last number of a range, and store those numbers in arrays.

$ more inp.txt
923053400100,923053449989
923072600118,923072622197
923076470027,923076493135

Second, run some code (already written) NR times, i.e. once per line of inp.txt, starting with the first element of both arrays and ending with the last element of both arrays. In other words, the matching elements of the two arrays are the 2 inputs for that code.

I'd be glad if you can help me with this; your support is much appreciated.

Franklin52 · February 9, 2011, 4:42am

Please include as many details as possible about the problems you encountered and your (already designed) code.

atikan · February 9, 2011, 5:39am

Requirement: a script that extracts the numbers falling within given closed ranges from a dump file that holds the complete list of numbers.
There should be two inputs to this script:

the dump file (complete list)
the file containing the closed ranges (the one I mentioned in my previous post) [inp.txt]

The script should read inp.txt, build one array from the 1st field (start of each range) and another from the 2nd field (end of each range), and search the dump file (dump.txt, the complete list of numbers) for the numbers in each range. The numbers that fall within the closed ranges should be saved to an output file.

I've written code that does this search, but I have to enter the start and end numbers of each range one at a time. I want to take my code to the next level, where it only asks for the file containing the range list and keeps running until every record of the range file (inp.txt) has been processed.

Below is my code:

echo Please input start of range
read min
echo Please input end of range
read max
echo Please input dump file
read file

# print every line of the dump that lies within [min, max]
nawk -v min="$min" -v max="$max" '$0 >= min && $0 <= max' "$file" >> out.txt

Please tell me if there is anything more you need. Thanks.
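One way to reach that "next level" while keeping the single-range nawk filter is to let the shell read inp.txt line by line, split each line on the comma into min and max, and run the same filter once per range. This is a swapped-in shell technique, not code from the thread. The scratch directory and the dump numbers below are made-up demo data, the `+0` additions (which force a numeric comparison) are an assumption added for robustness, and the script needs a POSIX shell such as bash or ksh (run it with AWK=nawk on Solaris 10).

```shell
# Sketch: run the single-range filter once for every "min,max" line of inp.txt.
# Needs a POSIX shell (bash or ksh); on Solaris 10 run it with AWK=nawk.
AWK=${AWK:-awk}
dir=${TMPDIR:-/tmp}/ranges.$$     # hypothetical scratch directory for the demo
mkdir -p "$dir"
inp=$dir/inp.txt; dump=$dir/dumpfile; out=$dir/out.txt

printf '%s\n' 923053400100,923053449989 923072600118,923072622197 > "$inp"
printf '%s\n' 1231231331 923053400100 923053400102 923072600120 99999999999 > "$dump"
: > "$out"                        # start with an empty output file

# IFS=, splits each line on the comma: min gets field 1, max gets field 2.
# "+0" makes awk compare the values as numbers.
while IFS=, read min max
do
  "$AWK" -v min="$min" -v max="$max" '$0+0 >= min+0 && $0+0 <= max+0' "$dump" >> "$out"
done < "$inp"

result=$(cat "$out")
printf '%s\n' "$result"
rm -rf "$dir"
```

This rereads the dump once per range, which is fine for a handful of ranges. Because each range is a separate pass, the output is grouped by range rather than kept in dump-file order, and a number that falls in two overlapping ranges is written twice; a single-pass awk program that tests every range for each dump line avoids both.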

Franklin52 · February 9, 2011, 6:13am

Should be something like this:

awk -F, 'NR==FNR {arr1[++c]=$1; arr2[c]=$2}	# Fill arrays
{
  for(i=1;i<=c;i++) {		# Loop through arrays
    if($0 >= arr1[i] && $0 <= arr2[i]) {
      print
      break
    }
  }
}' inp.txt dumpfile > out.txt

atikan · February 9, 2011, 7:21am

There was a ) missing in your if statement. The code runs fine after I inserted it, but out.txt is empty.

Franklin52 · February 9, 2011, 8:18am

Can you post some lines of your dumpfile?

atikan · February 9, 2011, 9:43am

bash-2.05# more dumpfile | grep 9230534001
923053400100
923053400102
923053400103
923053400104
923053400105
923053400106
923053400107
923053400108
923053400110
923053400112
923053400114
923053400115
923053400116
923053400117

Franklin52 · February 9, 2011, 10:03am

Sorry, I forgot a next statement on the first line. The code should be:

awk -F, 'NR==FNR {arr1[++c]=$1; arr2[c]=$2; next}	# Fill arrays
{
  for(i=1;i<=c;i++) {		# Loop through arrays
    if($0 >= arr1[i] && $0 <= arr2[i]) {
      print
      break
    }
  }
}' inp.txt dumpfile > out.txt

This is what I get:

$ cat inp.txt
923053400100,923053449989
923072600118,923072622197
923076470027,923076493135
$
$ cat dumpfile
1231231331
12123133
923053400100
923053400102
923053400103
923053400104
923053400105
923053400106
923053400107
923053400108
923053400110
923053400112
923053400114
923053400115
923053400116
923053400117
12131313
211231231
99999999999
$
$ awk -F, 'NR==FNR {arr1[++c]=$1; arr2[c]=$2; next}	# Fill arrays
{
  for(i=1;i<=c;i++) {		# Loop through arrays
    if($0 >= arr1[i] && $0 <= arr2[i]) {
      print
      break
    }
  }
}' inp.txt dumpfile
923053400100
923053400102
923053400103
923053400104
923053400105
923053400106
923053400107
923053400108
923053400110
923053400112
923053400114
923053400115
923053400116
923053400117
$
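To reproduce this run without preparing files by hand, the script below writes the same inp.txt and dumpfile contents shown above to a scratch directory (its name is an assumption for the demo) and runs the same NR==FNR program in one pass. Note the arr1[i] and arr2[i] subscripts: the comparison has to pick the range for the current loop index. It needs a POSIX shell such as bash or ksh, and on Solaris 10 it should be run with AWK=nawk.

```shell
# Sketch: one pass over the dump, checking each number against every range.
# Needs a POSIX shell (bash or ksh); on Solaris 10 run it with AWK=nawk.
AWK=${AWK:-awk}
dir=${TMPDIR:-/tmp}/nrfnr.$$      # hypothetical scratch directory for the demo
mkdir -p "$dir"

printf '%s\n' 923053400100,923053449989 923072600118,923072622197 \
  923076470027,923076493135 > "$dir/inp.txt"
printf '%s\n' 1231231331 12123133 923053400100 923053400102 923053400103 \
  923053400104 923053400105 923053400106 923053400107 923053400108 \
  923053400110 923053400112 923053400114 923053400115 923053400116 \
  923053400117 12131313 211231231 99999999999 > "$dir/dumpfile"

result=$("$AWK" -F, '
NR == FNR { arr1[++c] = $1; arr2[c] = $2; next }   # 1st file: fill arrays
{
  for (i = 1; i <= c; i++)                         # 2nd file: test each range
    if ($0 >= arr1[i] && $0 <= arr2[i]) { print; break }
}' "$dir/inp.txt" "$dir/dumpfile")

printf '%s\n' "$result"
rm -rf "$dir"
```

The break stops the loop at the first matching range, so each dump line is printed at most once and the output keeps the order of the dump file.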

atikan · February 10, 2011, 8:19am

Wow, that was genius, thanks a lot. Can you tell me why you used arr1[++c] instead of arr1[c] as you did for arr2[c]? And what does the next statement inside the awk body do?

Franklin52 · February 10, 2011, 9:05am

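c is used as the index of the arrays arr1 and arr2, and ++c increases the value of c by 1 before that value is used. Because c is incremented only once per line (in arr1[++c]), arr2[c] uses the same, already incremented index, so both fields of a line share one index and numbering starts at 1.

arr1[1] holds the 1st field and arr2[1] the 2nd field of the 1st line
arr1[2] holds the 1st field and arr2[2] the 2nd field of the 2nd line, and so forth...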
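A tiny demonstration of why the pre-increment matters: the sketch below fills one pair of arrays with ++c and another pair with c++ (the array names are made up for the demo) and then looks at index 1 of each pair. It needs a POSIX shell such as bash or ksh; on Solaris 10 run it with AWK=nawk.

```shell
# Sketch: pre-increment (++c) versus post-increment (d++) as an array index.
# Needs a POSIX shell (bash or ksh); on Solaris 10 run it with AWK=nawk.
AWK=${AWK:-awk}

result=$(printf '%s\n' 001,a 002,b | "$AWK" -F, '
{
  pre1[++c] = $1; pre2[c] = $2    # c is raised first: both use 1, then 2
  post1[d++] = $1; post2[d] = $2  # d is raised after use: indexes differ by 1
}
END {
  print "pre:", pre1[1], pre2[1]     # both from line 1
  print "post:", post1[1], post2[1]  # mixes line 2 and line 1
}')
printf '%s\n' "$result"
```

With ++c, index 1 holds both fields of the first line (`pre: 001 a`). With d++, the first field of line 1 lands at index 0 while its second field lands at index 1, so index 1 pairs line 2's first field with line 1's second field (`post: 002 a`).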

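The condition NR==FNR is true only while awk reads the first file: NR counts lines over all input files, while FNR restarts at 1 for each file. The next command stops processing the current line and makes awk read the next input line, so for lines of the 1st file the commands after next, which are meant for the 2nd file, are skipped.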
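To see NR, FNR and next in action, the sketch below feeds awk two small files (their names and contents are made up for the demo) and prints which rule handled each line. It needs a POSIX shell such as bash or ksh; on Solaris 10 run it with AWK=nawk.

```shell
# Sketch: how NR, FNR and next behave when awk reads two files in a row.
# Needs a POSIX shell (bash or ksh); on Solaris 10 run it with AWK=nawk.
AWK=${AWK:-awk}
dir=${TMPDIR:-/tmp}/fnr.$$        # hypothetical scratch directory for the demo
mkdir -p "$dir"
printf '%s\n' r1 r2 > "$dir/first"
printf '%s\n' d1 d2 d3 > "$dir/second"

result=$("$AWK" '
NR == FNR { print "first", NR, FNR, $0; next }  # next skips the rule below
          { print "second", NR, FNR, $0 }        # only reached for file 2
' "$dir/first" "$dir/second")

printf '%s\n' "$result"
rm -rf "$dir"
```

The lines of the first file print with equal NR and FNR (`first 1 1 r1`, `first 2 2 r2`); in the second file FNR starts again at 1 while NR keeps counting (`second 3 1 d1` and so on), and thanks to next no line of the first file ever reaches the second rule. One caveat of this idiom: if the first file is empty, NR==FNR is true for the lines of the second file instead.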

If this is new to you, you should have a read of:

Awk - A Tutorial and Introduction - by Bruce Barnett