How to make AWK process an input file many many times?

kevintse · May 14, 2010, 3:39am

By "many many times" I mean that the number of times the input file has to be processed is not known beforehand; it only becomes known when awk finishes processing the input file for the first time.

So my question is: how do I make AWK start over from the first record of the input file when it finishes processing the last record (before reaching "END")?

zaxxon · May 14, 2010, 4:04am

Can you give a practical example that shows what this is needed for?
Do you need a daemon that continuously processes a log file, for example, or is it just for theoretical purposes?
In the first case you could pipe tail -f into awk.

A tad more description needed.

kevintse · May 14, 2010, 4:16am

OK.
Here's an example: I have a file with 30 records. The first time this file is processed, I do a "group by" on the records; that might leave only 10 records, which I save to an array. Then I want to process the original file (all 30 records) 10 times, once for each record saved in the first pass (each saved record acts as a parameter here).

zaxxon · May 14, 2010, 4:49am

This is still somewhat abstract. To give an equally abstract answer: you can fill one array with the lines matching your pattern and another array with all lines. In the END block you can then loop over the second array once for each element of the first array.

An example input file together with the expected output would help us help you.
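A minimal sketch of the two-array idea, assuming the "group by" key is the first field and that each per-key pass simply prints the lines with that key. The function name per_key_in_memory and the file data.txt are hypothetical; replace the inner action with whatever each pass really has to do.

```shell
#!/bin/sh
# Hypothetical helper: read the file once, keep every line in one array,
# keep the distinct keys ($1, an assumption) in a second array, then in END
# walk over all stored lines once per key.
per_key_in_memory() {
    awk '
    {
        line[NR] = $0                   # every record, in input order
        if (!($1 in seen)) {            # first time this key appears
            seen[$1] = 1
            key[++nkeys] = $1           # distinct keys, in first-seen order
        }
    }
    END {
        for (k = 1; k <= nkeys; k++)            # one "pass" per key
            for (n = 1; n <= NR; n++) {         # over all stored lines
                split(line[n], f, " ")          # re-split the stored line
                if (f[1] == key[k])             # placeholder per-pass action
                    print key[k] ": " line[n]
            }
    }' "$1"
}

# Hypothetical sample input: key in $1, value in $2
printf 'a 1\nb 2\na 3\nc 4\nb 5\n' > data.txt
per_key_in_memory data.txt
```

This keeps the whole file in memory, so it only suits inputs that fit in RAM.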

guruprasadpr · May 14, 2010, 5:08am

Hi
Do you mean to say

awk '.......' file file file file

Instead of listing the file n times on the command line, you want a way to control how many times it is read?
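One way to get the "file file file file" argument list without typing it is to build it in the shell. This is a sketch under assumptions: the helper name repeat_file and the file small.txt are hypothetical, and each copy is labelled with a pass number derived from FNR == 1.

```shell
#!/bin/sh
# Hypothetical helper: run awk over FILE, passing it COUNT times on the
# command line. Each copy is one "pass"; FNR restarts at 1 for every copy.
repeat_file() {
    file=$1
    count=$2
    [ "$count" -gt 0 ] || return 0  # with no file operands awk would read stdin
    set --                          # reuse the positional parameters as the argument list
    i=0
    while [ "$i" -lt "$count" ]; do
        set -- "$@" "$file"
        i=$((i + 1))
    done
    awk 'FNR == 1 { pass++ } { print pass ": " $0 }' "$@"
}

printf 'x\ny\n' > small.txt
repeat_file small.txt 3
```

Since the count is only known after a first look at the data, it could come from a separate awk run, e.g. `repeat_file data.txt "$(awk '!seen[$1]++ { c++ } END { print c + 0 }' data.txt)"`, at the cost of reading the file one extra time.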

Guru.

radoulov · May 14, 2010, 5:34am

If you need to process the same data multiple times and the data is not too large (i.e. not GBs), you'd better read it into an array (or multiple arrays) and process it in memory after reading the input.

If the data does not fit into memory, you can manipulate the ARGV array (just like I showed you in one of your previous posts).
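For the case in this thread, where the number of passes is only known after the first pass, a sketch of the ARGV approach could queue one extra copy of the file in BEGIN and add the remaining copies as soon as the second pass starts (FNR == 1 with pass == 2), because by then the first pass is complete. It relies on awk consulting the current ARGC/ARGV each time it moves to the next input file. The helper name passes_per_key, the file data.txt and the "key in $1" layout are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper. Pass 1 collects the distinct keys ($1); then the file
# is read once more per key, with key[pass - 1] as that pass's parameter.
passes_per_key() {
    awk '
    BEGIN {
        # Queue one extra copy now so that awk reaches a second pass;
        # the remaining copies are added once the key count is known.
        ARGV[ARGC] = ARGV[ARGC - 1]
        ARGC++
    }
    FNR == 1 {
        pass++
        if (pass == 2)                  # the first pass is finished here
            for (k = 2; k <= nkeys; k++) {
                ARGV[ARGC] = ARGV[ARGC - 1]
                ARGC++
            }
    }
    pass == 1 {                         # first pass: "group by" $1
        if (!($1 in seen)) { seen[$1] = 1; key[++nkeys] = $1 }
        next
    }
    {                                   # later passes: placeholder action
        if ($1 == key[pass - 1])
            print key[pass - 1] ": " $0
    }' "$1"
}

printf 'a 1\nb 2\na 3\nc 4\nb 5\n' > data.txt
passes_per_key data.txt
```

Nothing is kept in memory except the keys, so this variant also works when the file itself is too large for an array.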

kevintse · May 14, 2010, 5:38am

Hi, Guru
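I'll take zaxxon's suggestion: read the file once, save the records to an array and process them as many times as needed.
Or just put "ARGV[ARGC] = ARGV[ARGC-1]; ARGC++" (thanks to radoulov for the code) at the end of each processing pass; writing it as two statements avoids depending on whether awk evaluates ARGC++ before or after ARGV[ARGC-1].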


Thank you.

radoulov · May 14, 2010, 5:50am

You could manipulate the ARGV array in the BEGIN block as you like:

awk -vl=<times> 'BEGIN {
  f = ARGV[ARGC-1]
  while (++i < l)
    ARGV[ARGC++] = f
  }
1' file

For example:

% echo in file >file

To read the input file 3 times:

% awk -vl=3 'BEGIN {
  f = ARGV[ARGC-1]
  while (++i < l)
    ARGV[ARGC++] = f
  }
1' file
in file
in file
in file

To read it 10 times:

% awk -vl=10 'BEGIN {
  f = ARGV[ARGC-1]
  while (++i < l)
    ARGV[ARGC++] = f
  }
1' file
in file
in file
in file
in file
in file
in file
in file
in file
in file
in file

Note that the -vvar syntax (without a space between -v and the identifier) is a GNU awk extension. With other awk implementations you should use -v var (with a space).
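For reference, a sketch of the same BEGIN-block technique written with the portable "-v l=..." spelling, plus an FNR == 1 counter so each copy of the file is labelled. The helper name read_n_times is hypothetical; the file name "file" matches the example above.

```shell
#!/bin/sh
# Hypothetical helper: read FILE N times using the BEGIN-block ARGV trick,
# passing the count with "-v l=..." (space after -v).
read_n_times() {
    awk -v l="$1" '
    BEGIN {
        f = ARGV[ARGC - 1]      # the file named on the command line
        while (++i < l)         # add l - 1 copies: l passes in total
            ARGV[ARGC++] = f
    }
    FNR == 1 { pass++ }         # a new copy of the file has started
    { print "pass " pass ": " $0 }' "$2"
}

echo in file > file
read_n_times 3 file
```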