Gawk Script in Windows batch file - Help

10000springs · October 4, 2010, 11:14am

Good morning all. I have been running into a problem running a simple gawk script that selects every third line from an input file and writes it to an output file.

gawk "NR%3==0" FileIn > FileOut
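For reference, awk's `NR` is the number of the current input line, counting from 1. A pattern with no action prints the lines it matches, so `NR%3==0` keeps lines 3, 6, 9 and so on. Here is the same selection as a minimal Python sketch (`every_nth_line` is a hypothetical helper name, not part of gawk):

```python
def every_nth_line(lines, n=3):
    """Yield every n-th line, mimicking awk's `NR % n == 0` test (NR counts from 1)."""
    for nr, line in enumerate(lines, start=1):
        if nr % n == 0:
            yield line


if __name__ == "__main__":
    sample = [f"line {i}\n" for i in range(1, 11)]  # ten numbered sample lines
    print("".join(every_nth_line(sample)), end="")
```

Run as a script, it prints lines 3, 6 and 9 of the ten-line sample.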

I am attempting to run this command from a batch file at the command line. I have several hundred files to perform this operation on and it would sure be handy to be able to just work up a quick batch file and let it fly.

The script works fine from the command line, but that is slow going, since I must do one file at a time and change the file name between runs. Tedious, to say the least, and I have been up all night now.

The problem appears to be related to the %3 in the script. When the command is run from a batch file I get no error message, just an empty output file, and the screen output shows the command interpreted as:

gawk "NR==0" FileIn > FileOut

I have tried backslashing the %, enclosing it in double-quotes, enclosing each character in double-quotes, double-quotes and double backslashes, etc. All the combinations that my tired mind could conjure. No joy.

I am beginning to think that this particular character simply cannot be passed to gawk from a Windows batch file. I have looked around various forums and read the books that I have, and can find nothing to guide me.

I would be thrilled if someone had a solution to the problem of grabbing every third line from a file and dumping to text output. Perl or gawk preferred. Other options will be considered.

Thanks for looking. 10000Springs (BC)

durden_tyler · October 4, 2010, 12:25pm

Here's a short Perl program that does that -

use strict;
use warnings;

my $pattern = "file*.in";                   # process only the input files, never earlier "*.out" results
while (defined(my $in = glob($pattern))) {
  (my $out = $in) =~ s/\.in$/.out/;         # read from "xyz.in" and write to "xyz.out"
  open(my $in_fh,  "<", $in)  or die "Can't open $in for reading: $!";
  open(my $out_fh, ">", $out) or die "Can't open $out for writing: $!";  # ">" overwrites, so reruns don't duplicate output
  while (<$in_fh>) {
    print $out_fh $_ if $. % 3 == 0;        # $. is the current line number of the file being read
  }
  close($in_fh)  or die "Can't close $in: $!";   # closing the handle also resets $. for the next file
  close($out_fh) or die "Can't close $out: $!";
}

Testcase -

C:\>
C:\>
C:\>rem Display the contents of input files "file*.in"
 
C:\>type file1.in
this is line   1 in file1.in ...
this is line   2 in file1.in ...
this is line   3 in file1.in ...
this is line   4 in file1.in ...
this is line   5 in file1.in ...
this is line   6 in file1.in ...
this is line   7 in file1.in ...
this is line   8 in file1.in ...
this is line   9 in file1.in ...
this is line  10 in file1.in ...
 
C:\>type file2.in
this is line   1 in file2.in ...
this is line   2 in file2.in ...
this is line   3 in file2.in ...
this is line   4 in file2.in ...
this is line   5 in file2.in ...
this is line   6 in file2.in ...
this is line   7 in file2.in ...
this is line   8 in file2.in ...
this is line   9 in file2.in ...
this is line  10 in file2.in ...
 
C:\>type file3.in
this is line   1 in file3.in ...
this is line   2 in file3.in ...
this is line   3 in file3.in ...
this is line   4 in file3.in ...
this is line   5 in file3.in ...
this is line   6 in file3.in ...
this is line   7 in file3.in ...
this is line   8 in file3.in ...
this is line   9 in file3.in ...
this is line  10 in file3.in ...
 
C:\>rem Now display the content of the Perl program
 
C:\>type process_files.pl
#perl -w
$pattern = "file*";                         # process only those files that match pattern "file*"
while (defined ($in = glob($pattern))) {
  ($out = $in) =~ s/\.in$/.out/;            # read from "xyz.in" and write to "xyz.out"
  open (IN, "<", $in) or die "Can't open $in for reading: $!";
  open (OUT,">>", $out) or die "Can't open $out for writing: $!";
  while (<IN>) {
    print OUT $_ if $.%3 == 0;
  }
  close (IN) or die "Can't close $in: $!";  # good idea to do some housekeeping
  close (OUT) or die "Can't close $out: $!";
}
 
C:\>
C:\>rem Run the Perl program
 
C:\>perl process_files.pl
 
C:\>
C:\>rem Now check the output files
 
C:\>type file1.out
this is line   3 in file1.in ...
this is line   6 in file1.in ...
this is line   9 in file1.in ...
 
C:\>type file2.out
this is line   3 in file2.in ...
this is line   6 in file2.in ...
this is line   9 in file2.in ...
 
C:\>type file3.out
this is line   3 in file3.in ...
this is line   6 in file3.in ...
this is line   9 in file3.in ...
 
C:\>
C:\>
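If Perl is not at hand, the same per-file loop can be sketched in Python. This is a port, not tyler_durden's code. The `file*.in` to `.out` naming follows the Perl example, and the helper names `decimate_file` and `process_files` are hypothetical:

```python
import glob
import os


def decimate_file(in_path, out_path, n=3):
    """Copy every n-th line of in_path to out_path, overwriting out_path."""
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for nr, line in enumerate(src, start=1):  # nr counts from 1, like awk's NR / Perl's $.
            if nr % n == 0:
                dst.write(line)


def process_files(pattern="file*.in", n=3):
    """Process each file matching pattern (hypothetical naming: xyz.in -> xyz.out)."""
    outputs = []
    for in_path in sorted(glob.glob(pattern)):
        out_path = os.path.splitext(in_path)[0] + ".out"
        decimate_file(in_path, out_path, n)
        outputs.append(out_path)
    return outputs


if __name__ == "__main__":
    for path in process_files():
        print("wrote", path)
```

Restricting the pattern to `*.in` means output files from an earlier run are never picked up as inputs. Opening the output with `"w"` overwrites rather than appends, so a rerun gives the same result.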

HTH,
tyler_durden

Franklin52 · October 4, 2010, 12:46pm

You could try to escape the "%":

gawk "NR\%3==0"

methyl · October 4, 2010, 1:05pm

Isn't the escape for % in an MS-DOS batch file %% ?

Have you tried:

gawk "NR%%3==0" FileIn > FileOut
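Why `%3` vanished: when cmd.exe runs a line from a batch file, it replaces `%0` through `%9` with the batch file's arguments (an empty string when an argument was not supplied) and turns `%%` into a single `%`, all before gawk sees the text. At the interactive prompt there are no batch parameters, so `%3` is left alone, which is why the command worked when typed by hand. Below is a deliberately simplified Python model of just those two rules. It is a sketch, not cmd's real parser: it ignores `%*`, `%VAR%`, `%~` modifiers and delayed expansion. `expand_batch_percents` and `decimate.bat` are hypothetical names:

```python
def expand_batch_percents(line, args):
    """Simplified model of cmd.exe percent handling inside a .bat file.

    Only two rules are modelled (an assumption-laden sketch):
      %%      -> %
      %0..%9  -> args[i], or "" when that argument was not supplied
    Everything else is copied through unchanged.
    """
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "%" and i + 1 < len(line):
            nxt = line[i + 1]
            if nxt == "%":                      # "%%" collapses to a literal "%"
                out.append("%")
                i += 2
                continue
            if nxt in "0123456789":             # batch parameter %0..%9
                idx = int(nxt)
                out.append(args[idx] if idx < len(args) else "")
                i += 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    batch_args = ["decimate.bat"]  # %0 is the batch file's name; no other arguments given
    print(expand_batch_percents('gawk "NR%3==0" FileIn > FileOut', batch_args))
    print(expand_batch_percents('gawk "NR%%3==0" FileIn > FileOut', batch_args))
```

The first line comes out as `gawk "NR==0" FileIn > FileOut`, matching what the original post saw. The second, with `%%`, comes out as `gawk "NR%3==0" FileIn > FileOut`.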

durden_tyler · October 4, 2010, 1:55pm

I noticed that Cygwin's gawk works as expected on my system -

C:\>
C:\>type file1.in
this is line   1 in file1.in ...
this is line   2 in file1.in ...
this is line   3 in file1.in ...
this is line   4 in file1.in ...
this is line   5 in file1.in ...
this is line   6 in file1.in ...
this is line   7 in file1.in ...
this is line   8 in file1.in ...
this is line   9 in file1.in ...
this is line  10 in file1.in ...
 
C:\>
C:\>c:\cygwin\bin\gawk "NR%3==0" file1.in
this is line   3 in file1.in ...
this is line   6 in file1.in ...
this is line   9 in file1.in ...
 
C:\>
C:\>c:\cygwin\bin\gawk 'NR%3==0' file1.in
this is line   3 in file1.in ...
this is line   6 in file1.in ...
this is line   9 in file1.in ...
 
C:\>
C:\>

tyler_durden

10000springs · October 4, 2010, 2:29pm

My problem is solved in more ways than one. Thanks to all who viewed the thread and especially those who posted their useful solutions. I learned a bit in the process and have improved my turn-around on the processing of these files considerably.

As Methyl and Franklin52 both pointed out, and as I suspected, I was not escaping the % correctly in the batch file. I probably also added to the confusion by writing "backslashing" instead of "escaping" in my original post. The actual term escaped me, since I have been up for 26 hours now and am trying to get back in tune with programming after a long dry spell.

As I noted in my first post, the script would run from the command line in Windows (XP).

I can also run the same script with no problems in cygwin, as tyler_durden pointed out.

It was only when I tried to use it in a batch file to process multiple files that I ran into trouble: I could not remember how to escape the % character in a batch file. The gawk solution put forth by Methyl worked perfectly in my batch file, and I appreciate that immensely. I have modified my batch file as in Methyl's example, with %% in front of the decimation interval.
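Another way around the escaping problem is to launch gawk from a script that passes its arguments as a list. With `subprocess.run` and no `shell=True`, cmd.exe is not involved, so the `%` reaches gawk unchanged. Here is a minimal sketch. It assumes that `gawk` is on the PATH and that the inputs are named `*.in`, and the names `gawk_command` and `run_all` are hypothetical:

```python
import glob
import os
import subprocess


def gawk_command(program, in_path, gawk="gawk"):
    """Build the argument list for one gawk run; no shell, so '%' needs no escaping."""
    return [gawk, program, in_path]


def run_all(pattern="*.in", program="NR%3==0", gawk="gawk"):
    """Run gawk on every file matching pattern, writing xyz.in -> xyz.out.

    Assumptions: gawk is on the PATH and the input files end in ".in".
    """
    for in_path in sorted(glob.glob(pattern)):
        out_path = os.path.splitext(in_path)[0] + ".out"
        with open(out_path, "w") as out:
            # check=True raises CalledProcessError if gawk exits with a non-zero status
            subprocess.run(gawk_command(program, in_path, gawk), stdout=out, check=True)
```

Calling `run_all()` from the folder that holds the files would then run `gawk NR%3==0` once per input file.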

I also appreciate the Perl code example posted by tyler_durden above. Using it, I was able to modify a short Perl program that I put together a few weeks ago (with help from some of the same people on this site) so that I can now do the file reformatting and decimation in one step instead of two.

I'm almost ready to stop and smell the roses.

Thanks for everything. 10000Springs (BC)

Leo_Gutierrez · January 12, 2011, 2:16pm

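@echo off
rem Feed edlin two commands: "3,3l" (list line 3) and "e" (end the session).
rem edlin prefixes each listed line with its number and a colon, so keep the
rem second ":"-delimited token. Note that only line 3 is extracted, and
rem OUT.TXT is overwritten each time the loop body runs.
for /f "tokens=2 delims=:" %%_ in (' 
	^(
		echo 3^,3l
		echo e
	^) ^| edlin /b FILEIN.TXT
') do (
	rem Redirection placed first so that no trailing space is written
	>OUT.TXT echo %%_
	)
exit /b 0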

C:\>type FILEIN.TXT
Dan
Brown
El
Simbolo
Perdido

C:\>code.bat

C:\>type OUT.TXT
 El

C:\>