Good morning all. I have run into a problem with a simple gawk script that selects every third line from an input file and writes it to an output file.
gawk "NR%3==0" FileIn > FileOut
I am attempting to run this command from a batch file at the command line. I have several hundred files to perform this operation on and it would sure be handy to be able to just work up a quick batch file and let it fly.
The script works fine from the command line, but that is very slow since I must do one file at a time and change the file name between each run. Tedious to say the least, and I have been up all night now.
The problem appears to be related to the %3 in the script. When run from a batch file I do not get an error message; I get an empty file, and the screen output at the command line shows the command interpreted as:
gawk "NR==0" FileIn > FileOut
I have tried backslashing the %, enclosing it in double-quotes, enclosing each character in double-quotes, double-quotes and double backslashes, etc. All the combinations that my tired mind could conjure. No joy.
I am beginning to think that gawk will not pass this particular character to the command line from a Windows batch file. I have looked around various forums, read the books that I have, and can find nothing to guide me.
I would be thrilled if someone had a solution to the problem of grabbing every third line from a file and dumping to text output. Perl or gawk preferred. Other options will be considered.
#!perl -w
$pattern = "file*.in";  # process only the input files, so a rerun skips the "file*.out" results
while (defined ($in = glob($pattern))) {
    ($out = $in) =~ s/\.in$/.out/;  # read from "xyz.in" and write to "xyz.out"
    open (IN, "<", $in) or die "Can't open $in for reading: $!";
    open (OUT, ">>", $out) or die "Can't open $out for writing: $!";  # ">>" appends to an existing file
    while (<IN>) {
        print OUT $_ if $. % 3 == 0;  # $. is the current input line number
    }
    close (IN) or die "Can't close $in: $!";  # an explicit close also resets $. for the next file
    close (OUT) or die "Can't close $out: $!";
}
Testcase -
C:\>
C:\>
C:\>rem Display the contents of input files "file*.in"
C:\>type file1.in
this is line 1 in file1.in ...
this is line 2 in file1.in ...
this is line 3 in file1.in ...
this is line 4 in file1.in ...
this is line 5 in file1.in ...
this is line 6 in file1.in ...
this is line 7 in file1.in ...
this is line 8 in file1.in ...
this is line 9 in file1.in ...
this is line 10 in file1.in ...
C:\>type file2.in
this is line 1 in file2.in ...
this is line 2 in file2.in ...
this is line 3 in file2.in ...
this is line 4 in file2.in ...
this is line 5 in file2.in ...
this is line 6 in file2.in ...
this is line 7 in file2.in ...
this is line 8 in file2.in ...
this is line 9 in file2.in ...
this is line 10 in file2.in ...
C:\>type file3.in
this is line 1 in file3.in ...
this is line 2 in file3.in ...
this is line 3 in file3.in ...
this is line 4 in file3.in ...
this is line 5 in file3.in ...
this is line 6 in file3.in ...
this is line 7 in file3.in ...
this is line 8 in file3.in ...
this is line 9 in file3.in ...
this is line 10 in file3.in ...
C:\>rem Now display the content of the Perl program
C:\>type process_files.pl
#!perl -w
$pattern = "file*.in";  # process only the input files, so a rerun skips the "file*.out" results
while (defined ($in = glob($pattern))) {
    ($out = $in) =~ s/\.in$/.out/;  # read from "xyz.in" and write to "xyz.out"
    open (IN, "<", $in) or die "Can't open $in for reading: $!";
    open (OUT, ">>", $out) or die "Can't open $out for writing: $!";  # ">>" appends to an existing file
    while (<IN>) {
        print OUT $_ if $. % 3 == 0;  # $. is the current input line number
    }
    close (IN) or die "Can't close $in: $!";  # an explicit close also resets $. for the next file
    close (OUT) or die "Can't close $out: $!";
}
C:\>
C:\>rem Run the Perl program
C:\>perl process_files.pl
C:\>
C:\>rem Now check the output files
C:\>type file1.out
this is line 3 in file1.in ...
this is line 6 in file1.in ...
this is line 9 in file1.in ...
C:\>type file2.out
this is line 3 in file2.in ...
this is line 6 in file2.in ...
this is line 9 in file2.in ...
C:\>type file3.out
this is line 3 in file3.in ...
this is line 6 in file3.in ...
this is line 9 in file3.in ...
C:\>
C:\>
I noticed that Cygwin's gawk works as expected on my system -
C:\>
C:\>type file1.in
this is line 1 in file1.in ...
this is line 2 in file1.in ...
this is line 3 in file1.in ...
this is line 4 in file1.in ...
this is line 5 in file1.in ...
this is line 6 in file1.in ...
this is line 7 in file1.in ...
this is line 8 in file1.in ...
this is line 9 in file1.in ...
this is line 10 in file1.in ...
C:\>
C:\>c:\cygwin\bin\gawk "NR%3==0" file1.in
this is line 3 in file1.in ...
this is line 6 in file1.in ...
this is line 9 in file1.in ...
C:\>
C:\>c:\cygwin\bin\gawk 'NR%3==0' file1.in
this is line 3 in file1.in ...
this is line 6 in file1.in ...
this is line 9 in file1.in ...
C:\>
C:\>
My problem is solved in more ways than one. Thanks to all who viewed the thread and especially those who posted their useful solutions. I learned a bit in the process and have improved my turn-around on the processing of these files considerably.
As Methyl and Franklin52 both pointed out, and as I suspected, I was not escaping the % correctly in the batch file. I also probably contributed to the confusion by using the term "backslashing" in place of "escaping" in my original post. The actual term escaped me since I have been up for 26 hours now and I am trying to get back in tune with programming after a long dry spell.
As I noted in the first post, the script would run from the command line in Windows (XP).
I can also run the same script with no problems in Cygwin, as tyler_durden pointed out.
It was only when I tried to use it in a batch file to process multiple files that I had a problem. I could not remember how to escape the % character in a batch file. The gawk solution put forth by Methyl worked perfectly in my batch file, and I appreciate that immensely. My batch file has been modified as in the example posted by Methyl, with %% in place of the single % in front of the decimation interval (NR%%3==0).
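For anyone who finds this thread later, here is a minimal sketch of what such a batch file might look like. It is not Methyl's exact example. The file name decimate.bat, the *.in input pattern, and the .out output extension are assumptions for illustration, and it assumes gawk is on the PATH. Inside a batch file, cmd.exe reads %3 as the third argument passed to the script, which is empty here, so gawk receives "NR==0". Doubling the percent sign passes a literal % through.

```batch
@echo off
rem decimate.bat (hypothetical name): keep every third line of each input file.
rem Assumptions: gawk is on the PATH and the input files end in .in.

rem A doubled percent sign reaches gawk as a single one, so gawk sees NR, percent, 3.
rem The loop variable is also written with a doubled percent sign inside a batch file.
rem The ~n modifier gives the file name without its extension.
for %%F in (*.in) do (
    gawk "NR%%3==0" "%%F" > "%%~nF.out"
)
```

Run it from the directory that holds the input files. Typed directly at the prompt rather than saved in a batch file, the same loop would use single percent signs throughout.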
I also appreciate the Perl code example posted by tyler_durden above. Using that example, I was able to modify a short Perl program that I put together a few weeks ago (with assistance from some of the same people on this site) so that I can perform this file reformat and decimation in one step instead of two.