awk: too many output files created from while loop

dodgerfan78 · October 8, 2012, 8:54pm

I am using awk to read lines from a CSV file and write data into other files. These other files are named using the value of a certain column. Column 7 is a name such as "att" or "charter". I want to end up with file names that have the value of column 7 appended to them, like this:

stockton-1-migrate-att.cfg
stockton-2-migrate-att.cfg
stockton-1-migrate-charter.cfg
stockton-2-migrate-charter.cfg

I am getting these files all right, but I am also ending up with a couple of extra files, as if column 7 were not populated.

stockton-1-migrate-.cfg
stockton-2-migrate-.cfg

The last part of the file name, after the 3rd "-", comes from $7, which I use as part of the output file name. I don't understand why my script is creating these files that end in "-.cfg", since $7 is always populated with data. Also, I instruct the script not to run through the while loop when $4 equals "d" or $7 equals "none".

Here is a snippet of my CSV file. Notice that the row numbered 17 has "d" in $4 and "none" in $7.

1,k,,,Keystone, RENO DOWNTOWN,charter1,,
2,k,,,Keystone, SPARKS NORTH,charter1,,
7,k,,,Keystone, WASHOE MED CENTER,att1,,
17,k,,d,Keystone, RENO BRISAS,none,,

Here is my code. Is there something I am doing wrong? I am getting the data I need but I am ending up with extra data that I don't want which makes me think there is a better way of doing this.

BEGIN {FS = ","; read_header = 0}
# Data rows: this block runs only after the header row has been read
{if (read_header == 1) {
    # read values of variables to replace
    for (i=1; i<=num_fields; i=i+1) value[i]=$i;

    if ($4 == "d") {
        output_file = "no routes.txt";
        while ((getline < "nodata.cfg") > 0) {
            for (i=1; i<=num_fields; i=i+1) gsub(header[i],value[i]);
            print $0 >> output_file;
        }
        close("nodata.cfg")
    }

    if ($4 != "d" && $7 != "none") {
        output_file = "stockton-1-migrate-"$7".cfg";
        output_file2 = "stockton-2-migrate-"$7".cfg";
        while ((getline < "mls-migration-template-s1.cfg") > 0) {
            for (i=1; i<=num_fields; i=i+1) gsub(header[i],value[i]);
            print $0 >> output_file;
        }
        close("mls-migration-template-s1.cfg")
        while ((getline < "mls-migration-template-s2.cfg") > 0) {
            for (i=1; i<=num_fields; i=i+1) gsub(header[i],value[i]);
            print $0 >> output_file2;
        }
        close("mls-migration-template-s2.cfg")
    }
}
}

# Header row: runs once, on the first line, which has the field keys
{if (read_header == 0) {
    # read the names of variables to replace
    num_fields = NF;
    for (i=1; i<=NF; i=i+1) {header[i]=$i}
    read_header = 1}}

agama · October 8, 2012, 9:09pm

I would have to say that somewhere in your data you have at least one record where field 4 is not a 'd' and field 7 is empty. What is produced when you run this across your data?

awk -F, '$4 != "d" && !$7' data-file

If it generates even one line, that is the cause of your problem.
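If the check does print something, it can help to see which record it was. A small variation, sketched here against made-up sample rows (the second row has been given an empty column 7 purely for illustration, it is not from the real data), prefixes each hit with its record number:

```shell
# Hypothetical sample: row 2 deliberately has an empty 7th field.
sample='1,k,,,Keystone, RENO DOWNTOWN,charter1,,
2,k,,,Keystone, SPARKS NORTH,,,
17,k,,d,Keystone, RENO BRISAS,none,,'

# Print the record number and the whole line for every row where
# field 4 is not "d" and field 7 is the empty string.
report=$(printf '%s\n' "$sample" |
    awk -F, '$4 != "d" && $7 == "" { print "record " NR ": " $0 }')

echo "$report"
```

Comparing against "" instead of using !$7 also avoids flagging a field that contains a literal 0.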

dodgerfan78 · October 8, 2012, 9:55pm

agama,

Thanks. I ran that command as you suggested and it gave back no output. I am using a spreadsheet editor to make the CSV file, so I am positive that there is data in column 7 for every row. It seems to me that somewhere my loop is clearing the value of column 7 and then re-reading the dataset against the template.

agama · October 8, 2012, 10:02pm

Thanks for running the test; it would have been too painful, if not impossible, to ask to see all of the data.

I didn't see this before...

 
    if ($4 == "d") {
        output_file = "no routes.txt";
        while ((getline < "nodata.cfg") > 0) {
            for (i=1; i<=num_fields; i=i+1) gsub(header[i],value[i]);
            print $0 >> output_file;
        }
        close("nodata.cfg")
        next;
    }

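You've already processed the record at that point, but a plain getline < file replaces $0 (and therefore every field) with each line it reads. Once the loop has read nodata.cfg to the end, $0 holds the last line of that file rather than your CSV record, so when you drop into the next block the test always matches: $4 is no longer "d" and $7 is nil.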

Adding a 'next' will cause awk to read the next record and start over.
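A minimal sketch of the effect, assuming a POSIX shell and awk; the two-line template.txt is made up for the demonstration. It compares a plain getline < file, which overwrites $0, with getline line < file, which reads into a variable (here called line) and leaves the current record alone:

```shell
# Hypothetical two-line template, created only for this demonstration.
tmpdir=$(mktemp -d)
printf 'first template line\nlast template line\n' > "$tmpdir/template.txt"

# Plain "getline < file" replaces $0, so the CSV fields are gone afterwards.
clobbered=$(echo '17,k,,d,Keystone, RENO BRISAS,none,,' |
    awk -F, -v t="$tmpdir/template.txt" '{
        while ((getline < t) > 0) { }      # each read overwrites $0 and NF
        close(t)
        print "[" $7 "]"                   # $7 now comes from the template line
    }')

# "getline line < file" fills a variable and leaves $0 untouched.
preserved=$(echo '17,k,,d,Keystone, RENO BRISAS,none,,' |
    awk -F, -v t="$tmpdir/template.txt" '{
        while ((getline line < t) > 0) { } # only the variable line changes
        close(t)
        print "[" $7 "]"                   # $7 is still the CSV value
    }')

echo "plain getline:    $clobbered"
echo "getline into var: $preserved"
rm -rf "$tmpdir"
```

The first command prints empty brackets and the second prints [none]. Reading into a variable is an alternative to next here, though next still saves evaluating the remaining tests for that record.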

dodgerfan78 · October 8, 2012, 10:10pm

That works! Thank you, agama! I am just learning awk, so I will keep that "next" command in mind next time.

P.S. Thanks also for the clarification; that is what I was suspecting, but I didn't understand how it worked.

agama · October 8, 2012, 10:16pm

You can also simplify the if statement in the next section:

if($4 != "d" && $7 != "none")

can become

if( $7 != "none" )

because you already processed a 'd' record above and skipped the rest of the processing. It's minor, but makes it more efficient and that could help if your input is huge.
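Putting the pieces together, the whole script could be restructured roughly as follows. This is only a sketch under stated assumptions: the helper function fill, the @NAME@ placeholder style in the header row, the shortened template names s1.cfg and s2.cfg, and the sample rows are all invented for illustration and are not from the original script. It uses next after the "d" case, the shorter $7 test, and getline into a variable so that $0 is never overwritten.

```shell
work=$(mktemp -d) && cd "$work" || exit 1

# Hypothetical input: a header row of placeholder names, then three records.
printf '%s\n' '@ID@,@TYPE@,@X@,@FLAG@,@REGION@,@SITE@,@CARRIER@' \
    '1,k,,,Keystone,RENO DOWNTOWN,charter1' \
    '17,k,,d,Keystone,RENO BRISAS,none' \
    '7,k,,,Keystone,WASHOE MED CENTER,att1' > data.csv
# Hypothetical one-line templates.
echo 'site @SITE@ uses @CARRIER@' > s1.cfg
echo 'backup for @SITE@' > s2.cfg
echo 'no route for @SITE@' > nodata.cfg

awk -F, '
# Copy a template to an output file, substituting each placeholder.
function fill(template, out,    line, i) {
    while ((getline line < template) > 0) {   # reads into line, not $0
        for (i = 1; i <= num_fields; i++) gsub(header[i], value[i], line)
        print line >> out
    }
    close(template)
}
NR == 1 {                                     # header row: placeholder names
    num_fields = NF
    for (i = 1; i <= NF; i++) header[i] = $i
    next
}
{ for (i = 1; i <= num_fields; i++) value[i] = $i }   # values for this record
$4 == "d" { fill("nodata.cfg", "no routes.txt"); next }  # done with this record
$7 != "none" {
    fill("s1.cfg", "stockton-1-migrate-" $7 ".cfg")
    fill("s2.cfg", "stockton-2-migrate-" $7 ".cfg")
}
' data.csv

ls
```

With the sample rows above, this should leave one stockton-1 and one stockton-2 file each for charter1 and att1, plus "no routes.txt", and no file ending in "-.cfg".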