AWK specific output filename

LMSteed · August 10, 2012, 1:05pm

Hi All,

I'd like to create a specific output filename for AWK.

The file I am processing with AWK looks like:

output_081012.csv*
27*TEXT*1.0*2.0*3.0

where * is my delimiter and the first line of the file is the output filename I'd like to create.

Is there a way to assign an awk variable to the first line and then use that variable in the printf command to create the output file?

For instance:

awk -f inputfile
BEGIN
{
FS='*'
if (FNR==1)
outputfile=$1
}
{
if {FNR==2}
printf("%s,%s,%s,%s,%s\n",$1,$2,$3,$4,$5) >>outputfile
}

Thanks!

Corona688 · August 10, 2012, 1:25pm

Well, you don't put that in the BEGIN -- that runs before any files are processed, not during. You can use OFS to simplify your printf into a print, too.

awk -F"*" -v OFS="," 'NR==1{F=$1; next} { print $1,$2,$3,$4,$5>F }' input
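To see why the assignment cannot live in BEGIN, here is a small sketch using the two sample lines from the first post, piped in on standard input. The bracketed labels are only there to make an empty $1 visible; nothing here writes a file.

```shell
# BEGIN fires before the first record is read, so NR is 0 and $1 is empty there.
# The NR==1 rule runs once the first line has actually been read.
result=$(printf '%s\n' 'output_081012.csv*' '27*TEXT*1.0*2.0*3.0' |
    awk -F"*" 'BEGIN { print "BEGIN: NR=" NR " $1=[" $1 "]" }
               NR==1 { print "line 1: NR=" NR " $1=[" $1 "]" }')
echo "$result"
# BEGIN: NR=0 $1=[]
# line 1: NR=1 $1=[output_081012.csv]
```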

Don_Cragun · August 10, 2012, 2:16pm

I think Corona688's keyboard is dropping characters today... I think he meant:

awk -v FS="*" -v OFS="," 'NR==1{F=$1; next} { print $1,$2,$3,$4,$5>F }' input

I noticed that your attempt at the code used FNR==1 instead of NR==1. If you intended to process multiple input files in a single call to awk and to have awk append to a different output file based on the first line of each input file, I think you want something like:

awk -v FS="*" -v OFS="," 'FNR==1 {
	if (F != "") close(F);
	F=$1
	next
}
	{ print $1,$2,$3,$4,$5>>F}' input_file1 input_file2 input_file3 ...

Note also that you don't need the "*" at the end of the first line in your input file. (It doesn't hurt to have it, it just isn't needed for the script to work.)
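A rough way to try the multi-file version, assuming two made-up input files (in1.txt and in2.txt) that each name a different output file on line 1. The scratch directory, file names and data values are invented for the demonstration.

```shell
# Scratch directory and sample inputs are assumptions for this demo only.
workdir=$(mktemp -d)
printf '%s\n' 'out_a.csv*' '1*A*1.0*2.0*3.0' > "$workdir/in1.txt"
printf '%s\n' 'out_b.csv*' '2*B*4.0*5.0*6.0' > "$workdir/in2.txt"

# FNR restarts at 1 for every input file, so each file picks its own output name.
(cd "$workdir" && awk -v FS="*" -v OFS="," 'FNR==1 {
	if (F != "") close(F)   # release the previous output file
	F=$1
	next
}
	{ print $1,$2,$3,$4,$5>>F}' in1.txt in2.txt)

out_a=$(cat "$workdir/out_a.csv")
out_b=$(cat "$workdir/out_b.csv")
echo "$out_a"    # 1,A,1.0,2.0,3.0
echo "$out_b"    # 2,B,4.0,5.0,6.0
rm -rf "$workdir"
```

Because the data lines are written with >>, running this twice in a directory that already holds out_a.csv would append a second copy rather than replace the first.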

Corona688 · August 10, 2012, 2:19pm

-F"*" is perfectly valid. It's short-form for -v OFS="*", not to mention probably older.

Not that I mind you catching my other typos.

Another funny awk thing you might see sometimes is awk '{print $1}' VARNAME="asdf" filename which looks weird but is also a perfectly good way of setting a variable inside awk, and probably older than -v. Just remember that they're parsed at the same time as filenames -- i.e. they won't be parsed before a BEGIN {} block. -v VAR=whatever, on the other hand, gets parsed before BEGIN {}.
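The timing difference can be checked with a short sketch. The variable name V, the value asdf and the one-line temporary file are placeholders chosen for the demonstration.

```shell
# One-line scratch input; the file and the variable name V are demo assumptions.
tmpfile=$(mktemp)
printf 'x\n' > "$tmpfile"
prog='BEGIN { print "BEGIN sees [" V "]" } { print "main sees [" V "]" }'

# Operand assignment: handled when awk reaches it in the argument list, after BEGIN.
late=$(awk "$prog" V=asdf "$tmpfile")
# -v assignment: handled before BEGIN runs.
early=$(awk -v V=asdf "$prog" "$tmpfile")

echo "$late"
# BEGIN sees []
# main sees [asdf]
echo "$early"
# BEGIN sees [asdf]
# main sees [asdf]
rm -f "$tmpfile"
```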

Don_Cragun · August 10, 2012, 3:27pm

I apologize, -F ERE is a synonym for -v FS=ERE (and it is documented in all of the man pages, including the POSIX/UNIX standards). I assume you had a typo above and meant FS rather than OFS. When I copied your solution into a file and tried it out, it failed; I must have screwed up something in the cut and paste.

Yes, I know that variables can also be set after the awk program on the command line. In fact you can intermix variable assignment operands and pathname operands. Variable assignments that appear here are processed after any commands specified in the awk program's BEGIN block and before any following file operands are read by the program. So you could have a command line like:

awk ' {$(NF+1)=F;print}' F=file1 file1 F=file2 file2

to cat files with the filename appended to each line in the file. This is documented in the POSIX standard but isn't mentioned on many vendor man pages.
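A quick sketch of that command line on two throwaway one-line files; the scratch directory and the file contents are invented for the demonstration. The second command uses awk's built-in FILENAME variable, which is a different technique from the interleaved assignments but gives the same output here.

```shell
# Scratch directory and file contents are assumptions for this demo only.
workdir=$(mktemp -d)
printf 'a b\n' > "$workdir/file1"
printf 'c d\n' > "$workdir/file2"

# Each F=... operand takes effect just before the file operand that follows it.
result=$(cd "$workdir" && awk ' {$(NF+1)=F;print}' F=file1 file1 F=file2 file2)
# Same output using the built-in FILENAME instead of interleaved assignments.
builtin_result=$(cd "$workdir" && awk '{$(NF+1)=FILENAME;print}' file1 file2)

echo "$result"
# a b file1
# c d file2
rm -rf "$workdir"
```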

LMSteed · August 10, 2012, 5:58pm

I left out some details but I basically have a bunch of *.csv's that I am trying to collect together into one file. The format of each *.csv matches what I posted earlier, where the filename is the first record and the second line is the data. Is there a good way to add a header row at the top of the output file? For some reason I don't believe my shell is working the way it is supposed to, so I am resorting to calling awk once to create the output file with the header row and then on the second call to populate it. Either way, thanks for your help!

---------- Post updated at 04:58 PM ---------- Previous update was at 04:47 PM ----------

My awk script looks like:
BEGIN{
RS="\n"
FS="*"
OFS=","
ST1="Channel Number"
ST2="Channel Label"
ST3="Time at Max"
ST4="Time History Max"
ST5="Time at Min"
ST6="Time History Min"
ST7="Frequency at Max Response"
ST8="Max Response"
}
{
if (FNR==1)
outputfile=$1
print ST1 ST2 ST3 ST4 ST5 ST6 ST7 ST8 >outputfile
if (FNR==2)
print $1 $2 $3 $4 $5 $6 $7 $8 >>outputfile
}

I thought this would work but it doesn't

Don_Cragun · August 10, 2012, 7:00pm

You're close. You have a few problems:

First, the expressions passed to print need to be separated by a comma.

Second, you print the header line to outputfile on every input line (twice for each two-line file) because you're missing a { } pair around the commands you want to run when FNR is 1.

Third, you aren't closing any of the output files you're opening. With a small number of files, it won't matter since all open files will be closed when you get to the end. But if you have a large number of files, you may run out of file descriptors.

The default value for RS is a <newline>, so you don't need to set it.

I've made a couple of other slight changes and reformatted to make it easier to read, but this is VERY similar to what you did:

BEGIN{
    FS="*"
    OFS=","
    ST1="Channel Number"
    ST2="Channel Label"
    ST3="Time at Max"
    ST4="Time History Max"
    ST5="Time at Min"
    ST6="Time History Min"
    ST7="Frequency at Max Response"
    ST8="Max Response"
}

FNR==1 {
    if (outputfile!="") close(outputfile)
    outputfile=$1
    print ST1,ST2,ST3,ST4,ST5,ST6,ST7,ST8 >outputfile
}
FNR==2 {
    print $1,$2,$3,$4,$5,$6,$7,$8 >>outputfile
}

LMSteed · August 10, 2012, 7:15pm

Awesome! I just had to make one change to get it to work. I changed FNR==1 to NR==1. So I didn't get the header every other line. fAWKing sweet!

Don_Cragun · August 10, 2012, 8:18pm

OK. I completely misunderstood what you were trying to do. I thought line 1 of EACH input file specified the filename where converted output for that input file was supposed to be written and I assumed you wanted a header in each output file. If you change the FNR==1 to NR==1, then the output file will be determined by the first line of the first file you process and line one in every other input file will be completely ignored (and in this case you don't need to worry about closing output files). Even though your sample input only showed two lines, I'm also surprised that you only have one line of data in each file. That is why my original code had:

FNR==1 { do stuff for line one in each file;next}
    { do stuff for every line except line one in each file }

Also in your original problem statement, your input had five fields; now it has eight. If you have a lot of fields to process at some point you may want to change:

    {print $1,$2,...,$NF >> outputfile}

to:

    {$1=$1;print >> outputfile}
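A small sketch of what the $1=$1 assignment does, using the sample data line from the first post and printing to standard output instead of a file. Assigning to any field makes awk rebuild $0 with OFS between the fields; without the assignment the record is printed exactly as it was read.

```shell
# With the assignment, $0 is rebuilt using OFS between the fields.
rebuilt=$(printf '27*TEXT*1.0*2.0*3.0\n' | awk -F"*" -v OFS="," '{ $1=$1; print }')
# Without it, print emits the record unchanged.
untouched=$(printf '27*TEXT*1.0*2.0*3.0\n' | awk -F"*" -v OFS="," '{ print }')

echo "$rebuilt"      # 27,TEXT,1.0,2.0,3.0
echo "$untouched"    # 27*TEXT*1.0*2.0*3.0
```

One thing to watch: a record that ends with the delimiter has an empty last field, so the rebuilt line would end with a trailing comma.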

I'm glad you got something that is working for you.

LMSteed · August 10, 2012, 8:46pm

More details.. line one of my files contains the same thing.. which represents the filename I'd like to use as my output file. The second line contains statistics on a single channel of data. All of this is because I can't get the software vendor to export the statistics nicely. So the process is to open a file, read the first line, set it as the output variable name, and then process all the files for data on their second line sending all of this data to the same output file. This was my easiest option.. The only other option I have is to process a file that contains all of the data streams in column form. Column 1 would be abscissa data, Column 2 ordinate for channel 2, then ordinate for channel 3, so on and so on.. but when talking time domain data, these files are too large to 1) output and 2) process. When talking frequency domain data, I think it would be easy but I would most likely switch over to Matlab. I wrote an AWK script a while ago for interactively switching RBE2s to RBE3s in NEdit and found that it was a great utility. Thanks for all your help.. one of these days I will understand the {}. I would like to find a good way to generate an awk error file, but based on all my recent google searches I think I can find a way to do it. Thanks again!
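The process described above could be sketched roughly as follows. This is only an illustration under assumptions: the input names (ch1.txt, ch2.txt), the output name stats.csv and all data values are invented, and the header strings are the ones from the script posted earlier in the thread.

```shell
# Scratch directory, file names and data values are assumptions for this demo only.
workdir=$(mktemp -d)
printf '%s\n' 'stats.csv*' '1*CH1*0.1*9.0*0.2*-9.0*50*3.3' > "$workdir/ch1.txt"
printf '%s\n' 'stats.csv*' '2*CH2*0.3*8.0*0.4*-8.0*60*4.4' > "$workdir/ch2.txt"

(cd "$workdir" && awk '
BEGIN {
    FS = "*"; OFS = ","
}
NR == 1 {    # very first line of the first file: pick the output name, write the header once
    outputfile = $1
    print "Channel Number", "Channel Label", "Time at Max", "Time History Max", "Time at Min", "Time History Min", "Frequency at Max Response", "Max Response" > outputfile
}
FNR == 2 {   # second line of every input file: one data row per file
    print $1, $2, $3, $4, $5, $6, $7, $8 > outputfile
}' ch1.txt ch2.txt)

result=$(cat "$workdir/stats.csv")
echo "$result"
rm -rf "$workdir"
```

Within a single awk run, > truncates the file only when it is first opened, so the later prints land after the header. Rerunning the command starts the output file fresh, which >> would not do.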

Don_Cragun · August 10, 2012, 9:00pm

You're welcome.

Note that if you had given us all of this information in your original post you would have had a working solution hours earlier and saved some of us a lot of extra work.