I'm trying to modify the script given in post 7 of the following thread: 146564-need-parse-jil-file-into-excel-file.html. (Sorry, can't post the URL as I don't have enough posts.)
The original script is as follows:
awk -F ' *[[:alnum:]_]*: *' 'BEGIN {h="insert_job;box_name;command;owner;permission;condition;description;std_out_file;std_err_file;alarm_if_fail"; print h; n=split(h,F,/;/)}
function pr() {if(F[1] in A) {for(i=1;i<=n;i++)printf "%s%s",A[F[i]],(i<n)?";":RS}}
/insert_job/ {pr(); delete A}
{for(i in F){if($0~"^"F[i])A[F[i]]=$2}}
END {pr()}' infile
The modifications I'm trying to make involve returning any potential token. Everything so far seems to be working except when I try to retrieve the job_type. I'm pretty sure this is because it's on the same line as the insert_job, but it's been so long since I've worked with Awk, I'm not clear on how to fix it.
The script that you have assumes that you are trying to create a semicolon separated values line of output for each group of input lines that have insert_job: in the first line of the group, and that the output you want contains values found in fields named: insert_job, box_name, command, owner, permission, condition, description, std_out_file, std_err_file, and alarm_if_fail. It also assumes that there is no more than one field per input line.
Besides not looking for more than one field per line, it does not look for job_type, machine, date_conditions, days_of_week, or start_times.
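To see why job_type gets lost, here is a small sketch that prints the fields awk produces with the script's field separator. The helper name show_fields and the sample line are assumptions made for illustration; real JIL input may be spaced differently.

```shell
# Hypothetical helper: show how the script's field separator splits one line.
show_fields() {
  awk -F ' *[[:alnum:]_]*: *' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

# Assumed sample line with two fields on it, as in the job_type case.
printf 'insert_job: job1   job_type: c\n' | show_fields
```

With that sample the job_type value lands in $3, but the script only stores $2 and only tests field names at the start of the line, so the second field on a line is never picked up.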
Do you still want a semicolon separated values file?
Do you know what fields you want to appear in your output file?
Do you need a program that will search for all of the field names that appear in your input file(s) and then produce a semicolon separated output file with a heading line showing every field found and then print a line for each input record found? If that is what you want, how is the program supposed to determine record boundaries?
What type of system are you using?
What is the value of {LINE_MAX} on your system? (I.e., what is the output from the command: getconf LINE_MAX?)
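For the question above about discovering every field name, a rough sketch might look like the following. It assumes a record starts at each insert_job line, that every line holds one "name: value" pair once job_type has been split onto its own line, and that the whole input fits in memory. The function name discover_fields is made up for this example.

```shell
# Hypothetical sketch: collect every field name seen, then print a heading
# line and one semicolon separated line per record.
discover_fields() {
  sed 's/ \(job_type:\)/\
\1/' | awk '
    {
      p = index($0, ":")                 # first colon separates name from value
      if (p == 0) next                   # skip lines without a field
      name = substr($0, 1, p - 1); val = substr($0, p + 1)
      sub(/^[ \t]+/, "", name)
      sub(/^ +/, "", val); sub(/ +$/, "", val)
      if (name == "insert_job") rec++    # assumption: this starts a new record
      if (rec == 0) next                 # ignore anything before the first job
      if (!(name in seen)) { seen[name] = 1; order[++nf] = name }
      V[rec, name] = val
    }
    END {
      for (j = 1; j <= nf; j++) printf "%s%s", order[j], (j < nf) ? ";" : "\n"
      for (r = 1; r <= rec; r++)
        for (j = 1; j <= nf; j++) printf "%s%s", V[r, order[j]], (j < nf) ? ";" : "\n"
    }'
}
```

It reads standard input, so it would be run as discover_fields < infile. Fields a job does not have come out as empty columns. Values that themselves contain a colon are kept whole, because only the first colon on a line is used as the separator.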
You didn't answer most of my questions. But, if you have something that is almost working, please post it! Maybe we can help you refine it to do what you want.
A trivial way to make it work is to change your script as follows:
sed 's/ \(job_type:\)/\
\1/' infile | awk -F ' *[[:alnum:]_]*: *' 'BEGIN {h="insert_job;box_name;command;owner;permission;condition;description;std_out_file;std_err_file;alarm_if_fail;job_type"; print h; n=split(h,F,/;/)}
function pr() {if(F[1] in A) {for(i=1;i<=n;i++)printf "%s%s",A[F[i]],(i<n)?";":RS}}
/insert_job/ {pr(); delete A}
{for(i in F){if($0~"^"F[i])A[F[i]]=$2}}
END {pr()}'
The changes are in three places: the sed command piped into awk is added at the start, job_type is added to the end of the heading list, and the infile operand is deleted from the end of the awk script, since awk now reads from the pipe. (Note that a newline character must immediately follow the backslash character ( \ ) at the end of the first line; adding any spaces or tabs at this point will keep this script from working.)
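To check the sed step on its own, something like this can be run first. The function name split_job_type and the sample line are assumptions for illustration only.

```shell
# Hypothetical wrapper around the sed command from the script above.
# The line after the backslash must start in column one.
split_job_type() {
  sed 's/ \(job_type:\)/\
\1/'
}

# Assumed sample line; job_type should come out on a line of its own.
printf 'insert_job: job1 job_type: c\n' | split_job_type
```

Only the one space directly before job_type: is replaced by the newline. If the input has several spaces there, the rest stay on the end of the insert_job line and will show up as trailing spaces in that value.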
That's funny. When I try it, it processes each job in the list of files I give it, both when I supply one job in each of several files and when I supply several jobs in one file. (You did delete the infile operand from the end of the awk script, didn't you?)
To me it looks like the script is dealing with any number of jobs in a file: it collects all sorts of info into the A array, and when it detects the "insert_job" string, it prints a line, deletes A and starts over.
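A quick way to check that behaviour is to feed the pipeline two jobs at once. This is only a sketch: the function name jil_to_csv, the shortened heading list and the sample jobs are all assumptions, not taken from anyone's real input.

```shell
# Hypothetical test harness: same logic as the script in this thread,
# reading standard input, with a shortened heading list.
jil_to_csv() {
  sed 's/ \(job_type:\)/\
\1/' | awk -F ' *[[:alnum:]_]*: *' '
    BEGIN { h = "insert_job;job_type;command"; print h; n = split(h, F, /;/) }
    # Print one output line for the job collected so far, if any.
    function pr(   i) { if (F[1] in A) for (i = 1; i <= n; i++) printf "%s%s", A[F[i]], (i < n) ? ";" : RS }
    /insert_job/ { pr(); delete A }                       # new job: flush the old one
    { for (i in F) if ($0 ~ "^" F[i]) A[F[i]] = $2 }      # remember known fields
    END { pr() }'
}

# Assumed sample input: two jobs in one stream.
printf '%s\n' 'insert_job: job1 job_type: c' 'command: echo one' \
              'insert_job: job2 job_type: b' 'command: echo two' | jil_to_csv
```

With that sample it should print the heading line followed by one line per job, which is the "print, delete A, start over" cycle described above.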
If that's not the way it works for you, post (or better: attach) the input file.