awk - Parsing Autosys JIL

GnuScripter · February 12, 2013, 12:14am

I'm trying to modify the script given in post 7 of the following thread: 146564-need-parse-jil-file-into-excel-file.html. (Sorry, can't post the URL as I don't have enough posts.)

The original script is as follows:

awk -F ' *[[:alnum:]_]*: *' 'BEGIN         {h="insert_job;box_name;command;owner;permission;condition;description;std_out_file;std_err_file;alarm_if_fail"; print h; n=split(h,F,/;/)}
                             function pr() {if(F[1] in A) {for(i=1;i<=n;i++)printf "%s%s",A[F[i]],(i<n)?";":RS}}
                             /insert_job/  {pr(); delete A}
                                           {for(i in F){if($0~"^"F[i])A[F[i]]=$2}}
                             END           {pr()}' infile

The modifications I'm trying to make involve returning any potential token. Everything so far seems to be working except when I try to retrieve the job_type. I'm pretty sure this is because it's on the same line as the insert_job, but it's been so long since I've worked with Awk, I'm not clear on how to fix it.

Example:

/* ----------------- backupJIL ----------------- */

insert_job: backupJIL job_type: c
command: autorep -J ALL -q > /home/autosys/...p/autosys_jil_bk
machine: machine
owner: autosys@machine
permission: gx,ge,wx,we
date_conditions: 1
days_of_week: tu,we,th,fr,sa
start_times: "17:00"
description: "Daily backup of job definitions"
std_out_file: /tmp/autosys_jil_backup.out
std_err_file: /tmp/autosys_jil_backup.err
alarm_if_fail: 1

Could someone please help?

Thanks!
GnuScripter

Don_Cragun · February 12, 2013, 3:17am

The script that you have assumes that you want one semicolon-separated line of output for each group of input lines whose first line contains insert_job:, and that the output should contain the values of the fields named insert_job, box_name, command, owner, permission, condition, description, std_out_file, std_err_file, and alarm_if_fail. It also assumes that there is no more than one field per input line.

Besides not looking for more than one field per line, it does not look for job_type, machine, date_conditions, days_of_week, or start_times.
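
To see why job_type is dropped, it can help to print every field awk produces for the combined line. The sketch below is only an illustration: show_fields is a made-up helper name, and the separator spells out [[:alnum:]_] as [A-Za-z0-9_] on the assumption that some older awk builds do not handle POSIX character classes.

```shell
# Show how the field separator from the original script splits the combined line.
# show_fields is a hypothetical helper used only for this illustration.
show_fields() {
    awk -F ' *[A-Za-z0-9_]*: *' '{
        for (i = 1; i <= NF; i++)
            printf "$%d=[%s]%s", i, $i, (i < NF ? " " : "\n")
    }'
}

fields=$(printf '%s\n' 'insert_job: backupJIL job_type: c' | show_fields)
echo "$fields"    # $1=[] $2=[backupJIL] $3=[c]
```

The job type ends up in $3, but the script only ever stores $2, and the line does not start with job_type, so the value is never picked up.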

Do you still want a semicolon separated values file?

Do you know what fields you want to appear in your output file?

Do you need a program that will search for all of the field names that appear in your input file(s) and then produce a semicolon separated output file with a heading line showing every field found and then print a line for each input record found? If that is what you want, how is the program supposed to determine record boundaries?

What type of system are you using?

What is the value of {LINE_MAX} on your system? (I.e., what is the output from the command: getconf LINE_MAX ?)

How big is (are) your input file(s)?

GnuScripter · February 12, 2013, 10:05am

I've got everything working the way I need except getting it to pick up the job_type which is on the same line as the insert_job.

Line_max is 2048.

Right now I'm just using a single box.

System is Linux.

Thanks!

Don_Cragun · February 12, 2013, 1:11pm

You didn't answer most of my questions. But, if you have something that is almost working, please post it! Maybe we can help you refine it to do what you want.

GnuScripter · February 12, 2013, 2:05pm

If you can make the script I posted work the way it is, but have it pick up the job_type, that would be all I need for now.

Thanks!

Don_Cragun · February 12, 2013, 2:57pm

A trivial way to make it work is to change your script as follows:

sed 's/ \(job_type:\)/\
\1/' infile | awk -F ' *[[:alnum:]_]*: *' 'BEGIN         {h="insert_job;box_name;command;owner;permission;condition;description;std_out_file;std_err_file;alarm_if_fail;job_type"; print h; n=split(h,F,/;/)}
                             function pr() {if(F[1] in A) {for(i=1;i<=n;i++)printf "%s%s",A[F[i]],(i<n)?";":RS}}
                             /insert_job/  {pr(); delete A}
                                           {for(i in F){if($0~"^"F[i])A[F[i]]=$2}}
                             END           {pr()}'

This adds code to your script in two places (the sed command piped in front of awk, and ;job_type at the end of the header list) and deletes the infile argument from the end of the awk command, since awk now reads its input from the pipe. (Note that a newline character must immediately follow the backslash character ( \ ) at the end of the first line; adding any spaces or tabs at this point will keep this script from working.)
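
As a quick check of what the sed step does on its own, the combined line from the example can be fed through it. This is a minimal sketch; split_out is just a variable name chosen for the illustration.

```shell
# Feed the combined line through the sed command on its own.
# The backslash must be the very last character on its line.
split_out=$(printf '%s\n' 'insert_job: backupJIL job_type: c' |
    sed 's/ \(job_type:\)/\
\1/')
echo "$split_out"
# insert_job: backupJIL
# job_type: c
```

With job_type on a line of its own, the awk script sees it as one more single-field line.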

GnuScripter · February 12, 2013, 4:56pm

Works as expected. Thanks!

One more question. How do I get it to process each job in the file? Currently, it's only doing the first one.

A new record could be identified with each insert_job it finds.

Thank you!

Don_Cragun · February 12, 2013, 5:37pm

That's funny. When I try it, it processes every job, both when I supply one job in each of several files and when I supply several jobs in one file. (You did delete the infile argument from the end of the awk script, didn't you?)

GnuScripter · February 12, 2013, 9:32pm

Found the problem.

When you dump a JIL of a box using Autorep, the beginning of each line for each cmd job is indented with a space.

Using sed, I removed it and now have this which is working completely as expected:

sed 's/ \(job_type:\)/\
\1/' infile | sed -e 's/^[ \t]*//' | awk -F ': ' 'BEGIN         {h="insert_job;box_name;command;owner;permission;condition;description;std_out_file;std_err_file;alarm_if_fail;job_type"; print h; n=split(h,F,/;/)}
                             function pr() {if(F[1] in A) {for(i=1;i<=n;i++)printf "%s%s",A[F[i]],(i<n)?";":RS}}
                             /insert_job/  {pr(); delete A}
                             {for(i in F){if($0~"^"F[i])A[F[i]]=$2}}
                             END           {pr()}'
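
The leading-space problem can be reproduced in isolation, because the script compares field names with a pattern anchored at the start of the line. A small sketch, with invented job names (box1 and job1 are assumptions) and [[:space:]] used in place of [ \t] for portability:

```shell
# Count lines that start with insert_job, before and after stripping indentation.
sample=$(printf '%s\n' 'insert_job: box1 job_type: b' ' insert_job: job1 job_type: c')

before=$(printf '%s\n' "$sample" |
    awk '$0 ~ "^insert_job" { n++ } END { print n + 0 }')
after=$(printf '%s\n' "$sample" | sed 's/^[[:space:]]*//' |
    awk '$0 ~ "^insert_job" { n++ } END { print n + 0 }')

echo "anchored matches before: $before, after: $after"    # before: 1, after: 2
```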

Also note that I modified the -F to split only on a colon followed by a space. The separator that was there before was causing (for example) times in run windows to be chopped off.
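
The difference shows up on any value that contains a colon, such as the start_times line from the example. A minimal comparison (the old separator is written with [A-Za-z0-9_] instead of [[:alnum:]_], an assumption made so the sketch also runs on awk builds without POSIX class support):

```shell
line='start_times: "17:00"'

# Old separator: "17:" inside the value also matches, so $2 is cut short.
old=$(printf '%s\n' "$line" | awk -F ' *[A-Za-z0-9_]*: *' '{ print $2 }')

# New separator: only a colon followed by a space splits the line.
new=$(printf '%s\n' "$line" | awk -F ': ' '{ print $2 }')

echo "old: $old"    # old: "
echo "new: $new"    # new: "17:00"
```

One trade-off worth keeping in mind: with -F ': ', a value that itself contains a colon followed by a space would still be cut at that point, since the script stores only $2.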

Thanks for your help!

RudiC · February 13, 2013, 2:55am

To me it looks like the script deals with any number of jobs in a file: it collects all sorts of info into the A array, and when it detects the "insert_job" string, it prints a line, deletes A, and starts over.
If that's not the way it works for you, post (or better: attach) the input file.
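
For comparison, the same collect-and-flush logic can be sketched as a single awk program with no sed stage. This is only an illustration under stated assumptions, not the script from the thread: jil_to_csv is a made-up name, the column list is shortened, and the sample job names and values are invented.

```shell
# jil_to_csv: hypothetical helper, not from the thread. Reads JIL on stdin and
# writes one semicolon-separated line per insert_job record.
jil_to_csv() {
    awk '
    BEGIN {
        # Example column list (an assumption; extend it as needed).
        h = "insert_job;job_type;box_name;command;start_times"
        n = split(h, F, ";")
        print h
    }
    # Print the collected record, if any, in header order.
    function pr(   i) {
        if (F[1] in A)
            for (i = 1; i <= n; i++)
                printf "%s%s", A[F[i]], (i < n ? ";" : "\n")
    }
    { sub(/^[ \t]+/, "") }              # drop leading indentation
    /^insert_job:/ {
        pr(); split("", A)              # flush previous record, clear A
        if (match($0, / +job_type: */)) {
            A["job_type"] = substr($0, RSTART + RLENGTH)
            $0 = substr($0, 1, RSTART - 1)
        }
    }
    # Store every "name: value" line under its field name.
    match($0, /^[A-Za-z0-9_]+: */) {
        A[substr($0, 1, index($0, ":") - 1)] = substr($0, RLENGTH + 1)
    }
    END { pr() }
    '
}

csv=$(printf '%s\n' \
    'insert_job: box1 job_type: b' \
    'owner: autosys@machine' \
    '' \
    ' insert_job: job1 job_type: c' \
    ' box_name: box1' \
    ' command: echo hi' \
    ' start_times: "17:00"' | jil_to_csv)
echo "$csv"
```

Because each value is taken as everything after the first "name: " prefix, colons inside values are left alone, and fields that are not in the column list are collected but not printed.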