Need to extract JIL file details into an Excel sheet

newbie_shell · February 28, 2017, 1:07pm

I am very new to shell scripting.

I have an AutoSys JIL file that looks like this:

/* ------------- JOB1 ------------------ */

insert_job: JOB1    job_type:  b
owner:     cm@pelonmuck
permission: gx,ge,wx,we,mx,me
date_conditions: 1
days_of_week: mo,tu,we,th,fr,su
start_time: "18:30"
box_success: s(SOME_JOB1) and s(SOME_JOB2) and s(SOME_JOB3)
box_failure: (f(SOME_JOB1) or f(SOME_JOB2)) & f(SOME_JOB3)
description: "pull files"
max_run_alarm: 15
alarm_if_fail: 1
timezone: US/Eastern


/* ------------- JOB2 ------------------ */

insert_job: JOB2    job_type:  c
box_name: SOME_BOX_NAME
command: /usr/bin/run /usr/cache/START_JOB
machine: machine@conti.com
owner:     cm@pelonmuck        
permission: gx,ge,wx,we,mx,me
days_of_week: sa,su
description: "pull all files"
max_run_alarm: 15
alarm_if_fail: 1
timezone: GMT

I need to create a shell script that writes the output in Excel/CSV format, like this:

insert_job,machine,date_conditions,days_of_week,start_time,timezone,description,command,alarm_if_fail
JOB1,,1,mo,tu,we,th,fr,su,"18:30",US/Eastern,"pull files",,1
JOB2,machine@conti.com,,sa,su,,GMT,"pull all files",/usr/bin/run /usr/cache/START_JOB,1

Could you all please help me build this?

RudiC · February 28, 2017, 1:32pm

Any attempts / ideas / thoughts from your side?

newbie_shell · February 28, 2017, 1:40pm

Hi RudiC, yeah, I guess we need to parse the file first, look for the required column names from the output header, and then pull out the value next to each one.
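A minimal sketch of that idea, assuming every attribute line has the form key: value and that only the first colon separates the key from the value (the start_time value "18:30" contains a colon of its own). The function name jil_split is hypothetical, for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: split one JIL attribute line at its FIRST colon only,
# so values that contain colons (e.g. start_time: "18:30") survive intact.
# Prints "key<TAB>value"; lines without a colon print nothing.
jil_split() {
    printf '%s\n' "$1" | awk '{
        p = index($0, ":")            # position of the first colon, 0 if none
        if (p == 0) next              # not a key: value line, ignore it
        key = substr($0, 1, p - 1)
        val = substr($0, p + 1)
        sub(/^[ \t]+/, "", key)       # trim indentation in front of the key
        sub(/^[ \t]+/, "", val)       # trim blanks between colon and value
        printf "%s\t%s\n", key, val
    }'
}

jil_split 'start_time: "18:30"'
jil_split 'command: /usr/bin/run /usr/cache/START_JOB'
```

Using index() and substr() instead of a field separator means the key/value split can never be confused by colons later in the line.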

Corona688 · February 28, 2017, 2:51pm

What have you tried?

newbie_shell · March 2, 2017, 8:22am

I've got quite close, thanks to an already existing thread:

awk -F ' *[[:alnum:]_]*: *' 'BEGIN         {h="insert_job;box_name;command;owner;permission;condition;description;std_out_file;std_err_file;alarm_if_fail"; print h; n=split(h,F,/;/)}
                             function pr() {if(F[1] in A) {for(i=1;i<=n;i++)printf "%s%s",A[F[i]],(i<n)?";":RS}}
                             /insert_job/  {pr(); delete A}
                                           {for(i in F){if($0~"^"F[i])A[F[i]]=$2}}
                             END           {pr()}' infile > outfile.csv

The problem with the code above is that it gives me this output:

insert_job,machine,date_conditions,days_of_week,start_time,timezone,description,command,alarm_if_fail
JOB1;;1,mo,tu,we,th,fr,su;";US/Eastern;"pull files";;1
JOB2;machine@conti.com;;sa,su;;GMT;"pull all files";/usr/bin/run /usr/cache/START_JOB;1

It does not print the start_time in the first row, which should be "18:30".
Any idea how to print start_time, which is in a time format?
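The stray " in the start_time column comes from the field separator, not from the value's type: ' *[[:alnum:]_]*: *' also matches the 18: inside "18:30", so $2 holds only the opening quote. The sketch below reproduces this; it writes [[:alnum:]_] as [A-Za-z0-9_] (equivalent for this ASCII input) so it also runs on older awks without POSIX character classes, and the function name show_split is hypothetical.

```shell
#!/bin/sh
# Print NF, $2 and $3 for a line split with the field separator from the
# attempt above ([[:alnum:]_] written as [A-Za-z0-9_] for older awks).
show_split() {
    printf '%s\n' "$1" | awk -F ' *[A-Za-z0-9_]*: *' '{
        # "start_time: " is one separator match, so $1 is empty; the next
        # match is "18:" inside the value, so $2 is just the opening quote.
        printf "NF=%d $2=[%s] $3=[%s]\n", NF, $2, $3
    }'
}

show_split 'start_time: "18:30"'    # NF=3 $2=["] $3=[30"]
show_split 'timezone: US/Eastern'   # NF=2 $2=[US/Eastern] $3=[]
```

Any value containing a colon is cut the same way, so the fix is to stop treating every colon as a separator once the key has been removed.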

RudiC · March 2, 2017, 8:52am

Try

awk -F: '
NR==1           {HD = "insert_job,machine,date_conditions,days_of_week,start_time,timezone,description,command,alarm_if_fail"
                 for (HDCnt=i=split(HD, HDArr, OFS); i>0; i--) SRCH[HDArr[i]]
                 print HD
                }

function PRT()  {for (i=1; i<=HDCnt; i++)       {printf "%s%s", RES[HDArr[i]], i<HDCnt?OFS:ORS
                                                }
                 split ("", RES)
                }

/--- JOB/       {if (PR) PRT()
                 PR=1
                }

$1 in SRCH      {T = $1
                 sub ($1 FS " *", "")
                 sub (/  +.*$/, "")
                 RES[T] = $0
                }

END             {PRT()
                }
' OFS=","  file

insert_job,machine,date_conditions,days_of_week,start_time,timezone,description,command,alarm_if_fail
JOB1,,1,mo,tu,we,th,fr,su,"18:30",US/Eastern,"pull files",,1
JOB2,machine@conti.com,,sa,su,,GMT,"pull all files",/usr/bin/run /usr/cache/START_JOB,1
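Note that values such as mo,tu,we,th,fr,su contain commas, so when this output is opened in Excel, days_of_week spreads over several columns and every later column shifts. A minimal sketch, assuming RFC 4180-style quoting is acceptable to whatever reads the file: wrap each value in double quotes and double any quote inside it. csv_quote and csv_field are hypothetical names; in the script above, csv_field would wrap each value as PRT() prints it.

```shell
#!/bin/sh
# Hypothetical helper: quote one value for CSV (RFC 4180 style), so that
# commas inside a value (e.g. days_of_week) stay in one spreadsheet cell.
csv_quote() {
    printf '%s\n' "$1" | awk '
    function csv_field(s) {
        gsub(/"/, "\"\"", s)          # double every embedded quote
        return "\"" s "\""            # then wrap the whole value in quotes
    }
    { print csv_field($0) }'
}

csv_quote 'mo,tu,we,th,fr,su'     # "mo,tu,we,th,fr,su"
csv_quote '"18:30"'               # """18:30"""
```

Quoting every field keeps the logic simple. The quotes already present around description and start_time are preserved (Excel shows them as part of the cell), so strip them first if you do not want them.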

newbie_shell · March 2, 2017, 10:38am

Thanks, I have one final hurdle.

The script you gave above works like magic, but only when the JIL details have no leading spaces.
For example, for a JIL file containing the following job definition:

  /* ------------- JOB1 ------------------ */

  insert_job: JOB1	job_type:  b
  owner: 	cm@pelonmuck
  permission: gx,ge,wx,we,mx,me
  date_conditions: 1
  days_of_week: mo,tu,we,th,fr,su
  start_time: "18:30"
  box_success: s(SOME_JOB1) and s(SOME_JOB2) and s(SOME_JOB3)
  box_failure: (f(SOME_JOB1) or f(SOME_JOB2)) & f(SOME_JOB3)
  description: "pull files"
  max_run_alarm: 15
  alarm_if_fail: 1
  timezone: US/Eastern

the output contains nothing but the header, because of the leading spaces. My input file has leading spaces in multiple places (never more than 2 or 3 spaces), and those lines are simply skipped by the script.
Can you please help?
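The lines are skipped because, with -F:, $1 on an indented line keeps its leading blanks (it is "  insert_job", not "insert_job"), so the test $1 in SRCH fails. A small sketch that shows $1 before and after stripping the indentation; the function name first_field is hypothetical, and unlike a "^  *" pattern it also strips leading TABs.

```shell
#!/bin/sh
# Show $1 (in brackets) for a line, before and after stripping leading blanks.
first_field() {
    printf '%s\n' "$1" | awk -F: '{
        before = $1
        sub(/^[ \t]+/, "")        # modifying $0 makes awk re-split the fields
        printf "[%s] [%s]\n", before, $1
    }'
}

first_field '  insert_job: JOB1'   # [  insert_job] [insert_job]
```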

RudiC · March 2, 2017, 11:01am

Try

awk -F: '
NR==1           {HD = "insert_job,machine,date_conditions,days_of_week,start_time,timezone,description,command,alarm_if_fail"
                 for (HDCnt=i=split(HD, HDArr, OFS); i>0; i--) SRCH[HDArr[i]]
                 print HD
                }

function PRT()  {for (i=1; i<=HDCnt; i++)       {printf "%s%s", RES[HDArr[i]], i<HDCnt?OFS:ORS
                                                }
                 split ("", RES)
                }

/--- JOB/       {if (PR) PRT()
                 PR=1
                }

                {sub ("^  *", _)
                }

$1 in SRCH      {T = $1
                 sub ($1 FS " *", "")
                 sub (/[	 ][	 ]+.*$/, "")
                 RES[T] = $0
                }

END             {PRT()
                }
' OFS=","  file

Please be aware that the insert_job line in this sample is structured differently from the data in post #1 (<TAB> separated, not space separated), so splitting off the job_type no longer works: the last sub() only cuts at a run of two or more blanks, and a single <TAB> is not enough.
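If the insert_job lines may use either TABs or spaces, one option (an assumption, not part of RudiC's script) is to cut the job name at its first blank, assuming job names never contain blanks, while other values such as command keep their inner spaces. The helper name job_name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the job name from an insert_job line,
# whether job_type is separated by spaces or by a TAB.
# Assumes job names never contain blanks.
job_name() {
    printf '%s\n' "$1" | awk -F: '{
        sub(/^[ \t]+/, "")                 # drop indentation
        if ($1 != "insert_job") next       # only handle insert_job lines
        sub(/^insert_job:[ \t]*/, "")      # drop the key and the colon
        sub(/[ \t].*$/, "")                # cut at the first blank: job_type etc.
        print
    }'
}

job_name 'insert_job: JOB1    job_type:  b'
job_name "$(printf '  insert_job: JOB1\tjob_type:  b')"
```

Both calls print JOB1. In the full script, the same first-blank cut would apply only when T is "insert_job".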

andy391791 · March 3, 2017, 3:50am

Hi Rudi, could you explain how this script works? I'm still learning awk, and there's a lot in this script I don't understand.
Many thanks

RudiC · March 3, 2017, 5:17am

That proposal is only a slight adaptation of the script you posted as your attempt. How about you try to explain it here, and we'll fill in the gaps you can't cover?
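For readers working through it, here is one way to read the logic of RudiC's second script, with a comment on each rule. It is a sketch wrapped in a hypothetical shell function jil2csv; it writes the leading-space pattern as /^ +/ (equivalent to "^  *") and the blank class with \t instead of a literal TAB, and otherwise follows the script as posted.

```shell
#!/bin/sh
# Commented version of the script above, wrapped in a hypothetical
# function: jil2csv FILE  ->  CSV on stdout.
jil2csv() {
    awk -F: '
    # First line only: build the header and remember which keys we want.
    NR == 1 {
        HD = "insert_job,machine,date_conditions,days_of_week,start_time,timezone,description,command,alarm_if_fail"
        HDCnt = split(HD, HDArr, OFS)       # HDArr[1..HDCnt] = column names
        for (i = 1; i <= HDCnt; i++)
            SRCH[HDArr[i]]                  # referencing creates the key: a "set"
        print HD
    }

    # Print the collected job in header order, then empty RES for the next job.
    function PRT() {
        for (i = 1; i <= HDCnt; i++)
            printf "%s%s", RES[HDArr[i]], (i < HDCnt ? OFS : ORS)
        split("", RES)                      # portable way to clear an array
    }

    # A job separator comment: flush the previous job (if there is one).
    /--- JOB/ { if (PR) PRT(); PR = 1 }

    # Remove indentation so that $1 is the bare key.
    { sub(/^ +/, "") }

    # A wanted key: strip "key:" and the following spaces, then anything after
    # a run of two or more blanks (the trailing job_type on insert_job lines).
    $1 in SRCH {
        T = $1
        sub($1 FS " *", "")
        sub(/[ \t][ \t]+.*$/, "")
        RES[T] = $0
    }

    # Flush the last job.
    END { PRT() }
    ' OFS="," "$1"
}
```

Usage: jil2csv infile > outfile.csv. The key difference from the original attempt is that the value is whatever remains of $0 after sub() removes the key, so colons inside values (as in "18:30") are never split.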