Format row data into columns

Gangadhar_Reddy · February 26, 2011, 9:52am

I have a file which looks like this:

/* ----------------- EDW$MOC139_R_NNA_BR_SUM_FACT2 ----------------- */

insert_job: EDW$MOC139_R_NNA_BR_SUM_FACT2   job_type: c
command: /home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact2.sh m
machine: edwprod02.dsm.pwj.com
#owner: btchproc
permission: gx,ge,wx,we,mx,me
condition: s(EDW$MOF139_R_NNA_BR_SUM_FACT2)
description: "Loads NNA Branch Summary tables 2"
std_out_file: >/tmp/EDW_MOC139.out
std_err_file: >/tmp/EDW_MOC139.err
alarm_if_fail: 1

/* ----------------- EDW$MOC140_R_NNA_BR_SUM_FACT3 ----------------- */

insert_job: EDW$MOC140_R_NNA_BR_SUM_FACT3   job_type: c
command: /home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact3.sh m
machine: edwprod02.dsm.pwj.com
#owner: btchproc
permission: gx,ge,wx,we,mx,me
condition: s(EDW$MOF140_R_NNA_BR_SUM_FACT3)
description: "Loads NNA Branch Summary tables 3"
std_out_file: >/tmp/EDW_MOC140.out
std_err_file: >/tmp/EDW_MOC140.err
alarm_if_fail: 1

I want the output to be like this:

insert_job|job_type|command|machine|owner|condition|description|alarm_if_fail
EDW$MOC139_R_NNA_BR_SUM_FACT2|c|/home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact2.sh|edwprod02.dsm.pwj.com|btchproc|s(EDW$MOF139_R_NNA_BR_SUM_FACT2)|Loads NNA Branch Summary tables 2|1
EDW$MOC140_R_NNA_BR_SUM_FACT3|c|/home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact3.sh|edwprod02.dsm.pwj.com|btchproc|s(EDW$MOF140_R_NNA_BR_SUM_FACT3)|Loads NNA Branch Summary tables 3|1

Thanks for your solutions in advance.

ctsgnb · February 26, 2011, 5:57pm

sed 's/: /:/g;s/  */ /g;/^$/d;/^\//d;/std.*file/d;/permiss/d;/^command/s/ .$//' tst |xargs -n1 | awk -F: 'BEGIN{print "insert_job|job_type|command|machine|owner|condition|description|alarm_if_fail"};!/^std/&&!/perm/{printf "%s%s",$2,(/alarm/)?RS:OFS}' OFS=\|

---------- Post updated at 11:57 PM ---------- Previous update was at 11:53 PM ----------

I am sure there are some better way to do it but until they are posted, here is a way

[ctsgnb@shell ~/sand]$ cat tst
/* ----------------- EDW$MOC139_R_NNA_BR_SUM_FACT2 ----------------- */

insert_job: EDW$MOC139_R_NNA_BR_SUM_FACT2   job_type: c
command: /home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact2.sh m
machine: edwprod02.dsm.pwj.com
#owner: btchproc
permission: gx,ge,wx,we,mx,me
condition: s(EDW$MOF139_R_NNA_BR_SUM_FACT2)
description: "Loads NNA Branch Summary tables 2"
std_out_file: >/tmp/EDW_MOC139.out
std_err_file: >/tmp/EDW_MOC139.err
alarm_if_fail: 1

/* ----------------- EDW$MOC140_R_NNA_BR_SUM_FACT3 ----------------- */

insert_job: EDW$MOC140_R_NNA_BR_SUM_FACT3   job_type: c
command: /home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact3.sh m
machine: edwprod02.dsm.pwj.com
#owner: btchproc
permission: gx,ge,wx,we,mx,me
condition: s(EDW$MOF140_R_NNA_BR_SUM_FACT3)
description: "Loads NNA Branch Summary tables 3"
std_out_file: >/tmp/EDW_MOC140.out
std_err_file: >/tmp/EDW_MOC140.err
alarm_if_fail: 1

[ctsgnb@shell ~/sand]$ sed 's/: /:/g;s/  */ /g;/^$/d;/^\//d;/std.*file/d;/permiss/d;/^command/s/ .$//' tst |xargs -n1 | awk -F: 'BEGIN{print "insert_job|job_type|command|machine|owner|condition|description|alarm_if_fail"};!/^std/&&!/perm/{printf "%s%s",$2,(/alarm/)?RS:OFS}' OFS=\|
insert_job|job_type|command|machine|owner|condition|description|alarm_if_fail
EDW$MOC139_R_NNA_BR_SUM_FACT2|c|/home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact2.sh|edwprod02.dsm.pwj.com|btchproc|s(EDW$MOF139_R_NNA_BR_SUM_FACT2)|Loads NNA Branch Summary tables 2|1
EDW$MOC140_R_NNA_BR_SUM_FACT3|c|/home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact3.sh|edwprod02.dsm.pwj.com|btchproc|s(EDW$MOF140_R_NNA_BR_SUM_FACT3)|Loads NNA Branch Summary tables 3|1
[ctsgnb@shell ~/sand]$

cgkmal · February 27, 2011, 3:48am

Gangadhar Reddy,
Please try with:

:~$ awk -F": " 'BEGIN{print "insert_job|job_type|command|machine|owner|condition|description|alarm_if_fail"}
 {sub(/[ ]*job_type/,": job_type",$0)}
 /insert_job/{a=$2}
 /job_type/{b=$4}
 /command/{c=substr($2,1,index($2," ")-1)}
 /machine/{d=$2}
 /owner/{e=$2}
 /condition/{f=$2}
 /description/{g=$2}
 /alarm_if_fail/{h=$2;print a"|"b"|"c"|"d"|"e"|"f"|"g"|"h}
 ' inputfile
insert_job|job_type|command|machine|owner|condition|description|alarm_if_fail
EDW$MOC139_R_NNA_BR_SUM_FACT2|c|/home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact2.sh|edwprod02.dsm.pwj.com|btchproc|s(EDW$MOF139_R_NNA_BR_SUM_FACT2)|"Loads NNA Branch Summary tables 2"|1
EDW$MOC140_R_NNA_BR_SUM_FACT3|c|/home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact3.sh|edwprod02.dsm.pwj.com|btchproc|s(EDW$MOF140_R_NNA_BR_SUM_FACT3)|"Loads NNA Branch Summary tables 3"|1

Gangadhar_Reddy · February 28, 2011, 12:33pm

ctsgnb & cgkmal,
Thanks both of you for your advice & time.

I tried executing both of them but it' giving me syntax errors:
I name the inout file as "b" and here is the error:
$sed 's/ :/:/g;s/ */ /g;/^$/d;/^\//d;/std.*file/d;/permiss/d;/^command/s/ .$//' b |xargs -n1 | awk -F: 'BEGIN{print "insert_job|job_type|command|machine|owner|condition|description|alarm_if_fail"};!/^std/&&!/p$&!/perm/{printf "%s%s",$2,(/alarm/)?RS:OFS}' OFS=\| <
awk: syntax error near line 1
awk: bailing out near line 1

$awk -F": " 'BEGIN{print "insert_job|job_type|command|machine|owner|condition|description|alarm_if_fail"}
> {sub(/[ ]*job_type/,": job_type",$0)}
> /insert_job/{a=$2}
> /job_type/{b=$4}
> /command/{c=substr($2,1,index($2," ")-1)}
> /machine/{d=$2}
> /owner/{e=$2}
> /condition/{f=$2}
> /description/{g=$2}
> /alarm_if_fail/{h=$2;print a"|"b"|"c"|"d"|"e"|"f"|"g"|"h}
> ' b
awk: syntax error near line 2
awk: illegal statement near line 2

My server configuration:
$uname -a
SunOS edwprod01.dsm.pwj.com 5.8 Generic_117350-53 sun4u sparc SUNW,Netra-T12

Thanks,
Gangadhar

vgersh99 · February 28, 2011, 12:34pm

On Solaris, use nawk instead of awk

Gangadhar_Reddy · February 28, 2011, 12:56pm

Thanks vgersh99. It worked now

ctsgnb, ur script worked for few column till condition and then it had some discrepancy for rest of the columns. Thanks anyways for your help. I appreciate it.

cgkmal, ur script worked brilliantly. thanks so much for ur valuable advice. Can you please explain it? I have one more following type of file which again I need to format it. I would like to try it myself first.

Also, I had read lots of UNIX books where they have just mentioned simple descriptions for AWK & SED. Do you know of any books which I can read to get good exposure to AWK & SED programming? That will help me a lot

$jr 'EDW$MOF111_R_NNA_STRT' -q
/* ----------------- EDW$MOF111_R_NNA_STRT ----------------- */
insert_job: EDW$MOF111_R_NNA_STRT job_type: f
machine: edwprod02.dsm.pwj.com
#owner: btchproc
permission: gx,ge,wx,we,mx,me
date_conditions: 1
run_calendar: EDW_NNA_Cal
start_times: "18:00"
description: "EDW - Enterprise Data Warehouse"
watch_file: /home/btchproc/load_process/ftp_home/NNA_STRT.DAT
watch_file_min_size: 0
watch_interval: 60
alarm_if_fail: 1

Love u all guys.

panyam · February 28, 2011, 1:47pm

Reddy garu,

Sed - An Introduction and Tutorial

Awk - A Tutorial and Introduction - by Bruce Barnett

I believe these tutorials are good and self explanatory.

Regards
Ravi

Gangadhar_Reddy · February 28, 2011, 2:08pm

Thanks Ravi ji I'll go through it

I was able to work out on the file I mentioned. Btwn I have one question:
There is a line that i mentioned as
command: /home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact3.sh m
For the above line this workds fine:
/command/{c=substr($2,1,index($2," ")-1)}

But its an issue if at the end of line there is no "m". Give me a solution which will work in both cases.
command: /home/btchproc/load_process/batch_files/batch_nna_brn_split_sum_fact3.sh

Please help

panyam · February 28, 2011, 2:31pm

Just change the condition as below:

 
/command/{if(substr($2,index($2,".")+4)=="m" { c=substr($2,1,index($2," ")-1)} else {c=$2}}

or

/command/{if(substr($2,index($2," ")+1)=="m" { c=substr($2,1,index($2," ")-1)} else {c=$2}}

I did not tested , so please debug the code/test it by getting ".sh" position and then +2 to get "m". if it exists derive "c" differently , else $2 will be "c".

ctsgnb · February 28, 2011, 2:35pm

@ganga reddy :

I based my code on the sample you gave : if your input file is different than the sample you posted, then of course, unexpected output may come.

By the way, i noticed in your copy past of my command line , you have :

...n|alarm_if_fail"};!/^std/&&!/p$&!/perm/{printf  "%s%s",$2,(/alarm/)?RS:OFS}' OFS=\| <

So in red it looks like copy/paste or typo error you did when copying my suggestion ...

...n|alarm_if_fail"};!/^std/&&!/perm/{printf "%s%s",$2,(/alarm/)?RS:OFS}' OFS=\|

And ... as suggested by vgersh99, before running my code,
make sure you have changed awk by nawk if running it on solaris.

cgkmal · February 28, 2011, 10:15pm

Hi Gangadhar Reddy,

Sorry for my lateness to reply, I had not connected until now.

In order to handle when "m" appears or not, I have modify a little bit the line you mention from

/command/{c=substr($2,1,index($2," ")-1)}

to

/command/{if(index($2," ")==0) {c=$2} else {c=substr($2,1,index($2," ")-1)}}

The modified code is as follow:

awk -F": " 'BEGIN{print "insert_job|job_type|command|machine|owner|condition|description|alarm_if_fail"}
{sub(/[ \t]+$/, "")} # New line added to remove any whitespaces/tabs at the end of each line.
{sub(/[ ]*job_type/,": job_type",$0)}
/insert_job/{a=$2}
/job_type/{b=$4}
/command/{if(index($2," ")==0) {c=$2} else {c=substr($2,1,index($2," ")-1)}}
/machine/{d=$2}
/owner/{e=$2}
/condition/{f=$2}
/description/{g=$2}
/alarm_if_fail/{h=$2;print a,b,c,d,e,f,g,h} # Modified a little bit to make it shorter than before
' OFS="|" inputfile

Below I try to explain the code, I hope it helps and I hope be clear enough :D.

awk -F": " 'BEGIN{print "insert_job|job_type|command|machine|owner|condition|description|alarm_if_fail"} # We begin setting headers
{sub(/[ \t]+$/, "")} 
# To remove any whitespaces/tabs at the end of each line to avoid more if statements below

{sub(/[ ]*job_type/,": job_type",$0)} 
# Replace the string " job_type" with ": job_type" to work globally with FS=": "

#For the lines of code like: 
/STRING/{X=$i}
# All lines of this form means, "Search string "STRING", when it is found, assign value of field "i" of that line to variable "X".
 
/command/{if(index($2," ")==0) {c=$2} else {c=substr($2,1,index($2," ")-1)}} 
# Search string "command", when found analyse as follow
# The part "{if(index($2," ")==0) {c=$2}"  means, if in field 2 there are no any spaces "(index($2," ")==0)", assign value of field 2 to c,
#The part "else {c=substr($2,1,index($2," ")-1)}}" means, if there are spaces in $2, extract a new substring from $2, from the beginning up to the space position, (given by index..)
# the "-1" part is to remove the space at the end of the substring extracted from $2, resulting when is removed the "m" or any other word present.

/alarm_if_fail/{h=$2;print a,b,c,d,e,f,g,h} # The part "print a,b,c,d,e,f,g,h" prints all variables in the order showed
' OFS="|" inputfile # the part  OFS="|" at the end, sets the Output Field Separator = "|".

Regards ,

Gangadhar_Reddy · March 1, 2011, 12:14pm

Thanks Panyam & Ctsgnb.

Thanks Cgkmal for the detailed explanation

---------- Post updated at 10:21 AM ---------- Previous update was at 09:40 AM ----------

Cgkmal,
One more question. As I said i wrote script for one input which had following lines
watch_file: /home/btchproc/load_process/ftp_home/NNA_STRT.DAT
watch_file_min_size: 0

And I just wanted watch_file column in my output. But when I specified
/watch_file/{f=$2}
it showed the value of watch_file_min_size.

For now what I did is, I globally replaced the watch_file_min_size with min_size and got the desired output. But what wud be the correct way to do that.

Thanks

cgkmal · March 1, 2011, 3:41pm

Hi Gangadhar Reddy,

No problem, I learning helping others.

In this case is needed to match the exact word, "watch_file",

you can do it by replacing this from

/watch_file/{f=$2}

to

/\<watch_file\>/{f=$2}

* The characters  "\<" and "\>"  do  "anchor" the expression between to only match if it is on a word boundary.
Ref: Regular Expressions

Regards

Gangadhar_Reddy · March 7, 2011, 4:49pm

Here is my input file:
______________________________________________________________________________
Start Dependent
Job Name Status Date Cond? Cond? Jobs?
-------- ------ --------------- ----- ---------
DWC$DAC103_acct_naked_opt INACTIVE No Yes Yes

Condition: s(DWC$DAC111_acct) and s(DWC$DAC141_cust) and
s(DWC$DAC131_acct_cust_rel) and s(DWC$DAC134_cust_classfn)

Atomic Condition Current Status T/F
---------------- -------------- ---
SUCCESS(DWC$DAF103_ACCNKTOPTSTRT.DAT) RUNNING F

Dependent Job Name Condition
------------------ ---------
RDS$DAC524_D_DWC_AC_NKDO_FACT SUCCESS(DWC$DAC103_acct_naked_opt)
WST$DAC437_R037_Uncovered_Opt SUCCESS(DWC$DAC103_acct_naked_opt)

I want the output as:
Job Name|Dependent Job Name
DWC$DAC103_acct_naked_opt|RDS$DAC524_D_DWC_AC_NKDO_FACT, WST$DAC437_R037_Uncovered_Opt

Note: The Depenedent job may be 1,2,3, etc or it may not be present too.

---------- Post updated at 04:49 PM ---------- Previous update was at 04:46 PM ----------

cgkmal:

Hi Gangadhar Reddy,

No problem, I learning helping others.

In this case is needed to match the exact word, "watch_file",

you can do it by replacing this from
/watch_file/{f=$2}
to
/\<watch_file\>/{f=$2}

* The characters  "\<" and "\>"  do  "anchor" the expression between to only match if it is on a word boundary.
Ref: Regular Expressions
Regards

I have one more question - in case if the watch_file line is not present for some jobs. What happens now is the next time it get other jobs, the watch_file value is same as previous job. How can I make it void at the end of one loop so that next time if it doesn't get watch_file line there should be blank.

cgkmal · March 8, 2011, 3:02am

Hi GReddy,

For your new inputfile I get this script, not short one, but works with your sample.

### 1-Using SED to remove lines and other data to simplify #####
#################################################################
sed -e '
s/\(.*_[a-zA-Z]*\)\( .*\)/\1/;
s/^Atomic.*//;
s/^Condition.*//;
s/^SUCCESS(.*//;
s/^-.*//;
s/^_.*//;
s/\(^Job Name\)\( .*\)/\1/;
s/\(^Dependent Job Name\)\( .*\)/\1/;
/./!d' inputfile | 
####2-Using AWK to merge lines after pattern "Dependent Job Name"###
#################################################################

awk 'BEGIN{print "Job Name|Dependent Job Name"}
{
if ( $0~/Dependent Job Name/ || $0~/.*_.*/ )
printf("%s,", $0)
else
printf(" %s\n", $0)
}' > temp 
####3-Using SED to remove extra words to match the requirements #######
#################################################################
sed -e '
s/,Dependent Job Name,/|/;
s/Start Dependent//;
2,$s/Job Name//;
s/^[ \t]*//;
s/[ \t]*$//;
/./!d
s/,$//' temp > outputfile
rm temp

Job Name|Dependent Job Name
DWC$DAC103_acct_naked_opt|RDS$DAC524_D_DWC_AC_NKDO_FACT,WST$DAC437_R037_Uncovered_Opt
DWC$DAC104_acct_position|RDS$DAC504_D_DWC_AC_POS,RDS$DAC504_D_DWC_BRN_POS_FACT,RDS$DAC504_D_DWC_FA_POS_FACT,WST$DAC414_Cap_Limit_Master,WST$DAC427_R017_TAPS,WST$DAC441_R043_Error_Act_Pos,WST$DAC444_R052_B_Share_Pos,WST$DAC446_R054_C_Share_Pos,WST$DAC452_R064_FDIC_Exposure,

And for the other question

I�m not understand clear enough, may you put an input and output example?

Regards

Gangadhar_Reddy · March 8, 2011, 10:33am

Here is wat i was asking for

Input file:

/* ----------------- DWC$DAF101_VALUSTRT.DAT ----------------- */

insert_job: DWC$DAF101_VALUSTRT.DAT   job_type: f
term_run_time: 1320
watch_interval: 60

/* ----------------- DWC$DAF102_E99MSGSTRT.DAT ----------------- */

insert_job: DWC$DAF102_E99MSGSTRT.DAT   job_type: f
watch_interval: 60

Output File:

insert_job|job_type|term_run_time|watch_interval
DWC$DAF101_VALUSTRT.DAT|f|1320|60
DWC$DAF102_E99MSGSTRT.DAT|f|1320|60

If u check the above output clearly, in 2nd record the value should be blank (instead of 1320) for term_run_time. But thats not happening. If its not able to find term_run_time in the next job, it assign the previous job values.

Gangadhar_Reddy · March 8, 2011, 11:34am

Cgkmal,
The script didn't work. But i don't think so the problem was with ur script. The problem was the way i posted my input file. There is formatting issue. Please find attached the screenshot of my inputfile.
Note: Also it might happen that there might be no dependent jobs.

Thanks,
Gangadhar

cgkmal · March 8, 2011, 5:02pm

Hi again GReddy,

To fix this is needed to relesase/clear the variables value after printing them as below.

awk -F": " 'BEGIN{print "insert_job|job_type|term_run_time|watch_interval"}
{sub(/[ \t]+$/, "")}
{sub(/[ ]*job_type/,": job_type",$0)}
/insert_job/{a=$2}
/job_type/{b=$4}
/term_run_time/{c=$2}
/watch_interval/{d=$2;print a,b,c,d; a="";b="";c="";d=""}
' OFS="|" inputfile

The other issue:

Did you adapt the code to the correct input format? may you show the real input file?

Best regards

Gangadhar_Reddy · March 9, 2011, 10:54am

Cgkmal,
Check the attachment in my above post. It has the correct input file format.

Thanks,
Gangadhar

cgkmal · March 9, 2011, 5:16pm

GReddy,

I saw it before, but that�s an image of the file format. It�s needed the real sample file.