Issues in reading file using 'awk'

Dear all,
I am using the following function from a script to assign the variable "jobNo" a value taken from the file $SAMPLE"_status.log" [1] (generated by the crab command on the first line below):

  
   crab ntuplize_crab -status -c $SAMPLE >& $SAMPLE"_status.log" &  
   echo $SAMPLE"_status.log" "====="  
   jobNo=$(awk '/Jobs with Wrapper/ && $NF != 0{s=1}   /List of jobs/ && s{if(p){p=p","$NF}else{p=$NF};s=""}END{print p}' $SAMPLE"_status.log" )
    #sleep 200                                                                                                                                        

    echo $jobNo "====="
    echo $jobNo "====="
 

The name of the file is printed correctly on the screen, and I checked that its content is fine; it is shown in [1].
Running the script gives me the following output:

 
qcd120_status.log =====
=====
=====

The first line is the file name, and it is fine. But when I try to print jobNo, only the "=====" marker is printed.
When I run the same awk command in the terminal, it gives me the proper jobNo that I want, which should be the following:

 57,331,333,336,348,2-3,11,28,45,49,67-68,80,82,87,102-104,107-108,111-112,114,117-118,123-125,127-132,134,139,148-157,159-161,169-172,174,179-180,182-185,200,202,204,208-210,219,226,236,238,245,251,253-257,262,265,271,280,288,308,330,353,355,375,377,381,385,387 

Please help, I am completely stuck.

[1]

 crab:  ExitCodes Summary
 >>>>>>>>> 309 Jobs with Wrapper Exit Code : 0
         List of jobs: 1,4-10,12-27,29-44,46-48,50-56,58-66,69-79,81,83-86,88-101,105-106,109-110,113,115-116,119-122,126,133,135-138,140-147,158,162\
-168,173,175-178,181,186-199,201,203,205-207,211-218,220-225,227-235,237,239-244,246-250,252,258-261,263-264,266-270,272-279,281-287,289-307,309-329,\
332,334-335,337-347,349-352,354,356-374,376,378-380,382-384,386,388-401
        See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning

crab:  ExitCodes Summary
 >>>>>>>>> 4 Jobs with Wrapper Exit Code : 8028
         List of jobs: 57,331,333,336
        See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning

crab:  ExitCodes Summary
 >>>>>>>>> 1 Jobs with Wrapper Exit Code : 8021
         List of jobs: 348
        See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning

crab:  ExitCodes Summary
 >>>>>>>>> 87 Jobs with Wrapper Exit Code : 60307
         List of jobs: 2-3,11,28,45,49,67-68,80,82,87,102-104,107-108,111-112,114,117-118,123-125,127-132,134,139,148-157,159-161,169-172,174,179-180\
,182-185,200,202,204,208-210,219,226,236,238,245,251,253-257,262,265,271,280,288,308,330,353,355,375,377,381,385,387
        See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning

crab:   401 Total Jobs
 

Hi,

  • Is qcd120_status.log the actual name of the log that you posted under [1]?
  • Does the log file contain \ at the end of some of the lines, or did you put those there yourself?

Hi,
Thanks for the reply,
Yes, it is the actual name of the log that I posted at [1].
Hmm, I guess the backslash is the symbol marking where the next line starts, because the content was too long to fit on a single line.

Besides, running the command in the terminal gives the proper 'jobNo' for the same qcd120_status.log file.

emily

I am guessing that we have a small language barrier in this discussion.

The output that you said you were getting when you run the command manually has a leading space that the awk script you showed us would not produce.
The output that you said you were getting when you run the command manually also contains text from the continuation line in your log file that your awk script does not handle.

When I run the awk script you provided with the input data you provided, the output produced is:

57,331,333,336,348,2-3,11,28,45,49,67-68,80,82,87,102-104,107-108,111-112,114,117-118,123-125,127-132,134,139,148-157,159-161,169-172,174,179-180\

not:

 57,331,333,336,348,2-3,11,28,45,49,67-68,80,82,87,102-104,107-108,111-112,114,117-118,123-125,127-132,134,139,148-157,159-161,169-172,174,179-180,182-185,200,202,204,208-210,219,226,236,238,245,251,253-257,262,265,271,280,288,308,330,353,355,375,377,381,385,387

But, we are using awk after the command that produced the input file is long gone. Since your script is running crab to produce the file being read by awk asynchronously in the background while awk is running in the foreground, there is a good chance that awk will hit end of file before crab writes anything into the file. If this happens, obviously jobNo will be set to an empty string.

If you get rid of the ampersand ( & ) at the end of the crab command line and if crab does not split long "List of jobs" lines with backslashes ( \ ) and follow them with the continuation lines that you showed in your 1st message in this thread, everything should work as you expect it to work.
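
If the log really does contain backslash continuation lines, one possible way to cope is to join them before matching. The following is only a minimal sketch, assuming a continued line ends with a backslash as its very last character; the small sample log written to a temporary file is made up for illustration, and the variable name nextline is arbitrary:

```shell
#!/bin/bash
# Hypothetical sample log: one "List of jobs" line is split with a backslash.
log=$(mktemp)
cat > "$log" <<'EOF'
 >>>>>>>>> 2 Jobs with Wrapper Exit Code : 0
         List of jobs: 1,4
 >>>>>>>>> 4 Jobs with Wrapper Exit Code : 60307
         List of jobs: 2-3,11\
,28
EOF

jobNo=$(awk '
# Join continuation lines: while the record ends in a backslash, drop the
# backslash and append the next input line to the current record.
{
        while (sub(/\\$/, "") && (getline nextline) > 0)
                $0 = $0 nextline
}
/Jobs with Wrapper/ && $NF != 0 { s = 1 }
/List of jobs/ && s {
        p = (p ? p "," : "") $NF
        s = ""
}
END { print p }' "$log")

echo "$jobNo"   # prints 2-3,11,28
rm -f "$log"
```

The first block in the script runs for every input line, so the two pattern rules below it always see the already joined record.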

Hi Dan,
Thanks for looking into it, but I am confused too. Let me rephrase my problem.

When I run this command manually here is the response:

 

[emily04@cmslpc38 pythia]$ jobNo=$(awk '/Jobs with Wrapper/ && $NF != 0{s=1}   /List of jobs/ && s{if(p){p=p","$NF}else{p=$NF};s=""}END{print p}' qcd120_status.log )
[emily04@cmslpc38 pythia]$ echo $jobNo
57,331,333,336,348,2-3,11,28,45,49,67-68,80,82,87,102-104,107-108,111-112,114,117-118,123-125,127-132,134,139,148-157,159-161,169-172,174,179-180,182-185,200,202,204,208-210,219,226,236,238,245,251,253-257,262,265,271,280,288,308,330,353,355,375,377,381,385,387
[pooja04@cmslpc38 pythia]$ 

That is what I want from the script too.

But the script gives me nothing for the jobNo variable. What it prints instead is:

 
---------Will Resubmit the Jobs--------------
qcd120_status.log =====
=====
=====

And again, the function is defined as follows in the script:

ResubmitJobs() {
 crab ntuplize_crab -status -c $SAMPLE >& $SAMPLE"_status.log" &
  echo "---------Will Resubmit the Jobs--------------"
                                       
    echo $SAMPLE"_status.log" "====="
    
    jobNo=$(awk '/Jobs with Wrapper/ && $NF != 0{s=1}   /List of jobs/ && s{if(p){p=p","$NF}else{p=$NF};s=""}END{print p}' $SAMPLE"_status.log" )
    #sleep 200                                                                                                                                        

    echo $jobNo "====="
    echo $jobNo "====="

I hope it is easier for you to understand now.

greetings,
emily

Hi Emily.

I will assume you meant "Don" rather than "Dan".

Yes, I understand the output you want.

Yes, I understand. And, as I said before, if you remove the ampersand ( & ) at the end of the crab command line in your function, you will get the output you want. Your problem is that awk is processing $SAMPLE"_status.log" before the crab command writes any data into it. You are running crab and awk concurrently instead of letting crab complete before letting awk read the data that crab will eventually produce.
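
The race can be reproduced without crab. The sketch below uses a made-up function, slow_status, as a stand-in for the crab status command (it is not part of crab), and a simplified awk program that only prints the job list. It reads the log once while the writer is still running and once after wait:

```shell
#!/bin/bash
# Hypothetical stand-in for "crab ntuplize_crab -status -c $SAMPLE":
# it takes a moment before it writes anything.
slow_status() {
        sleep 1
        printf '%s\n' ' >>>>>>>>> 1 Jobs with Wrapper Exit Code : 8021' \
                      '         List of jobs: 348'
}

SAMPLE=$(mktemp -d)/qcd120

# Racy version: the writer runs in the background, so awk reads the log
# while it is still empty (or before it even exists, hence 2>/dev/null).
slow_status > "${SAMPLE}_status.log" 2>&1 &
early=$(awk '/List of jobs/ { print $NF }' "${SAMPLE}_status.log" 2>/dev/null)

# Safe version: wait for the background job to finish before reading.
wait
late=$(awk '/List of jobs/ { print $NF }' "${SAMPLE}_status.log")

echo "before wait: '$early'"
echo "after wait:  '$late'"
rm -rf "$(dirname "$SAMPLE")"
```

The first read will normally come back empty, while the second contains the job list. In the function from this thread, the simplest change is to drop the trailing & so that crab runs in the foreground; wait is only needed if the command has to stay in the background.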

Hi Don,
Yup, it is working. Thanks, Don. :)

May I ask another question?
I want the script to look through all the directories inside a particular directory and, for each one, run the crab task and get the jobNo.
At present, I have the following directories where I want to perform these operations:

drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:42 qcd800
-rw-r--r-- 1 emily04 us_cms   9739 Mar 15 11:42 VgAnalyzerKitDemoMC52X_AOD.pyc
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:43 qcdEm40
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:44 GJet20To40
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:46 qcd1000
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:47 qcd120
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:48 GJet40ToInf
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:49 qcd1400
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:51 qcd50
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:52 qcd1800
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:53 qcd30
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:54 qcd80
drwxr-xr-x 6 emily04 us_cms   2048 Mar 15 11:55 qcdEm30To40

But I am afraid that sometimes the names do not follow any particular pattern. For example, another set of directories that I have is the following:

 
drwxr-xr-x 6 emily4 us_cms  2048 Mar 15 10:32 tt
drwxr-xr-x 6 emily04 us_cms  2048 Mar 15 10:33 zgamma
drwxr-xr-x 6 emily04 us_cms  2048 Mar 15 10:34 DiPhoJet
drwxr-xr-x 6 emily04 us_cms  2048 Mar 15 10:35 DYJets50

Can I define some kind of 'array' holding the directory names and 'loop' over them?

Thanks in advance.
emily

---------- Post updated at 11:34 AM ---------- Previous update was at 10:01 AM ----------

Hi again,
I managed to run the commands over an array. Thanks, all, for your kind help.

What I did is the following:

GREP="qcd30"
GREP=""QCD50"
for file in "${GREP[@]}"
do
      crab ntuplize_crab -getoutput -c $FileNameIndx
done

But while doing this, I wondered whether I can run the different GREP[] entries in parallel.
Is that doable?

greetings,
emily
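
In bash, one common way to run loop iterations in parallel is to start each one in the background and then call wait once after the loop. The sketch below is only an illustration: fetch_output is a made-up stand-in for the crab getoutput command, the sample names are examples, and whether several crab tasks can safely run at the same time is not something this thread establishes:

```shell
#!/bin/bash
# Hypothetical stand-in for "crab ntuplize_crab -getoutput -c <dir>".
fetch_output() {
        sleep 1
        echo "done $1"
}

samples=(qcd30 qcd50 qcd80)
outdir=$(mktemp -d)

for s in "${samples[@]}"; do
        # Start each task in the background, each with its own log file.
        fetch_output "$s" > "$outdir/$s.log" 2>&1 &
done

# Block until every background task has finished.
wait

cat "$outdir"/*.log
```

Giving each task its own log file keeps the outputs from interleaving, and the single wait ensures nothing reads those logs before the tasks are done.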

What shell are you using where:

GREP="qcd30"
GREP=""QCD50"
for file in "${GREP[@]}"

creates an array rather than setting GREP to the value:

QCD50
for file in 

Even if you get rid of the double double-quote at the start of the second assignment to GREP, GREP still would not be an array.
Have you considered using:

find top_directory -type d

to generate your list of directories?
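
As a rough sketch of how that could feed a loop in bash: the directory layout below is created in a temporary location purely for illustration, the excluded name zgamma is an arbitrary example, and -mindepth/-maxdepth are extensions provided by GNU and BSD find rather than POSIX options:

```shell
#!/bin/bash
# Hypothetical layout for illustration: three task directories and one file.
top=$(mktemp -d)
mkdir "$top/qcd120" "$top/tt" "$top/zgamma"
touch "$top/VgAnalyzerKitDemoMC52X_AOD.pyc"

# Collect the immediate subdirectories of $top into an array,
# skipping one directory by name.
dirs=()
while IFS= read -r dir; do
        dirs+=("$(basename "$dir")")
done < <(find "$top" -mindepth 1 -maxdepth 1 -type d ! -name zgamma | sort)

for d in "${dirs[@]}"; do
        echo "would run: crab ntuplize_crab -status -c $d"
done
rm -rf "$top"
```

No naming pattern is needed: -type d selects every directory, and unwanted ones can be left out with additional ! -name tests.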

Hi Don,
Sorry, I was mistaken.
I used it the following way and it is working fine (snippet of the script):

 
GREP[1]="QCD30"
GREP[2]="QCD50"

for FileNameIndx in "${GREP[@]}"
      do
      echo 'crab ntuplize_crab -getoutput/status -c ' $FileNameIndx
      crab ntuplize_crab -getoutput -c $FileNameIndx
done

I did try your command, but when the directories have no common pattern I am not sure how it would help me.
Besides, that directory also contains some directories that I do not want to process.

I am using that awk command, but I do not understand how the parts with NF and p work.

       jobNo=$(awk '/Jobs with Wrapper/ && $NF != 0{s=1}   /List of jobs/ && s{if(p){p=p","$NF}else{p=$NF};s=""}END{print p}' $FileNameIndx"_status.log" )
  
 $NF != 0{s=1}  Does this say that the NF variable should not be 0, and that it is true when the "s" string is true?

 s{if(p){p=p","$NF}else{p=$NF};s=""}  This part confuses me.

Can you help with this, please?

emily,

Here is a copy of your awk script reformatted with comments added:

awk '
/Jobs with Wrapper/ && $NF != 0 {
        # We have found a "Jobs with Wrapper" line with a non-zero Exit Code;
        # set s to a non-zero, non-empty string value.  (This indicates that we
        # need to save jobs from the next "List of Jobs" line.)
        s=1
}       
/List of jobs/ && s {
        # We have a "List of Jobs" line and we need to add this list to our saved
        # jobs output list.
        if(p)   # Our current list is not empty; add a comma and this line's job
                # list to our saved jobs output list.
                p=p","$NF
        else    # Our current list is empty; initialize our saved jobs output list.
                p=$NF
        # Clear the indicator saying that we should add jobs from the next
        # "List of Jobs" line.
        s=""
}
END {   # Print the accumulated non-zero exit code saved jobs output list.
        print p
}' $FileNameIndx"_status.log"

Does this help you understand what your script is doing?
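
For the NF part specifically: NF is the number of fields on the current line, and $NF is the text of the last field. On a "Jobs with Wrapper" line the last field is the exit code, so $NF != 0 is true only for failed jobs; on a "List of jobs" line the last field is the whole comma-separated list. A small check on two lines taken from the log in the first post:

```shell
#!/bin/bash
# Print the field count (NF) and the last field ($NF) for two sample lines.
summary=$(echo ' >>>>>>>>> 4 Jobs with Wrapper Exit Code : 8028' | awk '{ print NF, $NF }')
joblist=$(echo '         List of jobs: 57,331,333,336' | awk '{ print NF, $NF }')

echo "$summary"   # 9 8028
echo "$joblist"   # 4 57,331,333,336
```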


Thanks, Don, it definitely does. :)

---------- Post updated at 08:28 AM ---------- Previous update was at 08:25 AM ----------

Hi again,
Would you please reply to this other thread: 'AND' boolean not working !!!!
I modified this script for better use, but there is an issue with the AND boolean.

I appreciate your kindness,
emily