Help with awk to read the common text

Dear all,
I have a small script that mostly works but seems to have a bug.
It is supposed to read commonTxt and then print the noOfLines in outputFile.
It works for most of the text, but it fails to add up the values of some of the variables.
Could somebody please take a look at this thread and reply?

Thanks in advance,

The script is as follows:

#!/bin/bash
date
FileName=DataFileName
LINES4LOG=38
TEXT=Procss

MakeFinalLog() {
    echo '=====Making Final Log ====='
    awk '{if($0~text){p=lines;next}}

p>0{
       split($0,arr,"=")
       if(!h[arr[1]"HDR"])h[arr[1]"HDR"]=arr[1]
       a[arr[1]]=a[arr[1]]" "$NF
       if(j<lines) b[++j]=arr[1]
       sum[arr[1]]+=$NF
       p--
}
END{
 for(i=1;i<=j;i++)
     print h[b[i]"HDR"]"="a[b[i]]" "sum[b[i]]
}' lines=$LINES4LOG text="$TEXT" *.list > $FileName"_log.txt"
echo '++++++ Done ++++++  '$FileName"_log.txt"
}

if [ "$1" = "output" ]; then
    SetEnv
    MakeFinalLog
    exit 0
fi

Here is the output file (DataFileName_log.txt). As you can see, the text in red (the lines from "Spike" through "EleVeto, HoE, sie, ChargdHad") is not being added up, and I am not sure why.

DataFileName_log.txt

Sample Count  = 0 0 0
nPU weighted   = 441063 441530 882593
Pass vtx trk   = 442356 442358 884714
GenLevel       = 0 0 0
Pass   HLT     = 442356 442358 884714
   eff = inf 0.00493042 0.812471 inf 0.00483545 0.804582 1.62682
          = === === === === 0
 Pass Spike cut       = 649724 648483 1298207
 Pass Eta cut         = 623956 622832 1246788
 Pass Pt cut          = 185356 184933 370289
Pass EleID     = 60081 59976 120057
Passed DiLep   = 2181 2139 4320
   eff = inf 0.00493042 0.812471 inf 0.00483545 0.804582 1.62682
Pass Z mass    = 1865 1813 3678
 Fail GenEle match = 117 115 232
Passed Zs      = 1772 1721 3493
   eff = inf 0.00493042 0.812471 inf 0.00483545 0.804582 1.62682
Passed ZsW     = 8.10845e+06 8.03766e+06 16146110
Passed ZsW2    = 0 0 0
Sum LepID      = 7.99145e+07 7.76145e+07 157529000
Sum Pileup     = 7.95332e+07 7.88391e+07 158372300
          = === === === === 0
Spike                                      : 2089= 2089 2089
pt                                          : 1281= 1281 1281
pt, eta                                     : 1169= 1169 1169
dR                                          : 1080= 1080 1080
EleVeto                                     : 1029= 1029 1029
EleVeto, HoE                                : 752= 752 752
EleVeto, HoE, sie                           : 197= 197 197
EleVeto, HoE, sie, ChargdHad                : 10= 10 10
EleVeto, HoE, sie, ChargdHad, NeuHad        : 9= 9 9 18
EleVeto, HoE, sie, ChargdHad, NeuHad, phopf : 8= 8 8 16
----------------------------------------------------------= ---------------------------------------------------------- ---------------------------------------------------------- 0
PhoID pass events = 8 8 16
Passed events  = 7 8 15
  eff = 0.00395034 0.00464846 0.0085988
Passed PU evt  = 7.40527 6.24361 13.6489
Passed Yields  = 0.75497 0.636537 1.39151

Here are the two list files that I am trying to read:
File1.list

Procss: TTBb_0
Sample Count  = 0
nPU weighted   = 441063
Pass vtx trk   = 442356
GenLevel       = 0
Pass   HLT     = 442356
   eff = inf
          =====Electron Cut Efficiency ===
 Pass Spike cut       = 649724
 Pass Eta cut         = 623956
 Pass Pt cut          = 185356
Pass EleID     = 60081
Passed DiLep   = 2181
   eff = 0.00493042
Pass Z mass    = 1865
 Fail GenEle match = 117
Passed Zs      = 1772
   eff = 0.812471
Passed ZsW     = 8.10845e+06
Passed ZsW2    = 0
Sum LepID      = 7.99145e+07
Sum Pileup     = 7.95332e+07
          =====Pho Cut Efficiency ===
Spike                                      : 2089
pt                                          : 1281
pt, eta                                     : 1169
dR                                          : 1080
EleVeto                                     : 1029
EleVeto, HoE                                : 752
EleVeto, HoE, sie                           : 197
EleVeto, HoE, sie, ChargdHad                : 10
EleVeto, HoE, sie, ChargdHad, NeuHad        : 9
EleVeto, HoE, sie, ChargdHad, NeuHad, phopf : 8
----------------------------------------------------------
PhoID pass events = 8
Passed events  = 7
  eff = 0.00395034
Passed PU evt  = 7.40527
Passed Yields  = 0.75497

File2.list

Procss: TTBb_0
Sample Count  = 0
nPU weighted   = 441530
Pass vtx trk   = 442358
GenLevel       = 0
Pass   HLT     = 442358
   eff = inf
          =====Electron Cut Efficiency ===
 Pass Spike cut       = 648483
 Pass Eta cut         = 622832
 Pass Pt cut          = 184933
Pass EleID     = 59976
Passed DiLep   = 2139
   eff = 0.00483545
Pass Z mass    = 1813
 Fail GenEle match = 115
Passed Zs      = 1721
   eff = 0.804582
Passed ZsW     = 8.03766e+06
Passed ZsW2    = 0
Sum LepID      = 7.76145e+07
Sum Pileup     = 7.88391e+07
          =====Pho Cut Efficiency ===
Spike                                      : 2077
pt                                          : 1304
pt, eta                                     : 1198
dR                                          : 1095
EleVeto                                     : 1038
EleVeto, HoE                                : 729
EleVeto, HoE, sie                           : 199
EleVeto, HoE, sie, ChargdHad                : 12
EleVeto, HoE, sie, ChargdHad, NeuHad        : 9
EleVeto, HoE, sie, ChargdHad, NeuHad, phopf : 8
----------------------------------------------------------
PhoID pass events = 8
Passed events  = 8
  eff = 0.00464846
Passed PU evt  = 6.24361
Passed Yields  = 0.636537

1st: Note that you're splitting lines on "=" characters, and that the array sum[] holds the sum of the numeric values of the last field on corresponding lines from each file.

2nd: Note that the lines you've marked in red (plus the two lines following them) have no "=" characters.
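
A quick way to see the difference is to feed one line of each kind through the same split() call the script uses. This is only a small sketch: the helper name show_split is made up for illustration, and the two sample lines are copied from File1.list.

```shell
#!/bin/bash
# show_split is a hypothetical helper, not part of the original script.
# It reports how split($0, arr, "=") sees a single input line.
show_split() {
    printf '%s\n' "$1" | awk '{
        n = split($0, arr, "=")      # same call as in the script
        # arr[1] is what the script uses as the key; $NF is the value
        printf "pieces=%d key=[%s] last=[%s]\n", n, arr[1], $NF
    }'
}

show_split 'Pass EleID     = 60081'
show_split 'EleVeto, HoE                                : 752'
```

For the "=" line the key is just the label. For the ":" line split() returns a single piece, so the key is the whole line with the value included. Such a line only matches up across files when its value happens to be identical in both, which is why the two lines with matching values (9 and 8) were still summed.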

If you change the ":" characters on those lines in both input files to "=" characters, you get:

Sample Count  = 0 0 0
nPU weighted   = 441063 441530 882593
Pass vtx trk   = 442356 442358 884714
GenLevel       = 0 0 0
Pass   HLT     = 442356 442358 884714
   eff = inf 0.00493042 0.812471 inf 0.00483545 0.804582 inf
          = === === === === 0
 Pass Spike cut       = 649724 648483 1298207
 Pass Eta cut         = 623956 622832 1246788
 Pass Pt cut          = 185356 184933 370289
Pass EleID     = 60081 59976 120057
Passed DiLep   = 2181 2139 4320
   eff = inf 0.00493042 0.812471 inf 0.00483545 0.804582 inf
Pass Z mass    = 1865 1813 3678
 Fail GenEle match = 117 115 232
Passed Zs      = 1772 1721 3493
   eff = inf 0.00493042 0.812471 inf 0.00483545 0.804582 inf
Passed ZsW     = 8.10845e+06 8.03766e+06 16146110
Passed ZsW2    = 0 0 0
Sum LepID      = 7.99145e+07 7.76145e+07 157529000
Sum Pileup     = 7.95332e+07 7.88391e+07 158372300
          = === === === === 0
Spike                                      = 2089 2077 4166
pt                                          = 1281 1304 2585
pt, eta                                     = 1169 1198 2367
dR                                          = 1080 1095 2175
EleVeto                                     = 1029 1038 2067
EleVeto, HoE                                = 752 729 1481
EleVeto, HoE, sie                           = 197 199 396
EleVeto, HoE, sie, ChargdHad                = 10 12 22
EleVeto, HoE, sie, ChargdHad, NeuHad        = 9 9 18
EleVeto, HoE, sie, ChargdHad, NeuHad, phopf = 8 8 16
----------------------------------------------------------= ---------------------------------------------------------- ---------------------------------------------------------- 0
PhoID pass events = 8 8 16
Passed events  = 7 8 15
  eff = 0.00395034 0.00464846 0.0085988
Passed PU evt  = 7.40527 6.24361 13.6489

which I am guessing is closer to what you expected.
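
If editing the input files is not convenient, one alternative is to let awk split on either character by giving split() the pattern "[=:]". The following is a minimal sketch under that assumption, not a tested replacement for the full script: the function name sum_lists and the array names vals and order are made up for illustration, and the rest follows the logic of MakeFinalLog.

```shell
#!/bin/bash
# sum_lists is a hypothetical name. Arguments: the marker text, the number
# of lines to read after each marker, then the input files.
sum_lists() {
    local text=$1 lines=$2
    shift 2
    awk -v text="$text" -v lines="$lines" '
        $0 ~ text { p = lines; next }       # marker line starts a block
        p > 0 {
            split($0, arr, "[=:]")          # accept "=" or ":" as separator
            key = arr[1]
            vals[key] = vals[key] " " $NF   # collect the value from each file
            if (j < lines) order[++j] = key # remember the output order
            sum[key] += $NF                 # numeric total across files
            p--
        }
        END {
            for (i = 1; i <= j; i++)
                print order[i] "=" vals[order[i]] " " sum[order[i]]
        }' "$@"
}
```

It could be called as: sum_lists "$TEXT" "$LINES4LOG" *.list > "$FileName"_log.txt. Note that the output uses "=" for every line, including those that had ":" in the input.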

Hi Don,
Sorry for the late reply; I was busy with another task.

And a big thanks for the reply, it worked pretty well. But I am surprised: I had 48 files in total, and it did not give me the information from all 48 files.

Do you see any obvious reason for that? Please let me know.

Greetings
emily

Some details about what it did give you might help:

  1. Did it give you all of the data you wanted for a subset of the files?
  2. Did it give you all of the data you wanted from some lines from all files, but not for other lines?
  3. What is the output on your system from the command getconf LINE_MAX ?
  4. What is the output from the command ls -l *.list ?
  5. What is the output from the following command?
    awk 'FNR==1{if(m){printf("%6d\t%s\n",m+1,f);sm+=m+1} f=FILENAME;m=length($0);n++} length($0)>m{m=length($0)} END{printf("%6d\t%s\n%6d\tTotal for %d files\n",++m,f,sm+m,n)}' *.list
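
For readability, the awk command in item 5 can also be laid out as a small function. This is a sketch of what the command appears intended to do: report the longest line of each file (plus one byte for the newline) and the total of those lengths, for comparison with LINE_MAX. The function name longest_lines is made up for illustration.

```shell
#!/bin/bash
# longest_lines is a hypothetical wrapper. For each file it prints the
# length of the longest line (+1 for the newline), then the grand total.
longest_lines() {
    awk '
        FNR == 1 {                          # first line of each file
            if (n) {                        # report the previous file
                printf("%6d\t%s\n", m + 1, f)
                sm += m + 1
            }
            f = FILENAME; m = length($0); n++
        }
        length($0) > m { m = length($0) }   # track the longest line so far
        END {
            if (n) {
                printf("%6d\t%s\n", m + 1, f)
                printf("%6d\tTotal for %d files\n", sm + m + 1, n)
            }
        }' "$@"
}
```

It would be run as: longest_lines *.list. Since each output line of the log is built from one value per input file, the total gives a rough idea of how long those output lines can get.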