Fix script to get missing information

Gents,

Can you please help me fix the following script so that I get the complete data? Some data is missing from the output.

The complete input file is attached.

The script I am using is:

awk '{\
       status=substr($0,91,2);\
       ind=substr($0,26,1);\
       split(substr($0, 11, 24-11), val,/\.0 /);\
       key[NR%2":"val[1]val[2]] = ind":"status\
     }\
     END{\
       for(i in key){\
         split(i, arr, /:/)\
         split(key[i], brr, /:/)
         if(!(brr[2] == '$st1' || brr[2] == '$st2')){\
           delete key[i]
           continue\
         }\
         cnt[arr[2]]++\
      }\
      for(i in key){\
        split(i, arr, /:/)\
        split(key[i], brr, /:/)
        print arr[2]" index "brr[1]" has "cnt[arr[2]]" times status "brr[2]\
      }\
    }' input.txt | sort -k1,1n -u > output

Output I get

6936919969 index 2 has 2 times status 14
6937919401 index 3 has 2 times status 14
6938720105 index 2 has 2 times status 14
6957719489 index 4 has 2 times status 98
6957919489 index 2 has 2 times status 98
6958119529 index 2 has 2 times status 14
6958320209 index 2 has 2 times status 14
6958719737 index 2 has 2 times status 14
6958920185 index 2 has 2 times status 14
6959120009 index 1 has 1 times status 98
6959320089 index 1 has 1 times status 98

Output I would like to get

6936919969 index 2 has 2 times status 14
6937919401 index 2 has 2 times status 14
6937919401 index 3 has 2 times status 14
6938720105 index 2 has 2 times status 14
6957719489 index 1 has 2 times status 98
6957719489 index 2 has 2 times status 98
6957719489 index 3 has 2 times status 98
6957719489 index 4 has 2 times status 98
6957919489 index 1 has 2 times status 98
6957919489 index 2 has 2 times status 98
6958119529 index 2 has 2 times status 14
6958320209 index 2 has 2 times status 14
6958719737 index 2 has 2 times status 14
6958920185 index 2 has 2 times status 14
6959120009 index 1 has 1 times status 98
6959320089 index 1 has 1 times status 98

Thanks for your help

What are '$st1' and '$st2', or what do they refer to?

Dear RudiC,
Sorry, they are shell variables:

st1=14
st2=98
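
As a side note, instead of splicing '$st1' and '$st2' into the single-quoted program, the values can be handed to awk with its -v option. This is only a minimal sketch: st1 and st2 are the variables named above, while the three sample statuses fed to awk are made up for illustration.

```shell
st1=14
st2=98

# Pass the shell values into awk as awk variables instead of
# breaking out of the single quotes around the program.
matches=$(printf '%s\n' 14 1 98 |
  awk -v st1="$st1" -v st2="$st2" '$1 == st1 || $1 == st2')

printf '%s\n' "$matches"
```

Only the lines whose status equals one of the two variables are printed.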

Remove the -u option from the sort command:

sort -k1,1n -u > output
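
To illustrate why -u matters: when a key is given, sort -u keeps only one line per distinct key, so several indexes of the same number collapse into a single line. A small sketch using three lines from the desired output (the second sort key on field 3 is my own addition, just to order the indexes):

```shell
# Three lines, two of which share the same first field.
lines='6937919401 index 3 has 2 times status 14
6937919401 index 2 has 2 times status 14
6936919969 index 2 has 2 times status 14'

# With -u, sort keeps a single line per distinct key (field 1 here).
with_u=$(printf '%s\n' "$lines" | sort -k1,1n -u | awk 'END { print NR }')

# Without -u every line survives; field 3 orders the indexes.
without_u=$(printf '%s\n' "$lines" | sort -k1,1n -k3,3n)

echo "lines kept with -u: $with_u"
printf '%s\n' "$without_u"
```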

I've rarely seen a file and logic as strange as these. In principle, you are overwriting the array elements because your index is not unique. You could try to make it unique, and then increment the array value each time you encounter the index.
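
A rough sketch of that difference, assuming a simplified three-column input (number, index, status) in place of the fixed-width records; the sample values are illustrative only:

```shell
# Simplified stand-in for the real file: number, index, status.
sample='6957719489 1 98
6957719489 1 98
6957719489 2 98
6957719489 2 98'

# Keyed on NR%2 and the number only: records with index 2 overwrite
# the elements written for index 1.
overwritten=$(printf '%s\n' "$sample" |
  awk '{ key[NR%2 ":" $1] = $2 ":" $3 }
       END { for (k in key) print k, key[k] }' | sort)

# Keyed on number, index and status: every combination gets its own
# element, and ++ counts how often it occurs.
counted=$(printf '%s\n' "$sample" |
  awk '{ key[$1 ":" $2 ":" $3]++ }
       END { for (k in key) print k, key[k] }' | sort)

printf '%s\n' "$overwritten"
printf '%s\n' "$counted"
```

With the NR%2 key only index 2 is left in the array; with the composite key both indexes survive, each with a count of 2.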

Dear RudiC,

Could you please help me get the desired output from the input file? I tried to get it myself but I can't. Thanks for your help.

---------- Post updated at 09:32 AM ---------- Previous update was at 08:27 AM ----------

Hi, thanks, but it doesn't help.

I reduced the logic considerably, maybe a bit too far, but it seems to do what you want, although it produces far more output lines than defined in the sample. Try

awk     '       {status=substr($0,91,2)
                 ind=substr($0,26,1)
                 split(substr($0, 11, 24-11), val,/\.0 /)
                 key[val[1]val[2]":"ind":"status]++
                 }
         END    {for(i in key)  {split(i, brr, /:/)
                                 if ((brr[3] == 14) || (brr[3] == 98))
                                        print brr[1]" index "brr[2]" has "key[i]" times status " brr[3]
                                }
                }
        ' /tmp/input.txt | sort -k1,1n
6936119601 index 1 has 2 times status 98
6936919969 index 2 has 2 times status 14
6937319385 index 1 has 2 times status 98
6937719401 index 1 has 2 times status 98
6937719401 index 2 has 2 times status 98
6937919401 index 2 has 2 times status 14
6937919401 index 3 has 2 times status 14
6938119313 index 1 has 2 times status 98
6938519977 index 1 has 1 times status 14
.
.
.
6960519681 index 1 has 2 times status 98
6960519681 index 2 has 2 times status 98
6960519681 index 3 has 2 times status 98

Dear RudiC,
Thanks a lot for your help. It works fine, but it lists all values that contain status 14 or 98.
The task here is to verify that the last index

 ind=substr($0,26,1)

has a status of 1; if not, then the script should list it.
Example:

A         69373.0 19385.013 3  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0    90 10398  
A         69373.0 19385.013 6  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0    90 10398  
A         69373.0 19385.023 3 75   1   4163770 76 45 26 269519.7 2861274.8  -6.6    90 103 1  
A         69373.0 19385.023 6 75   1  -3132671 78 43 26 269527.1 2861262.8   3.8    90 103 1  

Here we see that

status=substr($0,91,2)

in index 2 is 1, so we should not list the status 98 for this

 substr($0, 11, 24-11)

value.

But in this example:

A         69387.0 20105.01310 75   1  -3122871 77 43 22 284941.6 2870579.3  26.6  1097 103 1  
A         69387.0 20105.013 3 75   1  -4173371 75 64 27 284930.0 2870570.5  26.6  1097 103 1  
A         69387.0 20105.02310 75   0   0 0 0 0  0  0  0      0.0       0.0   0.0  1097 10314  
A         69387.0 20105.023 3 75   0   0 0 0 0  0  0  0      0.0       0.0   0.0  1097 10314  

the last index has status 14, so it should appear in the output list.

Likewise, if all indexes have status 98 or 14, the output should list all of them.
Example:

A         69577.0 19489.01410  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   215 10498  
A         69577.0 19489.014 7  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   215 10498  
A         69577.0 19489.02410  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   215 10498  
A         69577.0 19489.024 7  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   215 10498  
A         69577.0 19489.03410  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   215 10498  
A         69577.0 19489.034 7  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   215 10498  
A         69577.0 19489.04410  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   215 10498  
A         69577.0 19489.044 7  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   215 10498  
A         69577.0 19529.01410  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   275 10498  
A         69577.0 19529.014 7  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   275 10498  
A         69577.0 19529.02410  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   275 10498  
A         69577.0 19529.024 7  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0   275 10498  

Here all indexes from 1 to 4 have status 98.

I have a big list, and as you may notice, the value

substr($0, 11, 24-11)

is repeated 2 times with index 1, then again 2 times with the index increased to 2, and so on.

The goal here is to verify whether the last index has status 1; if that is not the case, we need to list the errors.

So, from the input, I should get only this:
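
To make the column positions concrete, here is a small sketch that applies the same substr and split calls to the first 27 columns of one of the sample records. The record is rebuilt with printf only so that the nine blanks after "A" are exact; nothing else is assumed.

```shell
# First 27 columns of a sample record; printf pads the nine blanks
# between "A" and the number so the column positions are exact.
record=$(printf 'A%9s69373.0 19385.013' '')

fields=$(printf '%s\n' "$record" | awk '{
  split(substr($0, 11, 24-11), val, /\.0 /)  # columns 11-23: "69373.0 19385"
  ind = substr($0, 26, 1)                    # column 26: the index digit
  print val[1] val[2], ind
}')

echo "$fields"
```

This prints the joined number followed by the index, here 6937319385 and 1.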

6936919969 index 2 has 2 times status 14
6937919401 index 2 has 2 times status 14
6937919401 index 3 has 2 times status 14
6938720105 index 2 has 2 times status 14
6957719489 index 1 has 2 times status 98
6957719489 index 2 has 2 times status 98
6957719489 index 3 has 2 times status 98
6957719489 index 4 has 2 times status 98
6957919489 index 1 has 2 times status 98
6957919489 index 2 has 2 times status 98
6958119529 index 2 has 2 times status 14
6958320209 index 2 has 2 times status 14
6958719737 index 2 has 2 times status 14
6958920185 index 2 has 2 times status 14
6959120009 index 1 has 1 times status 98
6959320089 index 1 has 1 times status 98

Regarding the last entry, these are its records:

A         69593.0 20089.01410 75   1  -3112772 76 56 27 282018.5 2874825.8  17.6  1102 104 1   
A         69593.0 20089.014 7  0   0   0 0 0 0  0  0  0      0.0       0.0   0.0  1102 10498  

In this case there is only one index, and one of its records has status 98, so I list it.

Thanks a lot for your time and help.
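
One possible reading of this rule, sketched under assumptions: a number is reported only when its highest index has at least one record with status 14 or 98, and then every index/status combination with 14 or 98 is listed for it. The input is a simplified three-column stand-in (number, index, status) rather than the fixed-width file, and the sample rows are illustrative.

```shell
# Simplified stand-in for the real file: number, index, status.
sample='6937319385 1 98
6937319385 1 98
6937319385 2 1
6937319385 2 1
6938720105 1 1
6938720105 1 1
6938720105 2 14
6938720105 2 14
6959320089 1 1
6959320089 1 98'

report=$(printf '%s\n' "$sample" |
  awk -v st1=14 -v st2=98 '
    {
      cnt[$1 ":" $2 ":" $3]++                     # count per number/index/status
      if ($2 + 0 > last[$1] + 0) last[$1] = $2    # highest index per number
    }
    END {
      for (k in cnt) {
        split(k, p, ":")
        base = p[1] ":" last[p[1]] ":"
        # Report a number only if its last index has a 14 or 98 record.
        bad = ((base st1) in cnt) || ((base st2) in cnt)
        if (bad && (p[3] == st1 || p[3] == st2))
          print p[1] " index " p[2] " has " cnt[k] " times status " p[3]
      }
    }' | sort -k1,1n -k3,3n)

printf '%s\n' "$report"
```

The first number is skipped because its last index has status 1 only; the other two are listed. For the real file, the three values would come from the substr calls instead of $1, $2 and $3.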

Sorry, I don't get it.

Thanks for all your help.

One more thing:

How can I get only the values that have the same status every time (only 14 or 98)?

For example, from the input list I should get only:

6957719489 all records have status 98
6957919489 all records have status 98

Thanks

But you have that:

                                 if ((brr[3] == 14) || (brr[3] == 98))
                                         print brr[1]" index "brr[2]" has "key[i]" times status " brr[3] 

There are ONLY lines with status 14 / 98 in the output.
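
If the request is read as "numbers for which every record carries one and the same status, and that status is 14 or 98", a possible sketch is to remember the first status per number and flag any number where a later record differs. As an assumption, the input is again simplified to three columns (number, index, status), and the sample rows are illustrative.

```shell
# Simplified stand-in for the real file: number, index, status.
sample='6937319385 1 98
6937319385 2 1
6957719489 1 98
6957719489 2 98
6938720105 1 1
6938720105 2 14'

uniform=$(printf '%s\n' "$sample" |
  awk -v st1=14 -v st2=98 '
    # Remember the first status seen for each number.
    !($1 in first) { first[$1] = $3 }
    # Flag the number as soon as a record has a different status.
    $3 != first[$1] { mixed[$1] = 1 }
    END {
      for (n in first)
        if (!(n in mixed) && (first[n] == st1 || first[n] == st2))
          print n " all records have status " first[n]
    }' | sort -k1,1n)

printf '%s\n' "$uniform"
```

Only 6957719489 is printed, because the other two numbers mix statuses.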