How to strip some characters before putting in array?

Hi Gurus,

My current code is below:

 nawk '{f1 = (NF>1)?$1:""}{print f1, $NF}'|sed -e 's/s(/,/g;s/)//g;s/ *,/,/'|nawk -F"," '{a[$1]; b[$2]}END{for (i in b) if (!(i in a))print i}'

I have a file like the one below (these are AutoSys job dependencies).
The jobs inside s() are dependencies; the jobs without s() are the jobs that need to be run. My task is to find all dependency jobs that are not in the box. I got some of the code above from the folks in this forum.

I am wondering if it is possible to remove

sed -e 's/s(/,/g;s/)//g;s/ *,/,/' 

and instead strip s() in the second awk, so that only the job name goes into the array.

deptnm-appnm-code     -------     s(deptnm-ocode-30-ddd)
                                  s(deptnm-ocode-00-dum)
                                  s(deptnm-appnm-ecode)
deptnm-appnm-code-dld -------     s(deptnm-on-rundt-run)
                                  s(deptnm-appnm-ocodel-su)
                                  s(deptnm-appnm-ecode-dld)
                                  s(deptnm-ocode-50-curcnt)
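For reference, here is a minimal sketch of the single-awk approach the question asks about, assuming the whole listing is in one file (or on stdin) and that every dependency is written as the last field in the form s(name). It strips the s( prefix and the ) suffix inside awk before storing the name, so the sed and the second nawk are no longer needed. The function name unboxed_deps and the file name deps.txt are hypothetical, and plain awk is used in place of nawk.

```shell
#!/bin/sh
# Hypothetical helper: print the dependency jobs (names inside s(...))
# that never appear as a job name in the left-hand column.
unboxed_deps() {
    awk '
        # A line with more than one field starts with a job name.
        NF > 1 { job[$1] = 1 }

        # The last field holds the dependency, e.g. s(deptnm-ocode-30-ddd).
        $NF ~ /^s\(.*\)$/ {
            dep = $NF
            sub(/^s\(/, "", dep)   # remove only the leading "s("
            sub(/\)$/, "", dep)    # remove only the trailing ")"
            depends[dep] = 1
        }

        END {
            # Same check as the original second awk; order is unspecified.
            for (d in depends)
                if (!(d in job))
                    print d
        }
    ' "$@"
}

# Usage (deps.txt is a placeholder file name):
# unboxed_deps deps.txt | sort
```

Because for (d in depends) returns keys in no particular order, pipe the result through sort if you need a stable order.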
deptnm-appnm-code-dum -------     s(deptnm-on-rundt-bp)

Thanks in advance.

Post a sample of the input and the desired output, because IMO this can all be done with a single nawk, without the need for a pipe.


Thanks for your reply, shamrock,

My input looks like this:

deptnm-appnm-code     -------     s(deptnm-ocode-30-ddd)
                                  s(deptnm-ocode-00-dum)
deptnm-appnm-ocodel-su   ------- s(deptnm-appnm-ecode)
deptnm-appnm-code-dld -------     s(deptnm-on-rundt-run)
                                  s(deptnm-appnm-ocodel-su)
                                  s(deptnm-appnm-ecode-dld)
                                  s(deptnm-ocode-50-curcnt)
deptnm-appnm-code-dum -------     s(deptnm-on-rundt-bp)

My expected output is:

deptnm-ocode-30-ddd
deptnm-ocode-00-dum
deptnm-appnm-ecode
deptnm-on-rundt-run
deptnm-appnm-ecode-dld
deptnm-ocode-50-curcnt
deptnm-on-rundt-bp

That is, list all jobs on the right side that do not exist on the left side.

Hello,
You can try this (it also works with gawk --posix):

nawk 'gsub(/[s()]/,"") && NF == 3 {s[$1]=1;d[a++]=$3;next};{d[a++]=$1};END{while(i<a) if (!s[d[i++]]) print d[i-1] }' input-file

Hi disedorgue,

Thanks for your reply.
The code works, but there is one small issue:

gsub(/[s()]/,"")

removes every "s" in the file. For example:

deptnm-appnm-ocodel-su

becomes

deptnm-appnm-ocodel-u

Is there any way to remove "s(" as a whole?

Thanks in advance.

---------- Post updated at 09:58 PM ---------- Previous update was at 09:09 PM ----------

I found the right command:
gsub(/s\(|\)/,"")
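A small sketch of why the two patterns behave differently: inside a regex, [s()] is a bracket expression that matches any one of the characters s, ( or ), while s\(|\) is an alternation that matches the two-character string s( or a single ). The function names strip_chars and strip_wrapper are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers to compare the two regexes from this thread.

# [s()] is a bracket expression: it matches ANY single "s", "(" or ")",
# so the "s" in "-su" is removed as well.
strip_chars() { awk '{ gsub(/[s()]/, ""); print }'; }

# s\(|\) is an alternation: it matches the two-character string "s("
# or a single ")", so other "s" characters survive.
strip_wrapper() { awk '{ gsub(/s\(|\)/, ""); print }'; }

echo 's(deptnm-appnm-ocodel-su)' | strip_chars     # deptnm-appnm-ocodel-u
echo 's(deptnm-appnm-ocodel-su)' | strip_wrapper   # deptnm-appnm-ocodel-su
```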

Hello ken6503,

Could you please try the following; it may help.

awk 'FNR==NR{if(NF>1){A[$1]=$1;gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF} else {gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF}} END{for(j in A){delete B[j]}for(u in B){print B[u]}}' Input_file Input_file

Output will be as follows.

deptnm-ocode-00-dum
deptnm-appnm-ecode
deptnm-on-rundt-bp
deptnm-on-rundt-run
deptnm-appnm-ecode-dld
deptnm-ocode-30-ddd
deptnm-ocode-50-curcnt

EDIT: Adding a non-one-liner form of the same code.

awk 'FNR==NR{
                if(NF>1){
                                A[$1]=$1;
                                gsub(/s\(/,X,$NF);
                                gsub(/\)$/,X,$NF);
                                B[$NF]=$NF
                        }
                else    {
                                gsub(/s\(/,X,$NF);
                                gsub(/\)$/,X,$NF);
                                B[$NF]=$NF
                        }
             }
     END     {
                for(j in A){
                                delete B[j]
                           }
                for(u in B){
                                print B[u]
                           }
             }
    ' Input_file Input_file

Thanks,
R. Singh


How about:

awk 'NR==FNR{if(NF>1)A[$1]; next} !($2 in A){print $2}' input-file FS='[)(]' input-file 
deptnm-ocode-30-ddd
deptnm-ocode-00-dum
deptnm-appnm-ecode
deptnm-on-rundt-run
deptnm-appnm-ecode-dld
deptnm-ocode-50-curcnt
deptnm-on-rundt-bp
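In the command above, FS='[)(]' is a variable-assignment operand: awk performs it only when it reaches that operand, so the first pass over input-file splits on whitespace and the second pass splits on parentheses, which puts the name inside s(...) in $2. A hypothetical helper, show_paren_fields, makes that split visible for one sample line:

```shell
#!/bin/sh
# Hypothetical helper: show how one line is split when FS is the
# bracket expression [)(], i.e. every "(" or ")" separates fields.
show_paren_fields() {
    awk -F'[)(]' '{ printf "NF=%d 1=[%s] 2=[%s] 3=[%s]\n", NF, $1, $2, $3 }'
}

echo 'deptnm-appnm-code     -------     s(deptnm-ocode-30-ddd)' | show_paren_fields
# NF=3 1=[deptnm-appnm-code     -------     s] 2=[deptnm-ocode-30-ddd] 3=[]
```

The empty third field comes from the text after the closing ")", which is why the job name always ends up in $2.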

Thanks, disedorgue.

Would you please give me a brief explanation of

 if (!s[d[i++]]) print d[i-1]

Thanks again.

---------- Post updated at 11:10 AM ---------- Previous update was at 10:32 AM ----------

Thanks, R. Singh.

The code works.

I have some questions about this code.

  1. What is the difference between "gsub(/s\(/,X,$NF)" and "gsub(/s\(/,"",$NF)"?
  2. Is it necessary to read the file twice? In the code, I think only the FNR==NR block is ever executed.
    I tried the code below, reading the file only once, and the results are the same:
awk 'FNR==NR{if(NF>1){A[$1]=$1;gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF} else {gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF}} END{for(j in A){delete B[j]}for(u in B){print B[u]}}' Input_file
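A quick sketch to check the second point, assuming a throwaway three-line file created with mktemp: FNR restarts at 1 for each input file while NR keeps counting, so FNR==NR is true only while the first file operand is read. The rule therefore runs three times even though awk reads six records. The function name fnr_nr_demo is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count how many records the FNR==NR rule sees
# versus how many records awk reads in total (same file given twice).
fnr_nr_demo() {
    awk 'FNR == NR { n++ } END { print n + 0, NR }' "$1" "$1"
}

tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"
fnr_nr_demo "$tmp"     # 3 6: the rule ran on 3 records, awk read 6
rm -f "$tmp"
```

One known caveat of the FNR==NR idiom: if the first file is empty, the condition is also true for every record of the second file.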

Hello Ken6503,

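For the 1st query:
gsub(/s\(/,X,$NF) replaces every occurrence of the string "s(" in the last field ($NF) with the value of X. X is never assigned anywhere in the script, so it is an uninitialized variable whose value is the empty string; in practice this is the same as gsub(/s\(/,"",$NF). (I wrote \( because ( and ) are regex metacharacters; the backslash tells awk to match the character literally.)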
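A minimal check of the 1st query, using a hypothetical helper named compare_replacements: X is never assigned, so it holds the empty string and both gsub calls produce the same text. Writing "" is more explicit, and its meaning does not change if someone later sets X, for example with awk -v X=...

```shell
#!/bin/sh
# Hypothetical demo: an awk variable that is never assigned (X here)
# evaluates to the empty string, so both gsub calls give the same result.
compare_replacements() {
    awk '{
        a = $NF; gsub(/s\(/, X, a)    # replacement is the unset variable X
        b = $NF; gsub(/s\(/, "", b)   # replacement is an explicit empty string
        print a, b, (a == b ? "same" : "different")
    }'
}

echo 'job ------- s(deptnm-appnm-ecode)' | compare_replacements
# deptnm-appnm-ecode) deptnm-appnm-ecode) same
```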

For the 2nd query:
Thank you for pointing that out. Yes, you are right: we can read the file just once, so there is no need for the FNR==NR condition,
and the code can be reduced to the following (not tested in different scenarios):

awk '{if(NF>1){A[$1]=$1;gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF} else {gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF}} END{for(j in A){delete B[j]}for(u in B){print B[u]}}' Input_file

Output will be as follows.

deptnm-ocode-00-dum
deptnm-appnm-ecode
deptnm-on-rundt-bp
deptnm-on-rundt-run
deptnm-appnm-ecode-dld
deptnm-ocode-30-ddd
deptnm-ocode-50-curcnt

Thanks,
R. Singh


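if (!s[d[i++]]) print d[i-1]
prints d[i-1] if the hash entry s[d[i++]] is not set. Because i++ increments i after its value has been used, d[i-1] is the same element that was just tested.
s is an array with a hash index, where the hash key is field 1 (the job name on the left).
d is an array with a numeric index that contains field 3 (or field 1 on the continuation lines).
I use an array with a numeric index to preserve the entry order.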
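A sketch of the same logic with the END loop written out as an explicit for loop, assuming the gsub(/s\(|\)/,"") fix found earlier in the thread. It uses the in operator instead of !s[...]; that gives the same result here because every key in s has the value 1, and it avoids creating empty s[] entries as a side effect. The function name unboxed_in_order is hypothetical.

```shell
#!/bin/sh
# Hypothetical rewrite of the END block with an explicit loop.
# s[] holds left-hand jobs (value 1); d[0..a-1] holds dependencies in
# input order, equivalent to: while(i<a) if (!s[d[i++]]) print d[i-1]
unboxed_in_order() {
    awk '
        gsub(/s\(|\)/, "") && NF == 3 { s[$1] = 1; d[a++] = $3; next }
        { d[a++] = $1 }
        END {
            for (i = 0; i < a; i++)
                if (!(d[i] in s))   # "in" tests membership without creating s[d[i]]
                    print d[i]
        }
    ' "$@"
}
```

Keeping d[] numerically indexed is what preserves the input order; a for (k in d) loop would not guarantee it.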


Try this nawk one-liner...

nawk -F"[()]" '{print $2}' file