How to strip some characters before putting in array?

ken6503 · January 13, 2015, 4:08pm

Hi Gurus,

My current code is below:

 nawk '{f1 = (NF>1)?$1:""}{print f1, $NF}'|sed -e 's/s(/,/g;s/)//g;s/ *,/,/'|nawk -F"," '{a[$1]b[$2]}END{for (i in b) if (!(i in a))print i}'

I have a file like the one below (these are AutoSys job dependencies).
Jobs written as s(...) are dependencies; jobs without s() are the jobs that need to be run. My task is to find all dependency jobs that are not in the box. I got some of the code above from the folks in this forum.

I am wondering if it is possible to remove

sed -e 's/s(/,/g;s/)//g;s/ *,/,/'

and instead strip the s() in the second awk, so that only the job name goes into the array.

deptnm-appnm-code     -------     s(deptnm-ocode-30-ddd)
                                  s(deptnm-ocode-00-dum)
                                  s(deptnm-appnm-ecode)
deptnm-appnm-code-dld -------     s(deptnm-on-rundt-run)
                                  s(deptnm-appnm-ocodel-su)
                                  s(deptnm-appnm-ecode-dld)
                                  s(deptnm-ocode-50-curcnt)
deptnm-appnm-code-dum -------     s(deptnm-on-rundt-bp)

Thanks in advance.
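A minimal sketch of what is asked above, assuming any POSIX awk (plain `awk` stands in for `nawk`): the first awk stage is kept as posted, the sed stage is dropped, and the second awk strips a leading `s(` and a trailing `)` with `sub()` before filling the arrays. The function name `deps_not_in_box` and the temporary sample file are hypothetical, for illustration only. Because `for (i in b)` visits keys in an unspecified order, the usage line sorts the result.

```shell
#!/bin/sh
# Hypothetical helper: $1 is a file in the AutoSys listing format shown above.
# Stage 1 (as posted): print the job name (or "") followed by the last field.
# Stage 2 (replaces the sed): strip "s(" and ")" before filling the arrays.
deps_not_in_box() {
  awk '{f1 = (NF>1)?$1:""}{print f1, $NF}' "$1" |
  awk '{
         dep = $NF
         sub(/^s\(/, "", dep)    # drop the leading "s("
         sub(/\)$/, "", dep)     # drop the trailing ")"
         if (NF > 1) a[$1] = 1   # job on the left side (in the box)
         b[dep] = 1              # dependency on the right side
       }
       END { for (i in b) if (!(i in a)) print i }'
}

# Sample data from the post above, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
deptnm-appnm-code     -------     s(deptnm-ocode-30-ddd)
                                  s(deptnm-ocode-00-dum)
                                  s(deptnm-appnm-ecode)
deptnm-appnm-code-dld -------     s(deptnm-on-rundt-run)
                                  s(deptnm-appnm-ocodel-su)
                                  s(deptnm-appnm-ecode-dld)
                                  s(deptnm-ocode-50-curcnt)
deptnm-appnm-code-dum -------     s(deptnm-on-rundt-bp)
EOF

deps_not_in_box "$sample" | sort
```

On this sample no dependency is also a job in the box, so all eight right-side names come back.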

shamrock · January 13, 2015, 4:21pm

Post a sample of the input and the desired output, because IMO this can all be done with a single nawk, without the need for a pipe.

ken6503 · January 13, 2015, 4:40pm

Thanks for your reply, shamrock,

My input is below:

deptnm-appnm-code     -------     s(deptnm-ocode-30-ddd)
                                  s(deptnm-ocode-00-dum)
deptnm-appnm-ocodel-su   ------- s(deptnm-appnm-ecode)
deptnm-appnm-code-dld -------     s(deptnm-on-rundt-run)
                                  s(deptnm-appnm-ocodel-su)
                                  s(deptnm-appnm-ecode-dld)
                                  s(deptnm-ocode-50-curcnt)
deptnm-appnm-code-dum -------     s(deptnm-on-rundt-bp)

My expected output is:

deptnm-ocode-30-ddd
deptnm-ocode-00-dum
deptnm-appnm-ecode
deptnm-on-rundt-run
deptnm-appnm-ecode-dld
deptnm-ocode-50-curcnt
deptnm-on-rundt-bp

That is, list all jobs on the right side that do not exist on the left side.

disedorgue · January 13, 2015, 6:59pm

Hello,
You can try this (it works with gawk --posix):

nawk 'gsub(/[s()]/,"") && NF == 3 {s[$1]=1;d[a++]=$3;next};{d[a++]=$1};END{while(i<a) if (!s[d[i++]]) print d[i-1] }' input-file

ken6503 · January 13, 2015, 9:58pm

Hi disedorgue,

Thanks for your reply.
The code works, but there is one small issue:

gsub(/[s()]/,"")

replaces every "s" in the file. For example,

deptnm-appnm-ocodel-su

becomes

deptnm-appnm-ocodel-u

Is there any way I can replace "s(" as a whole?

Thanks in advance.
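The cause is that `[s()]` is a bracket expression: it matches any single character that is `s`, `(` or `)`. A minimal sketch contrasting it with the alternation `s\(|\)`, which matches only the two-character string `s(` or a `)`; the function names are hypothetical:

```shell
#!/bin/sh
# Bracket expression: removes every single "s", "(" or ")".
strip_bracket() { printf '%s\n' "$1" | awk '{ gsub(/[s()]/, ""); print }'; }

# Alternation: removes only the literal "s(" and ")".
strip_literal() { printf '%s\n' "$1" | awk '{ gsub(/s\(|\)/, ""); print }'; }

strip_bracket 's(deptnm-appnm-ocodel-su)'   # deptnm-appnm-ocodel-u
strip_literal 's(deptnm-appnm-ocodel-su)'   # deptnm-appnm-ocodel-su
```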

Update: I found the command that does it:
gsub(/s\(|\)/,"")
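A sketch of disedorgue's one-liner with `gsub(/s\(|\)/,"")` in place of `gsub(/[s()]/,"")`, run with plain `awk` on ken6503's second sample; the helper name and the temporary file are assumptions for illustration. Because `d` has a numeric index, the result keeps the input order:

```shell
#!/bin/sh
# Hypothetical helper: disedorgue's one-liner, but only "s(" and ")" are removed.
deps_in_order() {
  awk 'gsub(/s\(|\)/,"") && NF == 3 { s[$1] = 1; d[a++] = $3; next }  # job line
       { d[a++] = $1 }                                               # continuation line
       END { while (i < a) if (!s[d[i++]]) print d[i-1] }' "$1"
}

# ken6503's second sample, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
deptnm-appnm-code     -------     s(deptnm-ocode-30-ddd)
                                  s(deptnm-ocode-00-dum)
deptnm-appnm-ocodel-su   ------- s(deptnm-appnm-ecode)
deptnm-appnm-code-dld -------     s(deptnm-on-rundt-run)
                                  s(deptnm-appnm-ocodel-su)
                                  s(deptnm-appnm-ecode-dld)
                                  s(deptnm-ocode-50-curcnt)
deptnm-appnm-code-dum -------     s(deptnm-on-rundt-bp)
EOF

deps_in_order "$sample"
```

This should print the seven names from the expected output, in the same order.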

RavinderSingh13 · January 13, 2015, 10:16pm

Hello ken6503,

Could you please try the following? It may help you.

awk 'FNR==NR{if(NF>1){A[$1]=$1;gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF} else {gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF}} END{for(j in A){delete B[j]}for(u in B){print B[u]}}' Input_file Input_file

Output will be as follows.

deptnm-ocode-00-dum
deptnm-appnm-ecode
deptnm-on-rundt-bp
deptnm-on-rundt-run
deptnm-appnm-ecode-dld
deptnm-ocode-30-ddd
deptnm-ocode-50-curcnt

EDIT: Adding a multi-line form of the same code.

awk 'FNR==NR{
        if(NF>1){
                A[$1]=$1;
                gsub(/s\(/,X,$NF);
                gsub(/\)$/,X,$NF);
                B[$NF]=$NF
        }
        else {
                gsub(/s\(/,X,$NF);
                gsub(/\)$/,X,$NF);
                B[$NF]=$NF
        }
     }
     END {
        for(j in A){
                delete B[j]
        }
        for(u in B){
                print B[u]
        }
     }
    ' Input_file Input_file

Thanks,
R. Singh

Scrutinizer · January 13, 2015, 11:41pm

How about:

awk 'NR==FNR{if(NF>1)A[$1]; next} !($2 in A){print $2}' input-file FS='[)(]' input-file

deptnm-ocode-30-ddd
deptnm-ocode-00-dum
deptnm-appnm-ecode
deptnm-on-rundt-run
deptnm-appnm-ecode-dld
deptnm-ocode-50-curcnt
deptnm-on-rundt-bp
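This relies on the fact that a `var=value` operand on the awk command line is applied just before awk reads the next file operand, so the first pass over `input-file` uses the default field separator and the second pass splits on `(` and `)`, leaving the job name in `$2`. A small sketch that prints `$2` for one sample line in each pass (the helper name and temporary file are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: reads the same file twice; FS='[)(]' only takes
# effect before the second read.
fs_demo() {
  awk '{ tag = (NR == FNR) ? "pass 1:" : "pass 2:"; print tag, "$2 = [" $2 "]" }' "$1" FS='[)(]' "$1"
}

line_file=$(mktemp)
printf '%s\n' 'deptnm-appnm-code     -------     s(deptnm-ocode-30-ddd)' > "$line_file"
fs_demo "$line_file"
# pass 1: $2 = [-------]
# pass 2: $2 = [deptnm-ocode-30-ddd]
```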

ken6503 · January 14, 2015, 11:10am

Thanks disedorgue.

Would you please give me a brief explanation of

if (!s[d[i++]]) print d[i-1]

Thanks again.

Thanks R.Singh.

The code works.

I have some questions about this code:

What is the difference between "gsub(/s\(/,X,$NF)" and "gsub(/s\(/,"",$NF)"?
Is it necessary to read the file twice? In the code, I think only the FNR==NR part is ever executed.
I tried the code below, and the results are the same:

awk 'FNR==NR{if(NF>1){A[$1]=$1;gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF} else {gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF}} END{for(j in A){delete B[j]}for(u in B){print B[u]}}' Input_file

RavinderSingh13 · January 14, 2015, 12:48pm

Hello Ken6503,

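For the 1st query:
gsub(/s\(/,X,$NF) substitutes the string s( in the last field with a null value: X is never assigned, so it holds the empty string, which makes it equivalent to "". Likewise, gsub(/\)$/,X,$NF) removes the trailing ). (I have used \( and \) because ( and ) are metacharacters; to tell awk not to take their special meaning and to treat them as literal characters, we use the escape character \.)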
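The null-value point can be checked directly. A minimal sketch (the helper name and the variable names other than `X` are arbitrary) that applies both forms to the same string:

```shell
#!/bin/sh
compare_x_and_empty() {
  awk 'BEGIN {
    a = "s(deptnm-ocode-30-ddd)"; b = a
    gsub(/s\(/, X, a)     # X is never assigned, so it is ""
    gsub(/s\(/, "", b)    # explicit empty string
    print a
    print ((a == b) ? "same" : "different")
  }'
}

compare_x_and_empty
# deptnm-ocode-30-ddd)
# same
```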

For the 2nd query:
Thank you for pointing that out. Yes, you are right: we can read the file just once, so there is no need for the FNR==NR condition, and the code can be reduced as follows (not tested in different scenarios):

awk '{if(NF>1){A[$1]=$1;gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF} else {gsub(/s\(/,X,$NF);gsub(/\)$/,X,$NF);B[$NF]=$NF}} END{for(j in A){delete B[j]}for(u in B){print B[u]}}' Input_file

Output will be as follows.

deptnm-ocode-00-dum
deptnm-appnm-ecode
deptnm-on-rundt-bp
deptnm-on-rundt-run
deptnm-appnm-ecode-dld
deptnm-ocode-30-ddd
deptnm-ocode-50-curcnt

Thanks,
R. Singh
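Since both branches of the `if` strip the same characters from `$NF`, the single-pass version can be folded further. A sketch under the same assumptions (any POSIX awk; the function name `not_in_box` and the temporary file are hypothetical). `for (u in B)` visits keys in an unspecified order, which is why the listing above is not in input order, so the usage line sorts it:

```shell
#!/bin/sh
# Hypothetical helper: one pass, no duplicated if/else.
not_in_box() {
  awk 'NF > 1 { A[$1] = 1 }               # left-side job name
       {
         dep = $NF
         sub(/^s\(/, "", dep)             # drop the leading "s("
         sub(/\)$/, "", dep)              # drop the trailing ")"
         B[dep] = 1
       }
       END {
         for (j in A) delete B[j]         # drop dependencies that are also jobs
         for (u in B) print u             # order is unspecified
       }' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
deptnm-appnm-code     -------     s(deptnm-ocode-30-ddd)
                                  s(deptnm-ocode-00-dum)
deptnm-appnm-ocodel-su   ------- s(deptnm-appnm-ecode)
deptnm-appnm-code-dld -------     s(deptnm-on-rundt-run)
                                  s(deptnm-appnm-ocodel-su)
                                  s(deptnm-appnm-ecode-dld)
                                  s(deptnm-ocode-50-curcnt)
deptnm-appnm-code-dum -------     s(deptnm-on-rundt-bp)
EOF

not_in_box "$sample" | sort
```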

disedorgue · January 14, 2015, 1:47pm

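if (!s[d[i++]]) print d[i-1] prints d[i-1] if the hash entry s[d[i++]] doesn't exist (or is empty). Because i++ returns the old value of i before incrementing it, d[i-1] is the same element that was just tested.
s is an array with a hash index, where the key is field 1 (the job name).
d is an array with a numeric index that holds the dependencies (field 3 on job lines, field 1 on continuation lines).
I use an array with a numeric index to preserve the input order.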
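A small sketch of the same loop with made-up array contents (`job-a`, `job-b`, `job-c` and the helper name are hypothetical), where `job-b` plays the role of a dependency that is also a left-side job:

```shell
#!/bin/sh
postinc_demo() {
  awk 'BEGIN {
    d[0] = "job-a"; d[1] = "job-b"; d[2] = "job-c"; a = 3   # hypothetical entries
    s["job-b"] = 1                                          # job-b is also a left-side job
    while (i < a)
      if (!s[d[i++]]) print d[i-1]                          # d[i-1] is the element just tested
  }'
}

postinc_demo
# job-a
# job-c
```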

shamrock · January 14, 2015, 3:22pm

Try this nawk one-liner...

nawk -F"[()]" '{print $2}' file
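For comparison, a sketch that runs this one-liner (with plain `awk` in place of `nawk`) on ken6503's second sample; the helper name and temporary file are hypothetical. It prints every right-side name, including `deptnm-appnm-ocodel-su`, which is also a job on the left side, so on its own it only gives the expected result when no dependency is also a left-side job:

```shell
#!/bin/sh
# Hypothetical helper wrapping the one-liner above.
shamrock_fields() {
  awk -F'[()]' '{ print $2 }' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
deptnm-appnm-code     -------     s(deptnm-ocode-30-ddd)
                                  s(deptnm-ocode-00-dum)
deptnm-appnm-ocodel-su   ------- s(deptnm-appnm-ecode)
deptnm-appnm-code-dld -------     s(deptnm-on-rundt-run)
                                  s(deptnm-appnm-ocodel-su)
                                  s(deptnm-appnm-ecode-dld)
                                  s(deptnm-ocode-50-curcnt)
deptnm-appnm-code-dum -------     s(deptnm-on-rundt-bp)
EOF

shamrock_fields "$sample"
```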