generate level numbers

pbsrinivas · September 19, 2007, 7:05am

Hi...

I have a sequence of jobs and their predecessors.

Input

Job_Name Predecessor
A NULL
B1 A
B2 A
B3 B1
C B3
C B2

Based on these I have to generate the level number.
What I mean is:

Let A be level 1.
For B1 to run, A must have finished,
so B1's level is A+1 = 1+1 = 2.
The same holds for B2: A must have finished,
so B2's level is A+1 = 1+1 = 2.
For B3 to run, B1 must have finished,
so B3's level is B1+1 = 2+1 = 3.

But for C to run, both B2 and B3 must have finished,
so C is max(B2,B3)+1
C = max(2,3)+1 = 4

So the final output should be

A NULL 1
B1 A 2
B2 A 2
B3 B1 3
C B3 4
C B2 4

I am not able to code the logic.

Please help.

Ygor · September 19, 2007, 9:22pm

Try...

$ cat file1
A NULL
B1 A
B2 A
B3 B1
C B3
C B2

$ awk '{print $0, a[$1]=1+a[$2]}' file1
A NULL 1
B1 A 2
B2 A 2
B3 B1 3
C B3 4
C B2 3

pbsrinivas · September 20, 2007, 1:28am

As I said, the problem comes with C.

It should be the max of both.

That is,
for C to run, both B2 and B3 must have finished,

so C is max(B2,B3)+1

C = max(2,3)+1 = 4

but

this gives both 3 and 4 for C, when it should be the max of the two.

Here is one more input file:

A 0
B1 A
B2 A
B3 B1
C B3
C B2
D C
E C
F D
F E
I E

This is an extension of the previous one.

When I run the awk I get

A 0 1
B1 A 2
B2 A 2
B3 B1 3
C B3 4
C B2 3    <-- because C ends up as 3 here, the next jobs get level 4, but they should be at level 5
D C 4
E C 4
F D 5
F E 5
I E 5

Perderabo · September 20, 2007, 2:28pm

How about:

$ cat data1
A NULL
B1 A
B2 A
B3 B1
C B3
C B2
$ awk '{r[NR]=$0 ; k[NR]=$1 ; if(v[$1]<1+v[$2]) v[$1]=1+v[$2] } END { for (n=1; n<=NR; n++) print r[n] , v[k[n]]}' data1
A NULL 1
B1 A 2
B2 A 2
B3 B1 3
C B3 4
C B2 4
$
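
The one-liner above is dense, so here is the same idea spread out with comments. This is only a sketch: the function name job_levels and the array names are made up for illustration, and it assumes (as the one-liner does) that every predecessor's lines appear before the lines of the jobs that depend on it.

```shell
# job_levels FILE: print each "job predecessor" line followed by the job's level.
# Assumption: predecessors are listed before the jobs that depend on them.
job_levels() {
    awk '
    {
        line[NR] = $0              # remember the input line for the final report
        job[NR]  = $1              # and which job that line belongs to
        cand = level[$2] + 1       # an unseen predecessor (NULL, 0) counts as level 0
        if (level[$1] < cand)      # keep the maximum over all predecessors
            level[$1] = cand
    }
    END {
        # Print at the end so every line of a job shows its final level.
        for (n = 1; n <= NR; n++)
            print line[n], level[job[n]]
    }' "$1"
}
```

Called as "job_levels data1", it should reproduce the listing above. Printing in the END block is what makes both C lines show 4 instead of 4 and then 3.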

pbsrinivas · September 21, 2007, 1:59am

Hi...

Thanks a lot for all the ideas.

It's almost done, but...

I am doing this to generate the level numbers for jobs to run on the Unix server. The input we have is each job name and its predecessor job name.

A small change is required.

Have a look at the picture I have inserted.

If the Input is like this

A 0
B 0
C 0
D 0
E A
E B
E C
F E
G E
H E
I G
Z H
Z D

it's perfect and works great.

But the same input in a different order

A 0
B 0
C 0
D 0
E A
E B
E C
Z H
Z D
F E
G E
H E
I G

corrupts the output. It gives me

A 0 1
B 0 1
C 0 1
D 0 1
E A 2
E B 2
E C 2
Z H 2
Z D 2
F E 3
G E 3
H E 3
I G 4

Just for a pictorial view, see the file I uploaded.
We have almost 1200 jobs and it is very tedious to do this by hand.

Please help.

Expected output
A 0 1
B 0 1
C 0 1
D 0 1
E A 2
E B 2
E C 2
Z H 4
Z D 4
F E 3
G E 3
H E 3
I G 4

We also have a file that shows each job and its successor,

if it could be of any use.

It looks like this:

A E
B E
C E
D Z
E F
E G
E H
G I
H Z
I NULL
Z NULL

Perderabo · September 21, 2007, 12:26pm

This will never work unless you sort the input so that each job comes after all of its predecessors. The data pairs you have represent a directed graph. Provided that the directed graph is acyclic, such a sort is possible; it is called a topological sort. Lucky for you, Unix has a program called tsort to do this. Try:

#! /usr/bin/ksh
sort $1 > temp1
tsort temp1 | sed '$d' | nl | sort -nr | cut -f2 | nl | awk '{print $2, $1}' | sort | join temp1 - | sort -k 3n | awk '{print $1,$2}' |\
awk '{r[NR]=$0 ; k[NR]=$1 ; if(v[$1]<1+v[$2]) v[$1]=1+v[$2] } END { for (n=1; n<=NR; n++) print r[n] , v[k[n]]}'
exit 0

But bear in mind that with a complex directed graph there will probably be several different ways to assign your level numbers. This will find one of them.
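
To see the ordering step of the script on its own, here is a rough sketch. tsort reads pairs and prints the left item of each pair before the right one, so with "job predecessor" pairs it lists jobs before their predecessors, and the list has to be reversed. The function name preds_first is made up, and the sketch assumes the file uses a single placeholder (0 or NULL) as the predecessor of every root job.

```shell
# preds_first FILE: list job names so that every predecessor comes before its dependents.
# Assumption: FILE holds "job predecessor" pairs with one placeholder (e.g. 0) for roots.
preds_first() {
    tsort "$1" |     # jobs come out before their predecessors
    sed '$d' |       # every job leads back to the placeholder, so it is last: drop it
    awk '{ name[NR] = $0 } END { for (i = NR; i >= 1; i--) print name[i] }'   # reverse
}
```

The exact order can differ between tsort implementations, since a graph usually has several valid topological orders. The only guarantee is that each job appears after all of its predecessors.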

pbsrinivas · September 22, 2007, 12:20am

Thanks a lot for the inputs.

It's working great, but there is a small problem.

If we rename node "E" to P, it does something unexpected.

For example,

the same input with E changed to P:

A 0
B 0
C 0
D 0
P A
P B
P C
Z H
Z D
F P
G P
H P
I G

The output I get is below. (I changed your script to print $3 instead of $2.)

sort $1 > temp1
tsort temp1 | sed '$d' | nl | sort -nr | cut -f2 | nl | awk '{print $2, $1}' | sort | join temp1 - | sort -k 3n | awk '{print $1,$3}' |\
awk '{r[NR]=$0 ; k[NR]=$1 ; if(v[$1]<1+v[$2]) v[$1]=1+v[$2] } END { for (n=1; n<=NR; n++) print r[n] , v[k[n]]}'
exit 0

A 0 1
B 0 1
C 0 1
D 0 1
F P 1
G P 1
H P 1
I G 2
P A 2
P B 2
P C 2
Z D 2
Z H 2

With E kept as E it's great, no issues:

A 0 1
B 0 1
C 0 1
D 0 1
E A 2
E B 2
E C 2
F E 3
G E 3
H E 3
I G 4
Z D 4
Z H 4

From that I can see that the tsort is fine,
and when the job names are in alphabetical order there is no problem.

Up to this point both inputs produce the same output:

sort $1 > temp1
tsort temp1 | sed '$d' | nl | sort -nr | cut -f2 | nl | awk '{print $2, $1}'

but after that, the sort | join temp1 - scatters the fields.

Perderabo · September 22, 2007, 5:28am

Put the script back the way I wrote it. You cannot introduce random changes to my script and expect it to continue to work. I will test my script, not yours. My script with E in the data:
$ cat datae
A 0
B 0
C 0
D 0
E A
E B
E C
Z H
Z D
F E
G E
H E
I G
$ ./level datae
A 0 1
B 0 1
C 0 1
D 0 1
E A 2
E B 2
E C 2
F E 3
G E 3
H E 3
I G 4
Z D 4
Z H 4
$

And my script with P in the data:
$ cat datap
A 0
B 0
C 0
D 0
P A
P B
P C
Z H
Z D
F P
G P
H P
I G
$ ./level datap
A 0 1
B 0 1
C 0 1
P A 2
P B 2
P C 2
D 0 1
F P 3
G P 3
H P 3
I G 4
Z D 4
Z H 4
$

The only difference is that the "D 0 1" line moved below the E/P lines. That is an artifact of tsort: there are several possible ways to order the data, and tsort displays one of them. If this bothers you, use a final sort to sort the output.

pbsrinivas · September 22, 2007, 7:46am

I am very sorry, Perderabo. I did not mean it that way.

I thought you had made a typo, because

when I run it with E I get

[tofps8]/home/tofps8/SCRIPTS> cat datae
A 0
B 0
C 0
D 0
E A
E B
E C
Z H
Z D
F E
G E
H E
I G

[tofps8]/home/tofps8/SCRIPTS> ./level datae
A 3 1
B 2 1
C 1 1
D 6 1
E 4 1
E 4 1
E 4 1
F 10 1
G 8 1
H 5 1
I 9 1
Z 7 1
Z 7 1

[tofps8]/home/tofps8/SCRIPTS> cat level
#! /usr/bin/ksh
sort $1 > temp1
tsort temp1 | sed '$d' | nl | sort -nr | cut -f2 | nl | awk '{print $2, $1}' | sort | join temp1 - | sort -k 3n | awk '{print $1,$2}' |\
awk '{r[NR]=$0 ; k[NR]=$1 ; if(v[$1]<1+v[$2]) v[$1]=1+v[$2] } END { for (n=1; n<=NR; n++) print r[n] , v[k[n]]}'
exit 0

and for data with P

[tofps8]/home/tofps8/SCRIPTS> ./level datap
A 3 1
B 2 1
C 1 1
D 6 1
F 10 1
G 8 1
H 5 1
I 9 1
P 4 1
P 4 1
P 4 1
Z 7 1
Z 7 1

I just copied and pasted what you gave.

I use AIX 5.3. Could that be the issue?

Perderabo · September 22, 2007, 1:20pm

Don't know what to tell you. I don't have AIX.

pbsrinivas · September 22, 2007, 1:51pm

I have troubled you a lot.

Let me explain what is happening on my machine.

Up to this point both give the same output

after that

changes the order,

but I don't know why it shows all 1's in the 3rd column.

And then

that gets me the job name and the integer; it doesn't get me the job and predecessor as expected.

That's why I thought it might be $3

instead of $2

in the awk print.

The join of temp1 and - doesn't get me the job and predecessor.

Please throw some light on it.

Thanks for all the help.

If I am right:

sort $1 > temp1
tsort temp1 --- does the topological sort
sed '$d' --- deletes the last line, which is 0 in our case
nl --- prefixes a line number to each line
sort -nr --- sorts the file in reverse, based on the line numbers given by the previous nl command
cut -f2 --- gets the job name back
nl --- numbers the lines again
awk '{print $2, $1}' --- prints the job name followed by the line number
sort --- sorts on the job name
join temp1 - --- ??????? I don't know exactly what it does (I thought it joins the job name to its predecessor)

sort -k 3n --- sorts numerically on the 3rd field
awk '{print $1,$2}' --- prints the first two columns (in this case I thought the first 2 columns would be the job name and line number)

The next logic is flawless, I think, and the tsort is fine:

awk '{r[NR]=$0 ; k[NR]=$1 ; if(v[$1]<1+v[$2]) v[$1]=1+v[$2] } END { for (n=1; n<=NR; n++) print r[n] , v[k[n]]}'

I am counting on you people only.
Please don't desert me.

Perderabo · September 22, 2007, 6:41pm

join does a "join" operation on two files, where "join" is the relational database join. Since I did not specify otherwise, the first field of both files is the key field and the files must be sorted on the field. Example:

$ cat one
cat kitty
dog puppy
$ cat two
cat fluffy
cat tiger
cat felix
dog fido
dog rover
dog spot
$ join one two
cat kitty fluffy
cat kitty tiger
cat kitty felix
dog puppy fido
dog puppy rover
dog puppy spot

The first output field is the key field for both files. Then come the data fields for the first file, and then the data fields for the second file. This is how join is supposed to work. You seem to be saying that your join is giving the key field, then the data field for the 2nd file, and then the data field for the first file. It's not supposed to do that. But I can't test on AIX. If it is backwards, then your awk change should compensate. But if you still get wrong output, something else must not be right. Not sure what...
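
If the field order coming out of join is in doubt, one option is to request it explicitly with the -o option, which POSIX defines for join. The sketch below is an illustration, not a tested fix for AIX: the function name join_job_pred_seq and its two file arguments are made up, and it assumes mktemp is available. LC_ALL=C keeps sort and join agreeing on collation.

```shell
# join_job_pred_seq PAIRS ORDER: print "job predecessor sequence" lines.
# PAIRS holds "job predecessor" lines, ORDER holds "job sequence" lines (hypothetical inputs).
join_job_pred_seq() {
    pairs_sorted=$(mktemp) || return 1
    order_sorted=$(mktemp) || return 1
    # join needs both inputs sorted on the key field (field 1 here).
    LC_ALL=C sort -k 1,1 "$1" > "$pairs_sorted"
    LC_ALL=C sort -k 1,1 "$2" > "$order_sorted"
    # -o names each output field as file.field, so the order no longer depends on defaults.
    LC_ALL=C join -o 1.1,1.2,2.2 "$pairs_sorted" "$order_sorted"
    rm -f "$pairs_sorted" "$order_sorted"
}
```

With the output pinned to job, predecessor, sequence, the later "sort -k 3n" and "awk '{print $1,$2}'" stages of the script see the columns they expect.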

pbsrinivas · September 24, 2007, 2:18am

Thanks, Perderabo.

I made it work on AIX.

I sorted on the 2nd field rather than the 3rd and printed the 1st and 3rd columns.

Last time the sort was on the 3rd field, so it did not work.

Now it works great.

Thanks a lot for all the support...
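
For anyone who hits the same platform differences: a different approach, not the one used in this thread, is to skip tsort and join entirely and let awk re-scan the pairs until no level changes. The sketch below assumes the input is acyclic and that root jobs use a placeholder predecessor (0 or NULL) that never appears as a job itself. The function name levels_any_order is made up.

```shell
# levels_any_order FILE: print "job predecessor level" for pairs given in any order.
levels_any_order() {
    awk '
    { line[NR] = $0; job[NR] = $1; pred[NR] = $2 }
    END {
        # Each pass can only raise levels. An acyclic input with NR pairs
        # settles within NR passes, so one extra pass must see no change.
        for (pass = 1; pass <= NR + 1; pass++) {
            changed = 0
            for (n = 1; n <= NR; n++) {
                cand = level[pred[n]] + 1     # placeholder predecessors count as level 0
                if (level[job[n]] < cand) { level[job[n]] = cand; changed = 1 }
            }
            if (!changed) break
        }
        if (changed) {                        # still changing: the input has a cycle
            print "cycle detected in input" | "cat 1>&2"
            exit 1
        }
        for (n = 1; n <= NR; n++) print line[n], level[job[n]]
    }' "$1"
}
```

It keeps the input order in the output, and on the reordered input from earlier in the thread (Z listed before H) it should give Z level 4. In the worst case it makes about as many passes as there are pairs, which should be acceptable for around 1200 jobs.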