Processing Dependency Printing

kapeeljoshi · October 26, 2012, 3:56am

Gurus,

I came across a typical requirement where the input is like-

TRANS FIRM
DEPT CUST
TRANS CUST
TRANS DEPT
FIRM CUST

& the expected Output is-

CUST
DEPT
FIRM
TRANS

This is basically a dependency list for processing the tables where
FIRM is feeding data to TRANS table
CUST is feeding data to DEPT table
CUST is feeding data to TRANS table
DEPT is feeding data to TRANS table
& CUST is feeding data to FIRM table.

Due to the feeding mechanism, the FIRM table should be available even before the TARNS is being processed, but since TRANS is depended upon CUST, CUST should be processed even before & likewise.

Finally the typical sequence of execution should be like-

CUST
DEPT
FIRM
TRANS

Appreciate your help in this regard :rolleyes:

DGPickett · October 26, 2012, 3:21pm

What do you need?

kapeeljoshi · November 1, 2012, 4:33am

Basically I have a file say tmp.txt where the text is-

TRANS FIRM
DEPT CUST
TRANS CUST
TRANS DEPT
FIRM CUST

& based on the dependency listed in the file, with any command, the output should be

CUST
DEPT
FIRM
TRANS

I am trying with awk/ loops however it doesnt seem to work..

DGPickett · November 1, 2012, 10:13am

So, you are normalizing a graph into an ordered list. The list of upstream input nodes is the right column made unique. The list of downstream output nodes is the right column. Sometimes it is easier to work backward from the outputs, in case some inputs are not needed. However, the list of all nodes is the uniqu sum of the irght and left column lists, the starting points have no dependency and the end points have no node dependent on them. Make the list of independent input nodes, printing them and moving them from the original total list to the done list, then print and move the nodes that depend on them, recursive until there is nobody dependent on the listed nodes (first list is empty).

To be robust, you need to deal with not needed and multiple dependencies forward and backward, which is why it is cleaner to start with the output and go back, building the output list from the end by adding discovered dependent nodes before the output (no dependents) nodes, being careful not to list any node twice. It is assumed that external users want all the nodes with no internal dependents. Think of the nodes as in layers on the graph: final, final-1, final-2, etc. Of course, CUST might seem final-1 but because of DEPT it needs to be promoted to final-2, which, since it is the same final, says that dependency TRANS CUST is actually inplicit and not needed. For multiple final, it could be initially TRANS-1 but promoted to TRANS-2. If there was anopther final product XXXX where CUST was XXXX-1, it could also be TRANS-2, and for simplicity it could be considered final-2, as if all final-2 are done before final-1, the final-1 need is met. Starting from output, like make, means things not needed are ignored.

If you start adding processing times to the mix, you can derive the end to end time for parallel processing and the critical path.