Parse Directory path - awk

Hi All,

I need some help parsing a directory listing and splitting the output into 2 files.

Input file

level1,/level2/level3/level4/ora001,10,IBB23
level1,/level2/level3/level4/ora001/blu1,,IBB23
level1,/level2/level3/level4/ora001/clu1,,IBB23
level1,/level2/level3/level4/ora002,,IBB24
level1,/level2/level3/level4/ora002/bbu1,20,IBB24
level1,/level2/level3/level4/ora002/ccu1,20,IBB24
level1,/level2/level3/level4/ora003,4,IBB25
level1,/level2/level3/level4/ora003/ddu1,,
level1,/level2/level3/level4/ora004/ttu1,4,IBB25
level1,/level2/level3/level4/ora004/rru1,5,IBB25
level1,/level2/level3/level4/ora005/yyu1,5,IBB25

Output

File1

level1,/level2/level3/level4/ora001,10,IBB23
level1,/level2/level3/level4/ora002/bbu1,20,IBB24
level1,/level2/level3/level4/ora002/ccu1,20,IBB24
level1,/level2/level3/level4/ora004/ttu1,4,IBB25
level1,/level2/level3/level4/ora004/rru1,5,IBB25
level1,/level2/level3/level4/ora005/yyu1,5,IBB25

File2

level1,/level2/level3/level4/ora003,4,IBB25
level1,/level2/level3/level4/ora003/ddu1,,

Logic 1
Rows where $2 has the same path up to 4 directory levels and $4 is the same form one subgroup, for example:

level1,/level2/level3/level4/ora001,10,IBB23
level1,/level2/level3/level4/ora001/blu1,,IBB23
level1,/level2/level3/level4/ora001/clu1,,IBB23

When the above is true, copy the line(s) where the $3 field is not empty to file1; the rest of the subgroup can be discarded.
If $3 is empty for all the lines in the subgroup, copy all the lines in the subgroup to file2.

Logic 2

For rows where $2 has the same path up to 4 directory levels but $4 is not the same or is empty, copy the line(s) to file2.
Any line(s) with a unique $2 at 4 directory levels, but with $3 and $4 not empty, go to file1 as well.

The rest of the lines, which do not match the above 2 criteria, go to file2 as well.

Thanks

Any attempt / idea / thought from your side?

I can't see how your logic example complies with your specification: none of those $2 values represent the "same path upto 5 dir levels". The first entry has only four levels; the next two have five but are different. Please revise your spec / example.

Thanks.

My bad, I edited the question.

What about your own approach to solve the problem?

Hi ,

I have only tried basic shell looping (if/then/else stuff). I was looking for something more efficient, as there are more than 20,000 lines.

Thanks

Why does

level1,/level2/level3/level4/ora003,4,IBB25

show up in file2?

Hi ..

level1,/level2/level3/level4/ora003,4,IBB25
level1,/level2/level3/level4/ora003/ddu1,,

In this case $4 is not the same, so both lines get copied to file2.
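
For reference, here is a minimal awk sketch of one possible reading of Logic 1 and Logic 2, checked only against the small sample input from the question. The file name input.csv, the temporary working directory and all awk variable names are assumptions made for illustration. It also assumes there is no trailing whitespace after $4. All lines are held in memory until END, so the order of the input does not matter.

```shell
#!/bin/sh
# Sketch only: one reading of Logic 1 / Logic 2, not a confirmed solution.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > input.csv <<'EOF'
level1,/level2/level3/level4/ora001,10,IBB23
level1,/level2/level3/level4/ora001/blu1,,IBB23
level1,/level2/level3/level4/ora001/clu1,,IBB23
level1,/level2/level3/level4/ora002,,IBB24
level1,/level2/level3/level4/ora002/bbu1,20,IBB24
level1,/level2/level3/level4/ora002/ccu1,20,IBB24
level1,/level2/level3/level4/ora003,4,IBB25
level1,/level2/level3/level4/ora003/ddu1,,
level1,/level2/level3/level4/ora004/ttu1,4,IBB25
level1,/level2/level3/level4/ora004/rru1,5,IBB25
level1,/level2/level3/level4/ora005/yyu1,5,IBB25
EOF

# Create both output files up front, so they exist even if awk writes nothing to one.
: > file1
: > file2

awk -F, '
{
    # Group key: the first 4 directory levels of $2.
    # $2 starts with "/", so split() leaves p[1] empty and the levels are p[2]..p[5].
    split($2, p, "/")
    key = p[2] "/" p[3] "/" p[4] "/" p[5]

    n++
    line[n] = $0
    grp[n]  = key
    has3[n] = ($3 != "")

    if (!(key in first4)) first4[key] = $4              # first $4 seen in this group
    if ($4 == "" || $4 != first4[key]) mixed[key] = 1   # empty or differing $4 (Logic 2)
    if ($3 != "") filled[key] = 1                       # some line in the group has $3
}
END {
    for (i = 1; i <= n; i++) {
        k = grp[i]
        if (k in mixed)          print line[i] > "file2"   # whole group goes to file2
        else if (!(k in filled)) print line[i] > "file2"   # $3 is empty on every line
        else if (has3[i])        print line[i] > "file1"   # keep only lines with $3 set
        # remaining case: empty $3 inside an otherwise good group, discarded
    }
}' input.csv

echo "== file1 =="; cat file1
echo "== file2 =="; cat file2
```

On the sample input this gives the six File1 lines and the two ora003 lines in File2 listed in the question. Whether this reading also holds for the edge cases in the real data is something only the original poster can confirm.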

Will the data follow a certain sequence, i.e. parent dirs sorted, parent dir before child dir, non-empty $3 before empty $3, non-empty $4 before empty $4?

Hi ...

It most likely won't.
For example, the input could look something like this:

level1,/level2/level3/level4/ora001,10,IBB23
level1,/level2/level3/level4/ora002,,IBB24
level1,/level2/level3/level4/ora001/blu1,,IBB23
level1,/level2/level3/level4/ora001/clu1,,IBB23
level1,/level2/level3/level4/ora002/bbu1,20,IBB24
level1,/level2/level3/level4/ora002/ccu1,20,

Is it possible to sort the input first, e.g.

sort -t, -k2,2 -k4r -k3 file
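
To make the effect of those sort keys concrete, here is a small sketch that runs the same command on the unordered sample posted just above. `-t,` sets the comma as field separator, `-k2,2` sorts on the path alone, `-k4r` compares from field 4 to the end of the line in reverse (so a non-empty $4 sorts before an empty one when the paths are equal), and `-k3` breaks remaining ties from field 3 onward. `LC_ALL=C` and the `sorted` variable are additions for this example: the C locale forces plain byte-order comparison, which other locales may not give you.

```shell
#!/bin/sh
# Sketch: sort the unordered sample so that each parent path is followed by its children.
# LC_ALL=C is an addition here, to make the comparison plain byte order.
sorted=$(LC_ALL=C sort -t, -k2,2 -k4r -k3 <<'EOF'
level1,/level2/level3/level4/ora001,10,IBB23
level1,/level2/level3/level4/ora002,,IBB24
level1,/level2/level3/level4/ora001/blu1,,IBB23
level1,/level2/level3/level4/ora001/clu1,,IBB23
level1,/level2/level3/level4/ora002/bbu1,20,IBB24
level1,/level2/level3/level4/ora002/ccu1,20,
EOF
)
printf '%s\n' "$sorted"
```

In this sample every $2 is distinct, so only the first key decides the order: the ora001 line comes first, followed by its two children, then the ora002 line and its children.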

Original File

slsins,/data/application-nr/slsins/MYRIAD/MYRIAD_RISK_03,10,IEE1
slsins,/data/application-nr/slsins/MYRIAD/MYRIAD_RISK_02,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN18300/QLN18300,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63500/QLN63500,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_PLNLOD2031/PLNLOD2031,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN16800/QLN16800,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001,10,
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63200/QLN63200,10,
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63100/QLN63100,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17800/QLN17800,10,IEE1
slsins,/data/dbdumps-rp/slsins/ggg001,10,IEE1
slsins,/data/dbdumps-rp/slsins/ggg002,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17700/QLN17700,10,IEE1
slsins,/data/application-nr/slsins/MYRIAD/MYRAID_RISK_01,,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_PLNLOD2071/PLNLOD2071,10,IEE1
slsins,/data/dbdumps-nr/gshins01/syb001,10,IEE1
slsins,/data/dbdumps-nr/gshins01/ggg001,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63600/QLN63600,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_DLNLOD2125,,IEE1
slsins,/data/applications-nr/gshins01/pCloud,10,IEE1
slsins,/data/application-rp/slsins/Odyssey/Odyssey_26656_03,10,IEE1
slsins,/data/application-nr/slsins/CDDS,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17100/QLN17100,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_DLN45100,10,IEE1
slsins,/data/dbdumps-rp/gshins01/sql001,10,IEE1
slsins,/data/application-rp/slsins/Odyssey/Odyssey_26656_03/IB_Odyssey_03,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17900/QLN17900,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN64000/QLN64000,10,IEE1

$ sort -t, -k2,2 -k4r -k3 ei

slsins,/data/application-nr/slsins/CDDS,10,IEE1
slsins,/data/application-nr/slsins/MYRIAD/MYRAID_RISK_01,,IEE1
slsins,/data/application-nr/slsins/MYRIAD/MYRIAD_RISK_02,10,IEE1
slsins,/data/application-nr/slsins/MYRIAD/MYRIAD_RISK_03,10,IEE1
slsins,/data/application-rp/slsins/Odyssey/Odyssey_26656_03,10,IEE1
slsins,/data/application-rp/slsins/Odyssey/Odyssey_26656_03/IB_Odyssey_03,10,IEE1
slsins,/data/applications-nr/gshins01/pCloud,10,IEE1
slsins,/data/dbdumps-nr/gshins01/ggg001,10,IEE1
slsins,/data/dbdumps-nr/gshins01/syb001,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001,10,
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_DLN45100,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_DLNLOD2125,,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_PLNLOD2031/PLNLOD2031,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_PLNLOD2071/PLNLOD2071,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN16800/QLN16800,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17100/QLN17100,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17700/QLN17700,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17800/QLN17800,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17900/QLN17900,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN18300/QLN18300,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63100/QLN63100,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63200/QLN63200,10,
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63500/QLN63500,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63600/QLN63600,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN64000/QLN64000,10,IEE1
slsins,/data/dbdumps-rp/gshins01/sql001,10,IEE1
slsins,/data/dbdumps-rp/slsins/ggg001,10,IEE1
slsins,/data/dbdumps-rp/slsins/ggg002,10,IEE1

$ more file1

slsins,/data/application-nr/slsins/CDDS,10,IEE1
slsins,/data/application-nr/slsins/MYRIAD/MYRIAD_RISK_02,10,IEE1
slsins,/data/application-nr/slsins/MYRIAD/MYRIAD_RISK_03,10,IEE1
slsins,/data/application-rp/slsins/Odyssey/Odyssey_26656_03,10,IEE1
slsins,/data/application-rp/slsins/Odyssey/Odyssey_26656_03/IB_Odyssey_03,10,IEE1
slsins,/data/applications-nr/gshins01/pCloud,10,IEE1
slsins,/data/dbdumps-nr/gshins01/ggg001,10,IEE1
slsins,/data/dbdumps-nr/gshins01/syb001,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_DLN45100,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_PLNLOD2031/PLNLOD2031,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_PLNLOD2071/PLNLOD2071,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN16800/QLN16800,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17100/QLN17100,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17700/QLN17700,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17800/QLN17800,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN17900/QLN17900,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN18300/QLN18300,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63100/QLN63100,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63500/QLN63500,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63600/QLN63600,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN64000/QLN64000,10,IEE1
slsins,/data/dbdumps-rp/gshins01/sql001,10,IEE1
slsins,/data/dbdumps-rp/slsins/ggg001,10,IEE1
slsins,/data/dbdumps-rp/slsins/ggg002,10,IEE1

$ more file2

slsins,/data/application-nr/slsins/MYRIAD/MYRAID_RISK_01,,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001,10,
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_DLNLOD2125,,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63200/QLN63200,10,
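
The file1 and file2 listings above contain, respectively, exactly the sorted lines where both $3 and $4 are non-empty, and all the remaining lines. That is looser than Logic 1 and Logic 2 in the question: under a strict reading of Logic 2, the /data/dbdumps-nr/slsins/ggg001 group mixes IEE1 with empty $4 values and would go to file2 as a whole. The sketch below therefore only shows a filter that reproduces this particular listing; it is not necessarily the command that produced it. The file name sorted.csv and the six-line excerpt are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: split on "both $3 and $4 non-empty", with no grouping on the path prefix.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A few lines taken from the sorted listing; the full sorted file would be used in practice.
cat > sorted.csv <<'EOF'
slsins,/data/application-nr/slsins/CDDS,10,IEE1
slsins,/data/application-nr/slsins/MYRIAD/MYRAID_RISK_01,,IEE1
slsins,/data/application-nr/slsins/MYRIAD/MYRIAD_RISK_02,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001,10,
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_DLN45100,10,IEE1
slsins,/data/dbdumps-nr/slsins/ggg001/DMP_QLN63200/QLN63200,10,
EOF

# Create both output files up front, so they exist even if one receives no lines.
: > file1
: > file2

awk -F, '{
    # Choose the output file per line: file1 only when both fields are filled in.
    out = ($3 != "" && $4 != "") ? "file1" : "file2"
    print > out
}' sorted.csv

echo "== file1 =="; cat file1
echo "== file2 =="; cat file2
```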
Moderator comments were removed during original forum migration.