Hi,
I am trying to extract data based on certain conditions. My sample input file is shown below:
lnc-2:1 OnePiece tra_law 500 688 1 . . g_id "R792.8417"# tra_law_id "R792.8417.1"# g_line "2.711647"# KM "8.723820"#
lnc-2:1 OnePiece room 500 510 1 . . g_id "R792.8417"# tra_law_id "R792.8417.1"# room_number "1"# g_line "2.711647"#
lnc-2:1 OnePiece room 540 588 1 . . g_id "R792.8417"# tra_law_id "R792.8417.1"# room_number "2"# g_line "2.711647"#
lnc-2:1 OnePiece room 620 650 1 . . g_id "R792.8417"# tra_law_id "R792.8417.1"# room_number "3"# g_line "2.711647"#
lnc-2:1 OnePiece room 660 688 1 . . g_id "R792.8417"# tra_law_id "R792.8417.1"# room_number "4"# g_line "2.711647"#
lnc-1:3 OnePiece tra_law 1 3601 1 . . g_id "R792.8416"# tra_law_id "R792.8416.1"# g_line "36.370155"# KM "117.008842"#
lnc-1:3 OnePiece room 1 601 1 . . g_id "R792.8416"# tra_law_id "R792.8416.1"# room_number "1"# g_line "36.370155"#
lnc-1:3 OnePiece room 1020 3001 1 . . g_id "R792.8416"# tra_law_id "R792.8416.1"# room_number "2"# g_line "36.370155"#
lnc-1:3 OnePiece room 3400 3601 1 . . g_id "R792.8416"# tra_law_id "R792.8416.1"# room_number "3"# g_line "36.370155"#
lnc-9:1 OnePiece tra_law 1743 2314 1 . . g_id "R792.8419"# tra_law_id "R792.8419.1"# g_line "27.213287"# KM "87.549683"#
lnc-9:1 OnePiece room 1743 2314 1 . . g_id "R792.8419"# tra_law_id "R792.8419.1"# room_number "1"# g_line "27.213287"#
lnc-16:4 OnePiece tra_law 25408 63025 1 - . g_id "R792.8420"# tra_law_id "R792.8420.1"# g_line "357.721802"# KM "1150.850586"#
lnc-16:4 OnePiece room 25408 25528 1 - . g_id "R792.8420"# tra_law_id "R792.8420.1"# room_number "1"# g_line "765.276733"#
lnc-16:4 OnePiece room 62888 63025 1 - . g_id "R792.8420"# tra_law_id "R792.8420.1"# room_number "2"# g_line "0.372920"#
I want an output that prints every line sharing the same name in $1, but only when all of the conditions are met. The conditions are as follows:
1) "tra_law" is found in $3 and the result of $5 - $4 (of the tra_law line) is > 200. If so, all the following lines associated with it should be printed.
2) Then the room_number in the last column should be checked: only groups with a minimum of 3 room_number entries should be kept.
The output should look like this:
lnc-1:3 OnePiece tra_law 1 3601 1 . . g_id "R792.8416"# tra_law_id "R792.8416.1"# g_line "36.370155"# KM "117.008842"#
lnc-1:3 OnePiece room 1 601 1 . . g_id "R792.8416"# tra_law_id "R792.8416.1"# room_number "1"# g_line "36.370155"#
lnc-1:3 OnePiece room 1020 3001 1 . . g_id "R792.8416"# tra_law_id "R792.8416.1"# room_number "2"# g_line "36.370155"#
lnc-1:3 OnePiece room 3400 3601 1 . . g_id "R792.8416"# tra_law_id "R792.8416.1"# room_number "3"# g_line "36.370155"#
As you can see, only lnc-1:3 meets the conditions. For lnc-2:1, the tra_law value is less than 200 (688 - 500 = 188), so it is omitted. As for lnc-9:1 and lnc-16:4, although the tra_law value is > 200, both are omitted too because their room_number counts are less than 3.
I tried to use awk for this. My code is below:
awk -F"\t" 'NR>2 {$20=$5-$4; if ($20>200 && $3 ~/tra_law/) print $0}' inputfile | awk '{NF--NF};1' > outputfile
This gives me the results for condition 1, but it does not print the following lines associated with each match. I also do not know how to check condition 2. I would prefer condition 2 to be a separate awk command, as I might need to use the two separately in different situations. I have tried many times but failed. I would appreciate your help. Thanks.
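For reference, one possible way to express the two conditions as separate awk commands is sketched below. It is only a sketch under stated assumptions, not a confirmed solution: it assumes each tra_law line comes before its room lines, that counting lines with "room" in $3 per name in $1 is equivalent to counting room_number entries, and that default whitespace splitting is acceptable because $1 to $5 contain no spaces in the sample. The function names filter_long_tra_law and filter_min_rooms are made up for illustration.

```shell
# Condition 1 (hypothetical helper name): keep a tra_law line and the lines
# that follow it when $5 - $4 on the tra_law line is greater than 200.
# Assumes every tra_law line precedes the room lines that belong to it.
filter_long_tra_law() {
    awk '
        $3 == "tra_law" { keep = ($5 - $4 > 200) }   # re-evaluated at each new tra_law line
        keep                                         # print while the current group qualifies
    ' "$@"
}

# Condition 2 (hypothetical helper name): keep only the groups, keyed on $1,
# that have at least 3 room lines. Input is buffered so it also works on a pipe.
# Assumes one room line corresponds to one room_number entry.
filter_min_rooms() {
    awk -v min=3 '
        $3 == "room" { rooms[$1]++ }                 # count room lines per name
        { line[NR] = $0; key[NR] = $1 }              # remember every line in input order
        END {
            for (i = 1; i <= NR; i++)
                if (rooms[key[i]] >= min) print line[i]
        }
    ' "$@"
}
```

With the sample input above, `filter_long_tra_law inputfile | filter_min_rooms > outputfile` should leave only the four lnc-1:3 lines. Each function can also be run on its own, since both read either a file argument or standard input.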