Hi,
I am trying to extract some patterns from each line of a file. The input file is space-delimited, but I cannot use a fixed column number to get the value after the "IN" or "OUT" pattern, because there can be multiple whitespace characters before the digits that I need to print in the output file. I need to print three items from each line: the ID in the third field, the "IN" or "OUT" keyword, and the first number that follows it.
Input file:
>RP: 123 DSU17281T6 DSU17281 Dressrossa crassa PT7T0 hypo prot (124 aa) OUT 0
>RP: 286 DSU17282T0 DSU17282 Dressrossa crassa PT7T0 hypo prot (287 aa) OUT 5 51 70 111 130 170 189 204 223 234 253
>RP: 110 DSU17283T0 DSU17283 Dressrossa crassa PT7T0 hypo prot (111 aa) OUT 0
>RP: 230 DSU17284T2 DSU17284 Dressrossa crassa PT7T0 hypo prot (231 aa) IN 1 18 35
>RP: 54 DSU16024T3 DSU16024 Dressrossa crassa PT7T0 mo ATP unit 8 (55 aa) OUT 1 13 32
>RP: 261 DSU16025T2 DSU16025 Dressrossa crassa PT7T0 mo ATP unit 6 (262 aa) OUT 7 41 60 96 118 127 146 153 172 183 206 213 231 236 254
>RP: 480 DSU16026T0 DSU16026 Dressrossa crassa PT7T0 mo (481 aa) IN 3 41 58 96 113 120 137
>RP: 74 DSU16027T1 DSU16027 Dressrossa crassa PT7T0 mo ATP unit 9 (75 aa) IN 2 11 35 48 72
>RP: 250 DSU16028T0 DSU16028 Dressrossa crassa PT7T0 mo cytochrome c oxidase subunit 2 (251 aa) OUT 2 40 59 78 97
Expected output (tab-delimited):
DSU17281T6 OUT 0
DSU17282T0 OUT 5
DSU17283T0 OUT 0
DSU17284T2 IN 1
DSU16024T3 OUT 1
DSU16025T2 OUT 7
DSU16026T0 IN 3
DSU16027T1 IN 2
DSU16028T0 OUT 2
I have tried many things, but none of them gave me what I want. The best I could come up with is below:
grep -wE "DSU.*T[0-9]|IN[[:space:]]*[0-9]|OUT[[:space:]]*[0-9]"
It shows that the patterns I want are matched correctly, but it still prints the whole line. I then tried changing "grep -wE" to "grep -oE", but each match is printed on its own line, as shown below. I need them on the same line, as in my expected output above:
DSU17281T6
OUT 0
DSU17282T0
OUT 5
DSU17283T0
OUT 0
DSU17284T2
IN 1
DSU16024T3
OUT 1
DSU16025T2
OUT 7
DSU16026T0
IN 3
DSU16027T1
IN 2
DSU16028T0
OUT 2
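For reference, one possible way to keep the "grep -oE" approach is to join every two output lines with paste. This is only a sketch: the function name extract_grep is hypothetical, the pattern is tightened on the assumption that every ID looks like DSU<digits>T<digits>, and it assumes every input line produces exactly two matches (otherwise the pairing shifts).

```shell
# Hypothetical helper: reads records on stdin, prints "ID<TAB>IN|OUT<TAB>number".
# Assumes each line yields exactly two matches: one ID and one IN/OUT + number.
extract_grep() {
  # -o prints each match on its own line; paste - - joins lines in pairs with
  # a tab; tr turns the spaces after IN/OUT into tabs and squeezes repeats.
  grep -oE 'DSU[0-9]+T[0-9]+|(IN|OUT)[[:space:]]+[0-9]+' \
    | paste - - \
    | tr -s ' ' '\t'
}
```

Usage would be something like: extract_grep < inputfile > outputfile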
I also tried sed and awk, but I always get the whole line printed. Can anyone show me what I need to change here? Also, may I know how to do this in sed and awk? Thanks.
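To make the question concrete, here is a rough sketch of what an awk and a sed version might look like. The function names extract_awk and extract_sed are hypothetical. Both assume the ID is always the third whitespace-separated field and that everything after the "IN" or "OUT" keyword is numeric, so the last "IN" or "OUT" on the line is the one wanted.

```shell
# Hypothetical helper (awk): default field splitting already collapses runs
# of whitespace, so the number is simply the field after IN/OUT.
extract_awk() {
  awk -v OFS='\t' '{
    # Scan backward so an "IN"/"OUT" word in the description is not picked up.
    for (i = NF - 1; i > 3; i--)
      if ($i == "IN" || $i == "OUT") { print $3, $i, $(i + 1); break }
  }'
}

# Hypothetical helper (sed): capture field 3, the keyword, and the first
# number, then print only the substituted lines (-n with the p flag).
extract_sed() {
  tab=$(printf '\t')   # literal tab, since \t in a replacement is not portable
  sed -nE "s/^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+([^[:space:]]+).*[[:space:]](IN|OUT)[[:space:]]+([0-9]+).*/\1${tab}\2${tab}\3/p"
}
```

Either one could be run as, for example: extract_awk < inputfile > outputfile. In the sed version, the point is that the substitution must match the whole line (note the leading ^ and trailing .*) so that everything outside the three captured groups is dropped; a substitution that matches only part of the line leaves the rest in place, which would explain seeing whole lines printed.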