Grep/awk: extract part of a column

owwow14 · December 6, 2013, 6:33am

I have a question that I am at a loss to solve. I have three-column, tab-separated data, such as:

abs nmod+n+n-commitment-n   349.200023 
abs nmod+n+n-a-commitment-n 333.306429 
abs into+ns-j+vn-pass-rb-divide-v   295.57316 
abs nmod+n+ns-commitment-n  182.085018 
abs nmod+n+n-pledge-n   149.927391
abs nmod+n+ns-reagent-n 142.347358

I need to isolate the last two hyphen-separated "elements" of the second column. My desired result is a 4-column output that contains only the rows whose second column ends with "-n", such as:

abs nmod+n+n   commitment-n   349.200023
abs nmod+n+n-a   commitment-n 333.306429 
abs nmod+n+ns   commitment-n  182.085018 
abs nmod+n+n   pledge-n   149.927391
abs nmod+n+ns   reagent-n 142.347358

Is there an awk, grep, or any other command that can help? The files are approx. 500 MB, so they are not huge, but not small either. Thanks for any insight.

RudiC · December 6, 2013, 6:55am

Try

awk '$2~/-n$/ {sub (/-/," ", $2); print}' file
abs nmod+n+n commitment-n 349.200023
abs nmod+n+n a-commitment-n 333.306429
abs nmod+n+ns commitment-n 182.085018
abs nmod+n+n pledge-n 149.927391
abs nmod+n+ns reagent-n 142.347358

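EDIT: I see an error in line 2 of the output: the split happened at the first hyphen, giving "nmod+n+n a-commitment-n" instead of "nmod+n+n-a commitment-n". Let me think...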

---------- Post updated at 12:55 ---------- Previous update was at 12:38 ----------

This may be more appropriate:

awk '$2~/-n$/ {sub (/-[^-]*-n$/," &", $2); $0=$0; sub (/^-/,"",$3); print}' file
abs nmod+n+n commitment-n 349.200023
abs nmod+n+n-a commitment-n 333.306429
abs nmod+n+ns commitment-n 182.085018
abs nmod+n+n pledge-n 149.927391
abs nmod+n+ns reagent-n 142.347358
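
In case the `$0=$0` looks odd: assigning to `$2` rebuilds the record but does not re-split it, so the inserted space only becomes a field boundary once the record is reassigned to itself. A minimal sketch on one sample line, assuming tab-separated input as described in the question (the `out` variable is only there for illustration):

```shell
# Illustration only: the same steps as the one-liner above, with NF printed
# before and after the record is re-split.
out=$(printf 'abs\tnmod+n+n-a-commitment-n\t333.306429\n' |
awk '$2 ~ /-n$/ {
    sub(/-[^-]*-n$/, " &", $2)   # $2 is now "nmod+n+n-a -commitment-n"
    print NF                     # still 3: fields have not been re-split
    $0 = $0                      # force awk to split the rebuilt record again
    print NF                     # now 4
    sub(/^-/, "", $3)            # drop the leading hyphen of the new $3
    print
}')
printf '%s\n' "$out"
```

This should print 3, then 4, then the line `abs nmod+n+n-a commitment-n 333.306429`. Note that the output is space-separated, because rebuilding the record uses the default OFS.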

Akshay_Hegde · December 6, 2013, 8:16am

Try:

$ awk '$2~/-n$/{j=0;for(i=length($2);i>=1;i--){if(substr($2,i,1)=="-"){++j}if(j>1)break};$2 = substr($2,1,i-1) FS substr($2,i+1);print}' file

abs nmod+n+n commitment-n 349.200023
abs nmod+n+n-a commitment-n 333.306429
abs nmod+n+ns commitment-n 182.085018
abs nmod+n+n pledge-n 149.927391
abs nmod+n+ns reagent-n 142.347358
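
Both answers above print space-separated lines. If the 4-column result should stay tab-separated like the input, something along these lines might do. It is only a sketch: it assumes the fields are separated by single tabs with no stray spaces inside them, and `split_last_two` is a made-up helper name, not an existing command.

```shell
# Hypothetical helper: reads tab-separated text from the named file(s) or stdin.
split_last_two() {
    awk 'BEGIN { FS = OFS = "\t" }
    # match() sets RSTART to the position of the second-to-last hyphen
    # when the second field ends in "-<something>-n".
    match($2, /-[^-]*-n$/) {
        # Replace that hyphen with a tab; assigning $2 rebuilds $0 using OFS.
        $2 = substr($2, 1, RSTART - 1) OFS substr($2, RSTART + 1)
        print
    }' "$@"
}

# Example call on two sample lines; only the "-n" line is printed.
printf 'abs\tnmod+n+n-a-commitment-n\t333.306429\nabs\tinto+ns-j+vn-pass-rb-divide-v\t295.57316\n' |
    split_last_two
```

awk reads the file line by line in a single pass, so a 500 MB input should not be a problem for memory.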