I have a problem that I am at a loss to solve. I have 3-column tab-separated data, such as:
abs nmod+n+n-commitment-n 349.200023
abs nmod+n+n-a-commitment-n 333.306429
abs into+ns-j+vn-pass-rb-divide-v 295.57316
abs nmod+n+ns-commitment-n 182.085018
abs nmod+n+n-pledge-n 149.927391
abs nmod+n+ns-reagent-n 142.347358
I need to isolate the last two hyphen-separated "elements" of the second column. My desired result is a 4-column output that contains only those lines whose second column ends with "-n", such as:
abs nmod+n+n commitment-n 349.200023
abs nmod+n+n-a commitment-n 333.306429
abs nmod+n+ns commitment-n 182.085018
abs nmod+n+n pledge-n 149.927391
abs nmod+n+ns reagent-n 142.347358
Is there an awk, grep, or anything else that can help? The files are approx. 500 MB, so they are not huge, but not small either. Thanks for any insight.
RudiC
December 6, 2013, 6:55am
2
Try
awk '$2~/-n$/ {sub (/-/," ", $2); print}' file
abs nmod+n+n commitment-n 349.200023
abs nmod+n+n a-commitment-n 333.306429
abs nmod+n+ns commitment-n 182.085018
abs nmod+n+n pledge-n 149.927391
abs nmod+n+ns reagent-n 142.347358
EDIT: I see an error in line 2 of the output: sub() replaces the first hyphen in the field, not the second-to-last one. Let me think...
---------- Post updated at 12:55 ---------- Previous update was at 12:38 ----------
This may be more adequate:
awk '$2~/-n$/ {sub (/-[^-]*-n$/," &", $2); $0=$0; sub (/^-/,"",$3); print}' file
abs nmod+n+n commitment-n 349.200023
abs nmod+n+n-a commitment-n 333.306429
abs nmod+n+ns commitment-n 182.085018
abs nmod+n+n pledge-n 149.927391
abs nmod+n+ns reagent-n 142.347358
Try:
$ awk '$2~/-n$/{j=0;for(i=length($2);i>=1;i--){if(substr($2,i,1)=="-"){++j}if(j>1)break};$2 = substr($2,1,i-1) FS substr($2,i+1);print}' file
abs nmod+n+n commitment-n 349.200023
abs nmod+n+n-a commitment-n 333.306429
abs nmod+n+ns commitment-n 182.085018
abs nmod+n+n pledge-n 149.927391
abs nmod+n+ns reagent-n 142.347358
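One thing to watch: the commands above modify $2, so awk rebuilds each record with the default output separator and the result is space-separated. If the output should stay tab-separated like the input, something along these lines may work. It is only a sketch: the function name split_last_two is made up for illustration, and it assumes the input really is tab-delimited with exactly three columns. Lines whose second column contains only one hyphen are skipped, since there is nothing to split off.

```shell
# split_last_two: hypothetical helper name, not from the posts above.
# Assumes tab-separated input with three columns; reads the named files or stdin.
split_last_two() {
    awk -F'\t' -v OFS='\t' '
        $2 ~ /-n$/ {
            # Find the last two hyphen-separated elements, e.g. "-commitment-n".
            if (match($2, /-[^-]*-n$/)) {
                head = substr($2, 1, RSTART - 1)   # everything before that hyphen
                tail = substr($2, RSTART + 1)      # e.g. "commitment-n"
                print $1, head, tail, $3
            }
        }
    ' "$@"
}
```

Usage would be something like split_last_two file > newfile, which makes a single pass over the file.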