awk using sub, filtering a text file

I have a text file as below:

 
CMF_COMP_ELEM_ GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_ GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_ GSM4_B71.WORLD_20121114130908.log   92549475 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_ GSM5_B71.WORLD_20121114130908.log   35606251 rows inserted into ALL_S1_CMF_COMP_ELEM.

 

I am expecting output as below:

 
CMF_COMP_ELEM_ GSM2   107496444  
CMF_COMP_ELEM_ GSM3   110729006  
CMF_COMP_ELEM_ GSM4   92549475  
CMF_COMP_ELEM_ GSM5   35606251  

Please share the code. The code below is not working:

 
awk '{sub("_B71*log","",$1); print $1 $2 }'  inputfile

It should be $2, since the log name is the second field.

Try:

awk '{sub("_B71*.*.log","",$2); print $1,$2,$3 }'  file
$ awk '{sub("_.*","",$2);print $1,$2,$3}' input.txt
CMF_COMP_ELEM_ GSM2 107496444
CMF_COMP_ELEM_ GSM3 110729006
CMF_COMP_ELEM_ GSM4 92549475
CMF_COMP_ELEM_ GSM5 35606251
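
For readers following along, here is a minimal sketch of why the first attempt changed nothing and why the commands above work. It uses one sample line from the question; the variable names (line, fields, unchanged, result) are only for illustration.

```shell
# Sample line copied from the question.
line='CMF_COMP_ELEM_ GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_CMF_COMP_ELEM.'

# awk splits on runs of whitespace, so the log name lands in field 2.
fields=$(printf '%s\n' "$line" | awk '{ print "1=" $1; print "2=" $2; print "3=" $3 }')
printf '%s\n' "$fields"

# The pattern from the first attempt never matches the log name: after "_B7"
# and zero or more "1" characters it needs "log" right away, but ".WORLD_" follows.
unchanged=$(printf '%s\n' "$line" | awk '{ sub(/_B71*log/, "", $2); print $2 }')
printf '%s\n' "$unchanged"

# Cut field 2 at its first underscore, then print the three wanted fields.
result=$(printf '%s\n' "$line" | awk '{ sub(/_.*/, "", $2); print $1, $2, $3 }')
printf '%s\n' "$result"
```

Note that print $1 $2 (no comma) joins the fields with nothing between them, while print $1, $2 puts the output field separator (a space by default) between them.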

@pamu, it's working.

Does * not match . ?

Why do we need *.*. ?
* is to match the 1 or more number of the matching characters.

.* is to match any character for 1 or more times.

$ echo "GSM2_B71.WORLD_20121114130908.log" | awk 'sub("B71*","")'
GSM2_.WORLD_20121114130908.log

$ echo "GSM2_B711111111111111111111" | awk 'sub("B71*","")'
GSM2_

$ echo "GSM2_B71.WORLD_20121114130908.log" | awk 'sub("B71.*","")'
GSM2_

The "1 or more" in the post above must have been a typo.
The quantifier * matches 0 or more occurrences of the previous character/expression.
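
A small sketch to make the zero-or-more behaviour visible. The first two input strings are made up for illustration and are not from the original file; the + quantifier and the stricter pattern at the end are additions for comparison, assuming an awk with POSIX extended regular expressions.

```shell
# Hypothetical sample strings, made up for illustration.
# "B71*" reads as: B, then 7, then zero or more 1s.
zero=$(echo "GSM2_B7.WORLD" | awk '{ sub(/B71*/, "X"); print }')
many=$(echo "GSM2_B7111.WORLD" | awk '{ sub(/B71*/, "X"); print }')

# "B71+" needs at least one 1, so a bare "B7" is left untouched.
plus=$(echo "GSM2_B7.WORLD" | awk '{ sub(/B71+/, "X"); print }')

# A stricter pattern for the log name: "\." is a literal dot, "$" anchors the end.
strict=$(echo "GSM2_B71.WORLD_20121114130908.log" | awk '{ sub(/_B71.*\.log$/, ""); print }')

printf '%s\n' "$zero" "$many" "$plus" "$strict"
```

The first two results are identical because * is satisfied by no 1 at all as well as by several. An unescaped . in a pattern such as _B71*.*.log stands for any single character, which is why that pattern still matched.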

sed 's!\([^ ]*\) \{1,\}\([^_]*\)\([^ ]*\) \{1,\}\([^ ]*\).*!\1 \2 \4!g'  infile
CMF_COMP_ELEM_GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM4_B71.WORLD_20121114130908.log   92549475 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM5_B71.WORLD_20121114130908.log   35606251 rows inserted into ALL_S1_CMF_COMP_ELEM.
DDL_KKR_CSK_GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM4_B71.WORLD_20121114130908.log   92549475 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM5_B71.WORLD_20121114130908.log   35606251 rows inserted into ALL_S1_DDL_KKR_CSK.

Let's say I have text as above and I want a line between the CMF and DDL logs. How can I do it? No substitution is needed, just a separator line.


Expected output:


CMF_COMP_ELEM_GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM4_B71.WORLD_20121114130908.log   92549475 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM5_B71.WORLD_20121114130908.log   35606251 rows inserted into ALL_S1_CMF_COMP_ELEM.
========================================================================================================
DDL_KKR_CSK_GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM4_B71.WORLD_20121114130908.log   92549475 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM5_B71.WORLD_20121114130908.log   35606251 rows inserted into ALL_S1_DDL_KKR_CSK.
awk '{print $0 ~ /DDL/ && p==1 || $0 ~ /CMF/ && p==2?s"\n"$0:$0;p=0}{p=$0 ~ /CMF/?1:2}' s="====="  infile
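
A possibly easier-to-read variant of the same idea, as a sketch only: it assumes the group is identified by the first three characters of each line (CMF or DDL), which holds for the sample data but is an assumption, not something the thread states. The input is shortened to four lines, and the names input, out, sep, key and prev are illustrative.

```shell
# Shortened sample input in the same shape as the lines above.
input='CMF_COMP_ELEM_GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_CMF_COMP_ELEM.
DDL_KKR_CSK_GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_DDL_KKR_CSK.'

out=$(printf '%s\n' "$input" | awk -v sep="=====" '
{
    # Assumed grouping key: the first three characters (CMF or DDL).
    key = substr($0, 1, 3)
    # Print the separator whenever the key differs from the previous line.
    if (NR > 1 && key != prev) print sep
    prev = key
    print
}')
printf '%s\n' "$out"
```

Because it compares each line's key with the previous one, this version also handles more than two groups without listing their names in the program.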