awk using sub, filtering a text file

only4satish · November 17, 2012, 7:26am

I have a text file as below:

 
CMF_COMP_ELEM_ GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_ GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_ GSM4_B71.WORLD_20121114130908.log   92549475 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_ GSM5_B71.WORLD_20121114130908.log   35606251 rows inserted into ALL_S1_CMF_COMP_ELEM.

I am expecting output as below:

 
CMF_COMP_ELEM_ GSM2   107496444  
CMF_COMP_ELEM_ GSM3   110729006  
CMF_COMP_ELEM_ GSM4   92549475  
CMF_COMP_ELEM_ GSM5   35606251

Please share the code. The code below is not working:

 
awk '{sub("_B71*log","",$1); print $1 $2 }'  inputfile

pamu · November 17, 2012, 7:36am

It should be $2, since the log file name is the second field.

Try:

awk '{sub("_B71*.*.log","",$2); print $1,$2,$3 }'  file

itkamaraj · November 17, 2012, 10:15am

$ awk '{sub("_.*","",$2);print $1,$2,$3}' input.txt
CMF_COMP_ELEM_ GSM2 107496444
CMF_COMP_ELEM_ GSM3 110729006
CMF_COMP_ELEM_ GSM4 92549475
CMF_COMP_ELEM_ GSM5 35606251

only4satish · November 17, 2012, 10:30am

@pamu, it's working.

Does * not match . ?

Why do we need *.*. ?

itkamaraj · November 17, 2012, 11:06am

* matches zero or more occurrences of the preceding character.

.* matches any character, zero or more times.

$ echo "GSM2_B71.WORLD_20121114130908.log" | awk 'sub("B71*","")'
GSM2_.WORLD_20121114130908.log

$ echo "GSM2_B711111111111111111111" | awk 'sub("B71*","")'
GSM2_

$ echo "GSM2_B71.WORLD_20121114130908.log" | awk 'sub("B71.*","")'
GSM2_

elixir_sinari · November 17, 2012, 11:50am

To be precise: the quantifier * matches 0 or more occurrences of the preceding character or expression, not 1 or more.
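
To make the difference concrete, here is a small sketch (the helper name try_sub is made up for illustration and is not from the thread). It prints how many substitutions sub() made, followed by the resulting line, so you can see that "B71*" matches even with no 1 at all, and why the original "_B71*log" never matched:

```shell
# try_sub REGEX: hypothetical helper; reads lines from stdin, removes the
# first match of REGEX, and prints "<number of substitutions> <result>".
try_sub() {
  awk -v re="$1" '{ n = sub(re, ""); print n, $0 }'
}

# "1*" means zero or more 1s, so "B71*" also matches a bare "B7".
echo "GSM2_B7.WORLD" | try_sub 'B71*'
# -> 1 GSM2_.WORLD

# "_B71*log" wants "log" directly after the run of 1s: no match here.
echo "GSM2_B71.WORLD_20121114130908.log" | try_sub '_B71*log'
# -> 0 GSM2_B71.WORLD_20121114130908.log

# "." is any single character, so ".*" covers everything up to "log".
echo "GSM2_B71.WORLD_20121114130908.log" | try_sub '_B71.*log'
# -> 1 GSM2
```

So * on its own is not a wildcard as it is in shell file name patterns; it only repeats whatever comes immediately before it.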

complex.invoke · November 18, 2012, 7:35am

sed 's!\([^ ]*\) \{1,\}\([^_]*\)\([^ ]*\) \{1,\}\([^ ]*\).*!\1 \2 \4!g'  infile

only4satish · November 21, 2012, 9:13am

CMF_COMP_ELEM_GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM4_B71.WORLD_20121114130908.log   92549475 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM5_B71.WORLD_20121114130908.log   35606251 rows inserted into ALL_S1_CMF_COMP_ELEM.
DDL_KKR_CSK_GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM4_B71.WORLD_20121114130908.log   92549475 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM5_B71.WORLD_20121114130908.log   35606251 rows inserted into ALL_S1_DDL_KKR_CSK.

Let's say I have text as above. I want a separator line between the CMF and DDL logs. How can I do it? No substitution is needed, just the separator line.

Expected output:


CMF_COMP_ELEM_GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM4_B71.WORLD_20121114130908.log   92549475 rows inserted into ALL_S1_CMF_COMP_ELEM.
CMF_COMP_ELEM_GSM5_B71.WORLD_20121114130908.log   35606251 rows inserted into ALL_S1_CMF_COMP_ELEM.
========================================================================================================
DDL_KKR_CSK_GSM2_B71.WORLD_20121114130908.log   107496444 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM3_B71.WORLD_20121114130908.log   110729006 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM4_B71.WORLD_20121114130908.log   92549475 rows inserted into ALL_S1_DDL_KKR_CSK.
DDL_KKR_CSK_GSM5_B71.WORLD_20121114130908.log   35606251 rows inserted into ALL_S1_DDL_KKR_CSK.

complex.invoke · November 22, 2012, 2:46am

awk '{print $0 ~ /DDL/ && p==1 || $0 ~ /CMF/ && p==2?s"\n"$0:$0;p=0}{p=$0 ~ /CMF/?1:2}' s="====="  infile
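
A possibly easier-to-read variant of the same idea, as a sketch only: instead of hard-coding CMF and DDL, it assumes the group is whatever precedes "_GSM" in the first field and prints the separator whenever that prefix changes. The function name separate_groups and the "_GSM" assumption are mine, not from the posts above.

```shell
# separate_groups [FILE...]: hypothetical helper; prints a separator line
# whenever the part of field 1 before "_GSM" differs from the previous line.
separate_groups() {
  awk -v s="=====" '
    {
      key = $1
      sub(/_GSM.*/, "", key)              # assumed group key: text before "_GSM"
      if (NR > 1 && key != prev) print s  # group changed: emit separator first
      prev = key
      print                               # always print the line unchanged
    }' "$@"
}

# Usage: separate_groups infile
```

This should also cope with more than two groups, as long as the lines of each group are adjacent in the file.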