How to print string after colon?

scriptor · November 14, 2017, 8:14am

Hi

How do I print the value after each colon (":")?
Below is the input file:

BUC1  : 33157147, BUC1 COUNT : 478455,BUC1 : 9930334.18
BUC2  : 1203100,BUC2 COUNT : 318678,BUC2 GIVEN : 3493491.59
BUC3  : 234567.99

Expected output:

33157147 478455 9930334.18
1203100 318678 3493491.59
234567.99

jim_mcnamara · November 14, 2017, 8:49am

try this:

tr -s ',' ' ' < filename | 
awk '{ for (i=1;i<=NF;i++)
       { if($(i)==":"){printf("%s ",$(i+1) ) }
       }
       printf("\n")
     }'

You could also use -F '[ ,]' as the field separator instead of using tr, but not all awk implementations support regular expressions for specifying FS.
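A minimal sketch of that tr-free variant, assuming an awk that treats a multi-character FS as a regular expression (check yours). The function name after_colon is made up for illustration, and the separator is written as '[ ,]+' rather than '[ ,]' so that runs of spaces and commas do not produce empty fields. It also avoids the trailing space by printing the separator before every value except the first.

```shell
# Hypothetical helper: print the field that follows each lone ":" field.
# Assumes awk accepts a regular expression as the field separator.
after_colon() {
    awk -F '[ ,]+' '{
        sep = ""
        for (i = 1; i < NF; i++)
            if ($i == ":") {              # a colon standing alone as a field
                printf("%s%s", sep, $(i + 1))
                sep = " "                 # later values get a leading space
            }
        printf("\n")
    }'
}

# Sample line taken from the question.
printf '%s\n' 'BUC1  : 33157147, BUC1 COUNT : 478455,BUC1 : 9930334.18' | after_colon
# prints: 33157147 478455 9930334.18
```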

vgersh99 · November 14, 2017, 8:50am

something along these lines:

awk -F'[:,]' '{for(i=1;i<=NF;i=i+2) printf("%s%s", $(i+1), (i+2>NF)?ORS:OFS)}' myFile

Scrutinizer · November 14, 2017, 9:30am

Another approach:

awk -F, '{gsub(/[^,]*: /,x); $1=$1}1' file

--
or

sed 's/[^,]*: //g; s/,/ /g' file

scriptor · November 15, 2017, 12:58am

Hi Scrutinizer

The command below works for me, but can you please explain it? I am not able to understand it, for example what these parts do:

[^,]*

//g;

s/,/ /g'

sed 's/[^,]*: //g; s/,/ /g' file

apmcd47 · November 15, 2017, 4:48am

In sed, and other tools such as grep and awk, [] is a character class: it matches any single character listed inside the brackets. The list can be individual characters, a range, or a combination of the two. If the first character after the left bracket is a caret (^), the class is inverted, so [^,] matches any character except a comma. The asterisk is a special character that tells sed to match the previous item zero or more times. So [^,]* matches any run of characters that contains no comma: anything up to but not including a comma, or anything after a comma.

The first command (s/[^,]*: //g) is a delete and the second (s/,/ /g) changes commas to spaces:

Delete all strings in the line that don't include the comma and end in ": ";
Change all commas to spaces.
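To see those two steps separately, here is a small sketch that runs each substitution on its own against one line from the question. The variable names line, step1 and step2 are only for illustration.

```shell
# One input line from the question.
line='BUC2  : 1203100,BUC2 COUNT : 318678,BUC2 GIVEN : 3493491.59'

# Step 1 only: delete every comma-free run that ends in ": ".
# The commas are left in place.
step1=$(printf '%s\n' "$line" | sed 's/[^,]*: //g')
printf '%s\n' "$step1"      # 1203100,318678,3493491.59

# Step 2 on that result: change each comma to a space.
step2=$(printf '%s\n' "$step1" | sed 's/,/ /g')
printf '%s\n' "$step2"      # 1203100 318678 3493491.59
```

Joining the two commands with a semicolon inside one sed script gives the same final result in a single pass over the file.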

Andrew

scriptor · November 16, 2017, 12:29am

Hi Andrew,

Thanks a lot, you explained it very well.
However, I am still not able to understand the explanation for this part:

 //g;

I am not able to understand how it works from the explanation below:

Delete all strings in the line that don't include the comma and end in ": ";

It would be really helpful if you could explain it again.

Thanks in advance
Scriptor

Don_Cragun · November 16, 2017, 2:32am

You're looking at too small a part of the command. A sed substitute command is usually seen in the form:

s/BasicRegularExpression/ReplacementString/flags

There are two sed substitute commands in this case separated by the semicolon ( ; ).

In the first sed substitute command, the Basic Regular Expression is [^,]*: followed by a space. It matches zero or more ( * ) adjacent characters that are not a comma ( [^,] ), followed by a colon ( : ), followed by a single space character. The Replacement String in this case is the empty string (which effectively removes the matched string). And the flag in this case is g, which requests that the substitution be applied globally to all non-overlapping strings that match the BRE.

In the second sed substitute command, the BRE is , which matches a comma character, the replacement string is a space character, and, again, the g flag requests that each comma found on the line be replaced by a space.
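One way to watch what the first BRE matches is to keep each match and wrap it in brackets instead of deleting it, using & (the matched text) in the replacement string. This is only a sketch for experimenting; the function names show_matches and show_first are made up here.

```shell
# Hypothetical helpers: mark what [^,]*:<space> matches rather than removing it.
show_matches() {
    sed 's/[^,]*: /[&]/g'     # with g: every non-overlapping match on the line
}
show_first() {
    sed 's/[^,]*: /[&]/'      # without g: only the first match on the line
}

line='BUC2  : 1203100,BUC2 COUNT : 318678,BUC2 GIVEN : 3493491.59'

printf '%s\n' "$line" | show_matches
# [BUC2  : ]1203100,[BUC2 COUNT : ]318678,[BUC2 GIVEN : ]3493491.59

printf '%s\n' "$line" | show_first
# [BUC2  : ]1203100,BUC2 COUNT : 318678,BUC2 GIVEN : 3493491.59
```

Everything inside a pair of brackets is what the empty replacement string removes, and comparing the two outputs shows what the g flag adds.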

MadeInGermany · November 16, 2017, 1:26pm

I wonder if one can include an optional comma ,\{0,1\} in the main substitution

sed 's/,\{0,1\}[^,]*: //g' file

Don_Cragun · November 16, 2017, 2:53pm

One can do that, but if you do, the output from the input line:

BUC2  : 1203100,BUC2 COUNT : 318678,BUC2 GIVEN : 3493491.59

becomes:

12031003186783493491.59

instead of:

1203100 318678 3493491.59

One could also get rid of the space after the colon in the BRE:

sed 's/,\{0,1\}[^,]*://g' file

and that would make the output from the above input line be:

 1203100 318678 3493491.59

which is almost what is wanted, but has an extraneous leading space on every output line.

One could also try:

sed 's/[^,]*: \([0-9.]*\),\{0,1\}/\1 /g' file

and that looks like the desired output to the naked eye, but has an extraneous trailing space on every output line.
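Since leading and trailing spaces are hard to spot on a terminal, one option is to pipe the result through sed -n l, which prints each line in an unambiguous form and marks the end of the line with $. A small sketch using the two variants discussed so far; the variable names are only for illustration.

```shell
line='BUC2  : 1203100,BUC2 COUNT : 318678,BUC2 GIVEN : 3493491.59'

# Variant without the space after the colon: leaves a leading space.
lead=$(printf '%s\n' "$line" | sed 's/,\{0,1\}[^,]*://g' | sed -n l)
printf '%s\n' "$lead"       # " 1203100 318678 3493491.59$" (space at the start)

# Variant that writes a space after every number: leaves a trailing space.
trail=$(printf '%s\n' "$line" | sed 's/[^,]*: \([0-9.]*\),\{0,1\}/\1 /g' | sed -n l)
printf '%s\n' "$trail"      # "1203100 318678 3493491.59 $" (space before the $)
```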

The following seems to do what is wanted with a single sed substitution command:

sed 's/[^,]*: \([0-9.]*\),\{0,1\}[^ ]*\( \{0,1\}\)/\1\2/g' file

but it is not something that I would suggest for someone who is just learning how to use sed and learning how to write sed BREs (unless I was trying to show an example of the sed-specific extensions to standard BREs and the use of back references in replacement strings).

Note that all of the examples above work with standards-conforming versions of sed, but might need an added option on the command line to make it work with GNU sed. For example, the last example above, when using GNU sed, probably needs to be invoked with:

sed --posix 's/[^,]*: \([0-9.]*\),\{0,1\}[^ ]*\( \{0,1\}\)/\1\2/g' file

Aia · November 19, 2017, 1:14pm

perl -nle '@n = /:\s?([\d.]*)/g and print "@n"' scriptor.file

Output:

33157147 478455 9930334.18
1203100 318678 3493491.59
234567.99

Explanation:

@n = /:\s?([\d.]*)/g   # collect the number (integer or decimal) after each colon
and print "@n"         # print them, space-separated, if anything was extracted