How to find lines where a field has more than 2 decimals

ken6503 · November 16, 2016, 9:55pm

Hi Gurus,

I have the sample file below. I need to find the lines whose 2nd field has more than 2 decimals.

In the sample file, I need to find xyz, 123456.789

 
abc, 1234.45, def
xyz, 123456.789, xxx
bce, 1234.34, xxx

thanks in advance

Don_Cragun · November 16, 2016, 10:04pm

With well over 100 posts, we would have hoped that you had learned something from the suggestions we have provided in helping you with your previous problems. What have you tried to solve this problem on your own?

What shell are you using?

What operating system are you using?

ken6503 · November 16, 2016, 10:08pm

I have tried searching the internet for a solution, but had no luck at all.

My shell is ksh.

My OS is Solaris.

thanks.

Don_Cragun · November 16, 2016, 10:12pm

Instead of searching the internet, why don't you look back through all of the solutions you have been given to solve your previous problems and use what you have learned from those solutions to try to come up with a very simple nawk script that will do what you have asked for here?

ken6503 · November 16, 2016, 10:20pm

I tried the command below, but it also gave me the record with one decimal.

awk -F"," '$2!~/\.[0-9][0-9]$/ {print $0}' test

Aia · November 16, 2016, 10:28pm

awk -F"," '$2 ~ /\.[0-9][0-9][0-9]/' test

itkamaraj · November 16, 2016, 10:33pm

try this...

awk -F, '{split($2,Arr,".")}length(Arr[2])>2'  test

Don_Cragun · November 16, 2016, 10:40pm

That is a good start. The ERE you are using is matching a decimal point followed by two decimal digits (which as you have found only matches lines with exactly two decimal places) at the end of the 2nd field. By negating the match, it finds lines with more than two digits following a decimal point, less than two digits following a decimal point, and fields that do not contain a decimal point. If we use a positive search for three digits following a decimal point (i.e., $2~/[.][0-9][0-9][0-9]/ ) not anchored to the end of the field, we can find any field that contains three or more decimal digits after a decimal point.
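To make the difference concrete, here is a small sketch that runs both patterns against the sample data plus one extra record with a single decimal place. The `one, 12.5, yyy` record and the variable names are made up for illustration; plain `awk` is assumed to be a POSIX-style awk (on Solaris that would be /usr/xpg4/bin/awk or nawk).

```shell
# Sample records from the thread, plus a hypothetical one-decimal record.
data='abc, 1234.45, def
xyz, 123456.789, xxx
bce, 1234.34, xxx
one, 12.5, yyy'

# Negated, anchored match: selects every 2nd field that does NOT end in ".dd".
negated=$(printf '%s\n' "$data" | awk -F, '$2 !~ /\.[0-9][0-9]$/')

# Positive, unanchored match: selects 2nd fields with three or more decimal digits.
positive=$(printf '%s\n' "$data" | awk -F, '$2 ~ /[.][0-9][0-9][0-9]/')

printf 'negated:\n%s\n\npositive:\n%s\n' "$negated" "$positive"
```

The negated form reports both the xyz and the one-decimal record, while the positive form reports only the xyz record.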

In the 1st post in this thread you said you wanted to print the 1st two fields from lines that matched your criteria. Using print $0 prints the entire input line; not just the 1st two fields.

Try:

#!/bin/ksh
/usr/xpg4/bin/awk '
BEGIN {	FS = OFS = ","
}
$2 ~ /[.][0-9][0-9][0-9]/ {
	print $1, $2
}' test

and see if that does what you want.

Note that itkamaraj's solution won't necessarily work with input that has extraneous spaces such as:

xyz, 123456.78 , xxx

which has three characters after the decimal point, but not three digits after the decimal point.
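One possible way to keep the split-based approach while tolerating stray blanks is to strip whitespace from a copy of the field before counting. This is only a sketch: the `pqr, 1.2345, zzz` record and the names `f`, `parts`, and `n` are invented for the example, and it assumes blanks and tabs are the only extraneous characters.

```shell
# One clean record, the trailing-blank record from above, and a
# hypothetical record that really has more than two decimals.
data='abc, 1234.45, def
xyz, 123456.78 , xxx
pqr, 1.2345, zzz'

# Original approach: counts the trailing blank as a third "decimal".
original=$(printf '%s\n' "$data" |
	awk -F, '{split($2,Arr,".")}length(Arr[2])>2')

# Variant: work on a whitespace-free copy of field 2.
trimmed=$(printf '%s\n' "$data" | awk -F, '
{
	f = $2
	gsub(/[ \t]/, "", f)          # drop blanks and tabs from the copy
	n = split(f, parts, "[.]")    # "[.]" is an ERE for a literal period
}
n > 1 && length(parts[2]) > 2')   # print only if >2 chars follow the period

printf 'original:\n%s\n\ntrimmed:\n%s\n' "$original" "$trimmed"
```

Because only the copy `f` is modified, the printed line is the unchanged input record.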

RavinderSingh13 · November 16, 2016, 10:42pm

Hello ken6503,

Could you please try the following and let me know if this helps?

awk -F", " -v OFS=", " '($2 ~ /\.[0-9]{3}/){print $1,$2}'   Input_file

Output will be as follows.

xyz, 123456.789

I have tested it only in GNU awk; I hope it works in other versions too, but I have not tested them. On a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk .

Thanks,
R. Singh

ken6503 · November 16, 2016, 10:43pm

Thanks for the suggestions. In the future, I'll review my old posts for a solution first.

rovf · November 17, 2016, 2:12am

Since you always have a period in the second field, a simple

    grep -E  '[^,]+,[ 0-9]+[.][0-9]{3}'

on your file should do.

looney · November 17, 2016, 3:08am

grep

grep -o '.*\.[0-9]\{3\}' file

rovf · November 17, 2016, 10:13am

A .* at the start of a pattern is redundant.

Also, the pattern incorrectly reports occurrences of the pattern in the first or third field. The OP wanted to check explicitly the second field only. While the example suggests that these fields might contain only non-digit content, we don't know for sure that this is the case.
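A quick way to see this is to feed the pattern a record whose only three-decimal number sits in the first field. The record `v1.234, 12.5, xxx` is hypothetical, and the sketch assumes a grep that supports the non-standard -o option (GNU grep does).

```shell
# Hypothetical record: the only ".ddd" is in field 1; field 2 has one decimal.
line='v1.234, 12.5, xxx'

# The grep pattern is not tied to a field, so it reports the field-1 match.
# "|| true" keeps a no-match exit status from aborting under "set -e".
anywhere=$(printf '%s\n' "$line" | grep -o '.*\.[0-9]\{3\}' || true)

# Testing field 2 explicitly in awk skips the record.
field2=$(printf '%s\n' "$line" | awk -F, '$2 ~ /[.][0-9][0-9][0-9]/')

printf 'anywhere: [%s]\nfield 2:  [%s]\n' "$anywhere" "$field2"
```

Here grep prints `v1.234` even though the second field has only one decimal, while the awk test on the second field prints nothing.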

Don_Cragun · November 17, 2016, 2:13pm

Hi rovf,
Good catch. Note, however, that your suggestion in post #11:

grep -E  '[^,]+,[ 0-9]+[.][0-9]{3}'

can also match a decimal point and three digits in the 3rd or subsequent fields. To avoid that possibility, you need to anchor your grep search pattern to the start of a line:

grep -E  '^[^,]+,[ 0-9]+[.][0-9]{3}'
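The effect of the anchor can be checked with a record whose 2nd field has one decimal and whose 3rd field has several. The record below is made up for this purpose, and the variable names are arbitrary.

```shell
# Hypothetical record: field 2 has one decimal, field 3 has five.
line='abc, 12.5, 3.14159'

# Unanchored: "[^,]+," can start at field 2, so the match lands in field 3.
unanchored=$(printf '%s\n' "$line" | grep -E '[^,]+,[ 0-9]+[.][0-9]{3}')

# Anchored: "^[^,]+," must consume field 1, so only field 2 is tested.
# "|| true" keeps a no-match exit status from aborting under "set -e".
anchored=$(printf '%s\n' "$line" | grep -E '^[^,]+,[ 0-9]+[.][0-9]{3}' || true)

printf 'unanchored: [%s]\nanchored:   [%s]\n' "$unanchored" "$anchored"
```

The unanchored pattern selects the line; the anchored one does not.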

And, if you go back to post #1 in this thread, you will notice that it appears that the OP only wants to print the first two fields of matched lines; not the entire line. It seems that that is why looney used the -o option and the leading .* in the search pattern. To restrict that match to the 2nd field and to print the entire contents of the 1st two fields if there are more digits or non-digit characters after a decimal point and 3 digits in the 2nd field, you might want something more like:

grep -Eo  '^[^,]*,[ 0-9]+[.][0-9]{3}[^,]*'

if the version of grep on your system supports the -o option (which is not required by the standards). (Note that I used [^,]* in both places in this suggestion rather than [^,]+ because there is no stated requirement that the 1st field be non-empty.) If your system's grep does not include a -o option, you'll have to use something like awk , perl , or sed instead of grep to print a partial line match.
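As one example of the sed route, the same match can be written as a substitution that keeps only the captured part of the line. This is a sketch using basic regular expressions, which have no `+`, so `[ 0-9]+` is spelled `[ 0-9][ 0-9]*`; the variable names are arbitrary.

```shell
# Sample records from the first post.
data='abc, 1234.45, def
xyz, 123456.789, xxx
bce, 1234.34, xxx'

# -n suppresses default output; the trailing p prints only lines where the
# substitution succeeded. \(...\) captures the first two fields, and the
# final .* consumes the rest of the line so that it is dropped.
first_two=$(printf '%s\n' "$data" |
	sed -n 's/^\([^,]*,[ 0-9][ 0-9]*[.][0-9]\{3\}[^,]*\).*/\1/p')

printf '%s\n' "$first_two"
```

With the sample data this prints `xyz, 123456.789`.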