Remove '.' from file for numbers ending in '.'

Hi,
I have numerous files containing data in the following format:

A|B|123.|Mr.|45.66|33|zz
L|16.|33.45|AC.|45.

I want to remove the decimal point only if it is the last character of a number.
The output should be:

A|B|123|Mr.|45.66|33|zz
L|16|33.45|AC.|45

I tried this:

sed -e 's/.|/|/g'

The problem with the above is that it also removes the '.' from Mr.
I also want to remove the '.' from the last field of the second line, so that "45." becomes "45".
Basically, any numeric field that has a decimal point but no digits after it should have the decimal point removed.

Thanks

sed -e 's/\([0-9]\).|/\1|/g' -e 's/\([0-9]\).$/\1/'

Almost works. Escaping the dot improves it:

sed -e 's/\([0-9]\)\.|/\1|/g' -e 's/\([0-9]\)\.$/\1/' file
A|B|123|Mr.|45.66|33|zz
L|16|33.45|AC.|45
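For comparison, here is what the unescaped version from the first reply does to the same sample. An unescaped "." matches any character, so "\([0-9]\).|" also fires on a digit followed by any character and a pipe, for example the "66|" in "45.66|". The wrapper name unescaped_fix is hypothetical, used only for this demo.

```shell
#!/bin/sh
# Hypothetical wrapper around the first reply's command (dot NOT escaped).
unescaped_fix() {
    sed -e 's/\([0-9]\).|/\1|/g' -e 's/\([0-9]\).$/\1/'
}

# The two sample lines from the question.
printf '%s\n' 'A|B|123.|Mr.|45.66|33|zz' 'L|16.|33.45|AC.|45.' | unescaped_fix
# A|B|123|Mr.|45.6|3|zz
# L|16|33.4|AC.|45
```

Note how 45.66 becomes 45.6, 33 becomes 3 and 33.45 becomes 33.4, which is why the escaped version is needed.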


And, if your sed allows for EREs, this might work as well:

sed -r 's/([0-9])\.(\||$)/\1\2/g' file

Note that these suggestions test whether the character before the dot is a digit, not whether the whole field is a number ending in a dot. For example, if one of the fields were A1., this approach would wrongly remove its dot.
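A quick way to see this limitation is to feed the escaped-dot command a line whose first field is "A1." (a made-up test line, not from the question). The wrapper name digit_dot_fix is hypothetical.

```shell
#!/bin/sh
# Escaped-dot command from the thread: it only checks that the dot follows a digit.
digit_dot_fix() {
    sed -e 's/\([0-9]\)\.|/\1|/g' -e 's/\([0-9]\)\.$/\1/'
}

# "A1." is not a number, yet its dot is stripped along with the one in "7.".
printf 'A1.|7.\n' | digit_dot_fix
# prints A1|7, whereas the desired result is A1.|7
```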

An alternative is to split each line into fields and test whether each field is numeric and ends in a dot. For example:

awk '{for(i=1; i<=NF; i++) if($i==$i+0) sub(/\.$/,x,$i)}1' FS=\| OFS=\| file
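The same awk logic, spread out with comments, may make the numeric test easier to follow. This is a sketch that assumes a POSIX awk; the wrapper name strip_numeric_dots is hypothetical, and the "A1.|7." line is a made-up test case.

```shell
#!/bin/sh
# Commented version of the awk one-liner; strip_numeric_dots is a hypothetical name.
strip_numeric_dots() {
    awk '
    BEGIN { FS = OFS = "|" }          # a single-character FS is taken literally
    {
        for (i = 1; i <= NF; i++)
            # A field that looks numeric (e.g. "123.") compares equal to
            # itself plus 0; fields like "Mr." or "A1." are compared as
            # strings against "0" and fail the test.
            if ($i == $i + 0)
                sub(/\.$/, "", $i)    # drop a trailing dot; "45.66" is untouched
        print                          # $0 is rebuilt with OFS if a field changed
    }' "$@"
}

printf '%s\n' 'A|B|123.|Mr.|45.66|33|zz' 'L|16.|33.45|AC.|45.' 'A1.|7.' | strip_numeric_dots
# A|B|123|Mr.|45.66|33|zz
# L|16|33.45|AC.|45
# A1.|7
```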

True. Here is a sed script covering this:

sed 'start:;s/\(|*[0-9][0-9]*\)\.|/\1|/;t start' /path/to/file

Note that I need a loop (instead of the "g" flag) because I have to match both the leading AND the trailing field separator.

I hope this helps.

bakunin

Almost works. The colon needs to go in front of the label (":start", not "start:"), and, because of the "|*", it will remove the dot from A1. as well. Without the star after the pipe, it no longer catches a field at the start of the line. Right now, I can't see a solution...


... unless EREs are possible:

sed -r 's/(^|\|)([0-9]+)\.(\||$)/\1\2\3/g' file

Even with EREs, you need to run it twice (perhaps only if the first pass made a change), because the pattern consumes both the leading and the trailing '|' of a field. Otherwise, you miss the second of two adjacent fields on a line like "123.|456.".
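Running the ERE substitution twice might look like the sketch below. It assumes a sed that accepts -E (GNU and BSD sed do) and uses "[|]" for a literal pipe instead of "\|" to sidestep escaping differences between implementations. The wrapper name two_pass_ere is hypothetical, and the test lines are made up.

```shell
#!/bin/sh
# Two passes of the same ERE substitution; two_pass_ere is a hypothetical name.
two_pass_ere() {
    sed -E \
        -e 's/(^|[|])([0-9]+)\.([|]|$)/\1\2\3/g' \
        -e 's/(^|[|])([0-9]+)\.([|]|$)/\1\2\3/g' "$@"
}

# The second -e catches "456." whose leading pipe the first pass consumed.
printf '%s\n' '123.|456.' 'A1.|7.|Mr.|45.66' | two_pass_ere
# 123|456
# A1.|7|Mr.|45.66
```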

You can add pipes to both ends of the line for the substitution and then remove them afterwards:

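sed '
  # wrap the line in pipes so every field has a leading and a trailing separator
  s/.*/|&|/
  # the wrap above always succeeds, so reset the "t" flag before the first pass
  t first
  :first
  s/\(|[0-9]\{1,99\}\)\.|/\1|/g
  # only run the second pass if the first one changed something
  t again
  b end
  :again
  # second pass: fields whose leading pipe was consumed by the previous match
  s/\(|[0-9]\{1,99\}\)\.|/\1|/g
  :end
  # remove the added pipes again
  s/^|//
  s/|$//
 ' in_file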

I could have removed both added pipes with one substitution, "s/^|\(.*\)|$/\1/", but in my experience these back references are a bit slower, so I avoid them where possible.

Two passes can be avoided by using a less careful pattern in place of the two substitutions above, such as "s/\([0-9]\)\.|/\1|/g", since it involves only one pipe. However, it mangles any non-numeric field that ends in a digit and a dot, like "123|The field count of this line is 3.|xyz".
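bakunin's loop idea can also be made to work by combining it with the wrap-in-pipes trick: the loop repeats the careful substitution until nothing changes, so no fixed pass count is needed. This is a minimal sketch in POSIX BRE; loop_fix is a hypothetical name and the test lines are made up. Labels sit on their own lines because some seds read a label up to the newline.

```shell
#!/bin/sh
# Loop variant of the wrap-in-pipes approach; loop_fix is a hypothetical name.
loop_fix() {
    sed '
s/.*/|&|/
:again
s/\(|[0-9][0-9]*\)\.|/\1|/g
t again
s/^|//
s/|$//
' "$@"
}

printf '%s\n' '1.|2.|3.|x' 'A1.|123|The field count of this line is 3.|xyz' | loop_fix
# 1|2|3|x
# A1.|123|The field count of this line is 3.|xyz
```

The loop always runs at least one extra iteration (the wrap sets the "t" flag), which is harmless: the final iteration simply finds nothing to change.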
