Substituting comma "," for dot "." in a specific column when comma"," is a delimiter

poliver · June 5, 2012, 11:04am

Hi,

I'm dealing with an issue and losing a lot of hours figuring out how i would solve this.
I have an input file which looks like this:

('BLABLA +200-GRS','Servi�o ','Tarifa��oServi�o','wap.bla.us.0000000121',2985,0,55,' de convers�o em escada','Dia','Domingos')
('BLABLA +200-GRR','Servi�o ','Tarifa��oServi�o','wap.bla.us.0000000121',2985,0,55,' de convers�o em escada','Reduzido','S�bados')
('BLABLA 45-GPR','Servi�o ','Tarifa��oServi�o','canais.bla.us.0000000222',3073,0,' de convers�o de diferen�a','Noite','S�bados')
('BLABLA 45-GPR','Servi�o ','Tarifa��oServi�o','canais.bla.us.0000000222',3073,0,' de convers�o de diferen�a','Noite','Segunda - Sexta')
('BLABLA 35-GMS','Servi�o','Tarifa��oServi�o','VC3 XXXX YYY-Fixo TIM(on)',3121,1,21,' de convers�o em escada','Noite','Segunda - Sexta')
('BLABLA-GRN','Servi�o','Tarifa��oServi�o','Chamadas',1362,1,961,' de convers�o em escada','Noite','Segunda - Sexta')
('BLABLA-GPE','Servi�o','Tarifa��oServi�o','Chamadas',1362,1,9897,' de convers�o em escada','Dia Todo','Domingos')

So, sometimes i have 9 columns (if using comma delimiter) (lines 3 and 4) and sometimes i have 10 columns (lines 1,2,5,6,7)

I'd like to replace only this column, which is a number (decimal or not), for a dot... it will look like this:

0.55
0.55
0
0
1.21
1.961
1.9897

So, my final result should be like this:

('BLABLA +200-GRS','Servi�o ','Tarifa��oServi�o','wap.bla.us.0000000121',2985,0.55,' de convers�o em escada','Dia','Domingos')
('BLABLA +200-GRR','Servi�o ','Tarifa��oServi�o','wap.bla.us.0000000121',2985,0.55,' de convers�o em escada','Reduzido','S�bados')
('BLABLA 45-GPR','Servi�o ','Tarifa��oServi�o','canais.bla.us.0000000222',3073,0,' de convers�o de diferen�a','Noite','S�bados')
('BLABLA 45-GPR','Servi�o ','Tarifa��oServi�o','canais.bla.us.0000000222',3073,0,' de convers�o de diferen�a','Noite','Segunda - Sexta')
('BLABLA 35-GMS','Servi�o','Tarifa��oServi�o','VC3 XXXX YYY-Fixo TIM(on)',3121,1.21,' de convers�o em escada','Noite','Segunda - Sexta')
('BLABLA-GRN','Servi�o','Tarifa��oServi�o','Chamadas',1362,1.961,' de convers�o em escada','Noite','Segunda - Sexta')
('BLABLA-GPE','Servi�o','Tarifa��oServi�o','Chamadas',1362,1.9897,' de convers�o em escada','Dia Todo','Domingos')

Any ideas of how should i do this?
I have one, which will not be very effective... and this file contains more than 16million lines :wall:

Thanks

pravin27 · June 5, 2012, 11:24am

try this.

#!/usr/bin/perl
while (<DATA>) {
chomp;
s/\',(\d+),(\d+),(\d+),\'/\',$1,$2\.$3\,\'/g;
print $_,"\n";
}
__DATA__
('BLABLA +200-GRS','Serviç','TarifaçServiç,'wap.bla.us.0000000121',2985,0,55,' de conversãem escada','Dia','Domingos')
('BLABLA +200-GRR','Serviç','TarifaçServiç,'wap.bla.us.0000000121',2985,0,55,' de conversãem escada','Reduzido','Sádos')
('BLABLA 45-GPR','Serviç','TarifaçServiç,'canais.bla.us.0000000222',3073,0,' de conversãde diferenç,'Noite','Sádos')
('BLABLA 45-GPR','Serviç','TarifaçServiç,'canais.bla.us.0000000222',3073,0,' de conversãde diferenç,'Noite','Segunda - Sexta')
('BLABLA 35-GMS','Serviç,'TarifaçServiç,'VC3 XXXX YYY-Fixo TIM(on)',3121,1,21,' de conversãem escada','Noite','Segunda - Sexta')
('BLABLA-GRN','Serviç,'TarifaçServiç,'Chamadas',1362,1,961,' de conversãem escada','Noite','Segunda - Sexta')
('BLABLA-GPE','Serviç,'TarifaçServiç,'Chamadas',1362,1,9897,' de conversãem escada','Dia Todo','Domingos')

joeyg · June 5, 2012, 11:27am

Could you consider the ' (single quote) character to be the delimiter?
I can get to the data in question, like:

$ awk -F"'" '{print $9}' <sample12.txt | cut -d"," -f3-
0,55,
0,55,
0,
0,
1,21,
1,961,
1,9897,

But then was stuck pondering the 3rd and 4th lines - where that field is zero. Which got me pondering counting number of fields with a , (comma) delimiter.

Skrynesaver · June 5, 2012, 11:35am

Something like the following should do it, but Pravin's solution will run faster and is a clever exploitation of the data format involved.

#!/usr/bin/perl

use strict;
use warnings;

open (DATA , '<', $ARGV[0]);
while(<DATA>){
  my @record=split(/,/,$_);
  splice(@record,5,2,("$record[5].$record[6]"))if (@record > 9);
  print join',',@record;
}

Scrutinizer · June 5, 2012, 12:10pm

Try:

sed "s/,\([^']\)/.\1/3" infile

ddreggors · June 5, 2012, 2:46pm

@Scrutinizer:
Very nice!

I had come up with a similar sed command but not nearly as elegant.

$ sed -r 's/([0-9]+),([0-9]+),([0-9]+),/\1,\2.\3,/g' ./text.txt 
('BLABLA +200-GRS','Serviç','TarifaçServiç,'wap.bla.us.0000000121',2985,0.55,' de conversãem escada','Dia','Domingos')
('BLABLA +200-GRR','Serviç','TarifaçServiç,'wap.bla.us.0000000121',2985,0.55,' de conversãem escada','Reduzido','Sádos')
('BLABLA 45-GPR','Serviç','TarifaçServiç,'canais.bla.us.0000000222',3073,0,' de conversãde diferenç,'Noite','Sádos')
('BLABLA 45-GPR','Serviç','TarifaçServiç,'canais.bla.us.0000000222',3073,0,' de conversãde diferenç,'Noite','Segunda - Sexta')
('BLABLA 35-GMS','Serviç,'TarifaçServiç,'VC3 XXXX YYY-Fixo TIM(on)',3121,1.21,' de conversãem escada','Noite','Segunda - Sexta')
('BLABLA-GRN','Serviç,'TarifaçServiç,'Chamadas',1362,1.961,' de conversãem escada','Noite','Segunda - Sexta')
('BLABLA-GPE','Serviç,'TarifaçServiç,'Chamadas',1362,1.9897,' de conversãem escada','Dia Todo','Domingos')

Mine matches on (and keeps) 3 consecutive number columns and replaces the comma for a period in the appropriate place when found.

You approach is very similar but far more compact and elegant.

Correct me if I am wrong, but I will try to explain for those that do not understand that sed command.

The match side of the command matches on any text that starts with a comma and does not have a single quote after it. Since numbers are not contained in single quotes it will only ever match (and keep) the number fields.

Next he tells the replace side of the sed command to replace the comma and number with a period and number, but only on the 3rd match.

so in plain text:

,11,22,33 would become ,11,22.33 as the first and second commas are left as is!

poliver · June 11, 2012, 10:09am

Thanks for all replies... i used Scrutinizer version and worked great in what i was expecting...