re-Substitution Sed (or Perl)

beenny · August 3, 2011, 6:33pm

I have a large CSV text file that I'm working with. It looks something like this:

D,",E",C
O,"F,",I
O,gh,R

The second column always holds a two-character random code (digits, letters, or any other characters). When one of the characters happens to be a comma, the field is quoted. I want to find a way (within the context of a larger shell script I'm writing) to change that. Ideally I would get rid of the quotes and change the comma to some other character, let's say a double colon (::).

So the output would look like:

D,::E,C
O,F::,I
O,gh,R

I can do something like this:

sed -i 's/"*."/::/g' myfile

(or the equivalent in Perl; I find sed runs faster on big files).

But that obviously doesn't get me what I want. The question is how to substitute the other character back in along with the replacement. Do you have any thoughts on how to do that?

Thanks,

bg

yazu · August 3, 2011, 9:45pm

When you are working with a big and weird CSV file, the best way (and I believe the only good way) is to use a good CSV library. Text::CSV for Perl, from CPAN, is an excellent choice.

karthik3152 · August 4, 2011, 2:21am

Use

sed -i -e  s/\"(.*)\,(.* )\"/\1::\2/g

I think this will work

beenny · August 4, 2011, 3:09pm

Thanks for the reply.

When I run that I get the following error:

]$ sed -i -e s/\"(.*)\,(.* )\"/\1::\2/g myfile.txt
bash: syntax error near unexpected token `('

However when I try it in perl:

$ perl -pe 's/\"(.*)\,(.*)\"/$1::$2/g' myfile.txt

it works fine - so thanks!

Any reason why the sed version doesn't work? I'm just trying to learn more about scripting. Passing the matched string as a variable is a trick I wasn't familiar with, and it's great to know.

thanks,

bg

dragon.1431 · August 4, 2011, 3:57pm

minor correction:

sed -e 's/"\(.*\),\(.*\)"/\1::\2/g' myfile
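To check the corrected command against the sample rows from the original question, here is a minimal sketch. The scratch file and the variable names are made up for illustration, and -i is left out so nothing is edited in place.

```shell
# Sample rows from the original question, written to a scratch file.
# (tmpfile and result are illustrative names, not from the thread.)
tmpfile=$(mktemp)
printf '%s\n' 'D,",E",C' 'O,"F,",I' 'O,gh,R' > "$tmpfile"

# Single quotes stop the shell from interpreting ( ) and \ itself.
# \( \) mark capture groups in sed's default (basic) regex syntax;
# \1 and \2 in the replacement refer back to those groups.
result=$(sed -e 's/"\(.*\),\(.*\)"/\1::\2/g' "$tmpfile")
printf '%s\n' "$result"

rm -f "$tmpfile"
```

With the three sample rows this should print D,::E,C then O,F::,I then O,gh,R, which matches the output asked for in the first post.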

birei · August 4, 2011, 4:54pm

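The bash error comes from the unquoted parentheses: without single quotes around the expression, the shell tries to parse them itself. Beyond that, sed and perl write capture groups differently: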

perl -> (...)
sed -> \(...\)

Regards,
Birei
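As a side note on the bracket difference, sed can also take perl-style unescaped groups when run in extended regex mode. The sketch below assumes a sed that accepts -E (GNU sed and BSD sed both do). The second command is an optional tighter pattern, and its input row with two quoted fields is hypothetical, not taken from the original data.

```shell
# Same substitution in extended regex syntax: plain ( ) are the groups.
# -E is assumed to be available (GNU sed and BSD sed both accept it).
ere_result=$(printf '%s\n' 'D,",E",C' 'O,"F,",I' 'O,gh,R' |
  sed -E 's/"(.*),(.*)"/\1::\2/g')
printf '%s\n' "$ere_result"

# Hypothetical row with two quoted fields. [^",]* cannot run past a quote
# or a second comma, so each quoted field is rewritten on its own.
multi_result=$(printf '%s\n' 'A,"1,",B,",2",C' |
  sed -E 's/"([^",]*),([^",]*)"/\1::\2/g')
printf '%s\n' "$multi_result"
```

The first command should give the same three lines as the \(...\) version. The second should print A,1::,B,::2,C. The greedy .* form is fine for the data in this thread, where only one column is ever quoted, but it can swallow everything between the first and last quote when a line has several quoted fields.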