I have a large CSV text file that I'm working with. It looks something like this:
D,",E",C
O,"F,",I
O,gh,R
The second column always holds a random two-character code (numbers, letters, or any other characters). When one of the characters happens to be a comma, the field is quoted. I want to find a way (within the context of a larger shell script I'm writing) to change that. Ideally I would get rid of the quotes and change the comma to some other character sequence, let's say a double colon (::).
So the output would look like:
D,::E,C
O,F::,I
O,gh,R
I can do something like this:
sed -i 's/"*."/::/g' myfile
(or the equivalent in Perl; I find sed operates faster on big files),
but that obviously doesn't get me what I want. The problem is that I need to put the other character back into the replacement. Do you have any thoughts on how to do that?
When you are working with a big, weird CSV file, the best way (and I believe the only good way) is to use a good CSV library. Text::CSV for Perl, from CPAN, is an excellent choice.
]$ sed -i -e s/\"(.*)\,(.* )\"/\1::\2/g myfile.txt
bash: syntax error near unexpected token `('
However when I try it in perl:
$ perl -pe 's/\"(.*)\,(.*)\"/$1::$2/g' myfile.txt
it works fine - so thanks!
Any reason why the sed version doesn't work? I'm just trying to learn more about scripting. Passing the matched string as a variable is a trick I wasn't familiar with, and it's great to know.
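On the sed failure: the error comes from bash, not sed. The expression in the failing command is not quoted, so the shell tries to parse the parentheses itself. Separately, sed uses basic regular expressions by default, where groups are written \( \) unless you pass -E. The failing command also has a stray space inside the second group. Here is a rough sketch of a sed version that should behave like the Perl one. The function name fix_quoted_commas is made up for illustration, and it assumes each quoted field contains exactly one comma and no embedded quotes:

```shell
# Hypothetical helper: reads CSV lines on stdin, writes fixed lines to stdout.
# Assumes each quoted field holds exactly one comma and no embedded quotes.
fix_quoted_commas() {
  # -E enables extended regexes, so ( ) group without backslashes.
  # Single quotes stop the shell from parsing ( ) and " itself.
  # [^",]* cannot run past a quote or comma, unlike the greedy .*
  sed -E 's/"([^",]*),([^",]*)"/\1::\2/g'
}

# Try it on the sample data from the question.
printf 'D,",E",C\nO,"F,",I\nO,gh,R\n' | fix_quoted_commas
```

To edit a file in place, the same expression can be given to sed -i (GNU sed; BSD sed wants -i '' instead). For anything messier than this fixed two-character field, the CSV library suggestion above is still the safer route.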