Conditional replacement of a delimiter

Hello,

I'm new to this forum but this seems like the place to ask this question.

I have a pipe delimited data file with the fields except for the header being encased in double quotes. I found out that some of the fields have an trash pipe within the data itself. I'd like to conditionally remove any pipe that is not a field delimiter.

Here is some sample data

Email|DateCreated|FirstName|LastName|PostCode|Address_line1|Address_line2|City
"sample@email.com"|"2007/11/12 17:57:04"|"Keith A| Glass"|""|"20110"|""|""|""
"sample@verizo|n.net"|"2007/11/12 18:18:56"|"ESPN"|"Mi||er"|"06/25/1955"|""|""|"NY"
"sample@ao|.com"|"2007/11/12 18:20:25"|"Mary"|"Shelley"|"05/16/1845"|"James St.||Apt. 1"|""|"CT"
"samp|e@diagnostics.com"|"2007/11/12 18:20:28"|"Jam|es"|"Sample"|""|"110 Perkins Street||Apt. 6"|""|""

As you can see the header has the pipe delimited, and those are fine. But those pipes that are within the field are shifting the data when I load into the database.

Thank you for your help.
Samah

if all your valid fields are quoted, than you could do this:

sed 's/"|"/","/g' data | sed '2,$ s/|//g'

The first sed will change all delimiting pipes into commas (all char-triplets "|" into triplets ",", then all pipes you are left with are the messy ones. Piping it again to sed to get rid of them (or you could replace them with some other character, if you wished). Second sed command operates onlines 2 until end-of-file, to keep the header intact.

Try this:

awk -F\" '{for(i=2;i<NF;i+=2)gsub("\|",x,$i)}1' OFS=\" file

Or try..

sed 's/"\([^|"]\+\)|\+\([^|"]\+\)"/"\1\2"/g' inputfile > outfile

Thank you everyone for the help. All of those worked very nicely.