Complicated string

I have a data file. Example line:

10302|77747373|442422442|290209|244242|"234|2352"|92482892

It has about 5000 rows, all with that field formatted the same way.

It needs to look like the line below. I need to remove the quotes, but the more difficult part is removing the pipe that sits between the quotes. Any ideas?

10302|77747373|442422442|290209|244242|2342352|92482892

Try:

awk '{sub(/[|]/,x,$2)}1' FS=\" OFS= file

or something like:

sed 's/"\([^|]*\)|\([^"]*\)"/\1\2/g' file

Note that the two solutions (awk and sed) are subtly different, particularly in how they deal with quoted strings that don't contain a pipe.

The sed version only removes the quotes (and pipe) if they contain a pipe, and does this for all instances on the line (g option), whereas the awk removes the first set of quotes on the line (and the pipe if present), regardless of whether they contain a pipe.

Not quite. The awk only removes the first set of quotes IF they contain a pipe symbol. I did it this way because the input specification says there is only one quoted field per line. If there are more sets of quotes on the line, then those quotes are also going to be removed once the substitution in the first set succeeds, but if that is the case then we need a different solution anyway.
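To make that concrete, here is a small sketch that runs both commands from this thread on two made-up lines (the sample data is my own, not from the OP's file): one where the quoted field contains a pipe and one where it does not.

```shell
# Hypothetical sample lines: quoted field with a pipe, then without one.
input='1|"23|45"|6
1|"2345"|6'

# The sed from this thread: strips quotes and pipe only where a pipe sits between quotes.
sed_out=$(printf '%s\n' "$input" | sed 's/"\([^|]*\)|\([^"]*\)"/\1\2/g')

# The awk from this thread: fields are split on the double quote, and the record
# is rebuilt with an empty OFS only if sub() actually changes $2.
awk_out=$(printf '%s\n' "$input" | awk '{sub(/[|]/,x,$2)}1' FS='"' OFS=)

printf 'sed:\n%s\nawk:\n%s\n' "$sed_out" "$awk_out"
```

With the awk and sed I would expect to be installed on a typical system, both commands print 1|2345|6 for the first line and leave the second line, quotes included, exactly as it was.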

With the sed, I left out the g flag at first so that it did the same, but decided to put it in so that if the OP's input file was a little bit different from what was stated (namely, only one field with quotes), he could see the difference and report back...
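For anyone who wants to see where the variants diverge, here is a quick sketch on a made-up line with two quoted fields (again my own test data, not something from the OP):

```shell
# Hypothetical line with two quoted fields, each containing a pipe.
line='1|"2|3"|"4|5"|6'

# sed with the g flag: every quoted pair that contains a pipe is cleaned up.
with_g=$(printf '%s\n' "$line" | sed 's/"\([^|]*\)|\([^"]*\)"/\1\2/g')

# sed without the g flag: only the first quoted pair is touched.
without_g=$(printf '%s\n' "$line" | sed 's/"\([^|]*\)|\([^"]*\)"/\1\2/')

# awk: the pipe is removed from $2 only, but rebuilding the record with an
# empty OFS drops every double quote on the line.
awk_out=$(printf '%s\n' "$line" | awk '{sub(/[|]/,x,$2)}1' FS='"' OFS=)

printf '%s\n%s\n%s\n' "$with_g" "$without_g" "$awk_out"
```

This should print 1|23|45|6, then 1|23|"4|5"|6, then 1|23|4|5|6. The last line is why the awk would need to change if the file really had more than one quoted field per row: the second pair loses its quotes but keeps its pipe.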

Interesting, I didn't know sub worked that way, in contrast to this:

awk '{$2=$2}1' FS=\" OFS= file

It is not sub that works this way, but rather it is that $2 only gets changed by sub when the substitution is successful. If not, then the record does not get recalculated and thus the field separators (the double quotes) remain unchanged.
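A minimal sketch of that difference, using a made-up line whose quoted field has no pipe in it (behaviour as I understand it for the common awk implementations):

```shell
# Hypothetical line: the quoted field contains no pipe.
line='1|"2345"|6'

# Assigning to $2, even with its own value, always rebuilds $0 using OFS,
# so the double quotes (the input field separators) disappear.
assign_out=$(printf '%s\n' "$line" | awk '{$2=$2}1' FS='"' OFS=)

# sub() finds nothing to replace, so $2 is never assigned and $0 is
# printed exactly as it was read.
sub_out=$(printf '%s\n' "$line" | awk '{sub(/[|]/,x,$2)}1' FS='"' OFS=)

printf '%s\n%s\n' "$assign_out" "$sub_out"
```

The first command should print 1|2345|6 and the second 1|"2345"|6.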

Yes, the field is not written unless the RE matches. For some reason I thought it was equivalent to this:

$2=gensub(/[|]/,x,"",$2)