CSV: Replacing multiple occurrences inside a pattern

Greetings all,

I am coming to seek your knowledge and some help on an issue I cannot currently get past. I have searched the boards but did not find anything close to the problem I am struggling with.
I am trying to clean a CSV file so that SQL*Loader can load it. My problem is how to replace multiple occurrences of a pattern inside another pattern.

Imagine a CSV file with:

  • Endline: \n
  • Field: ;
  • Surrounder: "

But in which:

  • Surrounders are not escaped: any field with ; \n or " in it is simply surrounded by double-quote without escaping any other double-quotes in the data.
  • Data can have \r \r\n and \n ; and " characters (enjoy ...)

To make things simple I have:

  • Replaced all \n, \r and \r\n with || (to remove any notion of lines while cleaning the file)
  • Replaced double quotes by doubled double-quotes (" => ""), then used sed to turn ""; and ;"" back into "; and ;" (this ignores the case of "; or ;" occurring in the data, I agree, but never mind), so that now all double quotes in the data are escaped.
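
For reference, here is a rough sketch of those two steps as a shell function. The name flatten_and_escape and the sample input are made up for illustration, and it assumes a POSIX awk and sed:

```shell
# Hypothetical helper (not from any tool): flattens the file to one line
# and escapes the data quotes, following the two steps listed above.
flatten_and_escape() {
  # Step 1: turn every \r\n, \r or \n into the || placeholder.
  awk '{ sub(/\r$/, ""); gsub(/\r/, "||"); printf "%s||", $0 }' |
  # Step 2: double every quote, then undo the doubling next to a ;
  sed -e 's/"/""/g' -e 's/"";/";/g' -e 's/;""/;"/g'
}

# Made-up sample: one quoted field holding a \r\n and unescaped quotes.
printf 'a;"x "q"\r\ny";b\nc;d;e\n' | flatten_and_escape
```

Like the description above, this sketch does not handle a quoted field that sits at the very start or end of a record, next to a || instead of a ;.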

But I cannot find a way to turn the \n, \r and \r\n that were in the data back into \n. To do this I need to replace all occurrences of || (which were initially \n, \r or \r\n) inside ;"(.*)"; by \n. Ideally I would find every occurrence of ;"(.*)"; in my file (treated as one whole line, since I removed the line endings) and, within the (.*), replace every || by \n.

I have tried sed, but I fail to convert only the || located between the semicolon/double-quote delimiters. Any ideas?

I hope I have been clear enough, though I think that I may not have. Feel free to ask for some further details if needed.

Regards,

Hope this is what you were getting at:

$ cat sedtest
outside ;"( some || lines || inside )";  outside again
 
$ sed '/\;"(/,/)"\;/s:||:\n:g' sedtest
outside ;"( some 
 lines 
 inside )";  outside again
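
One caveat with the command above: a /start/,/end/ address selects whole lines, so when the file has been flattened to a single line, s:||:\n:g rewrites every || on it, including the ones outside the quotes. A minimal sketch of an alternative, assuming embedded quotes are already doubled as described in the question: split the line on " with awk, so that text inside quoted fields lands in the even-numbered awk fields, and only touch those. The function name restore_data_newlines and the sample line are hypothetical.

```shell
# Hypothetical helper: replaces || by a newline only inside quoted fields.
# Assumes every quoted field is delimited by a single " on each side and
# every embedded quote is doubled (""), so field parity tracks quoting.
restore_data_newlines() {
  awk -F'"' -v OFS='"' '{
    for (i = 2; i <= NF; i += 2)      # even fields = text inside quotes
      gsub(/[|][|]/, "\n", $i)        # placeholder back to a newline
    print
  }'
}

# Made-up sample: one || inside a quoted field, one || outside.
printf '%s\n' 'a;"x||y ""q"" z";b||c;d' | restore_data_newlines
```

The || left outside the quotes (the original record separators) is untouched and can be converted in a separate pass.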