Greetings all,
I am looking for your knowledge and some help with an issue I currently cannot get past. I have searched the boards but did not find anything close to the problem I am struggling with.
I am trying to clean a CSV file so that SQL*Loader can load it. My current problem is replacing multiple occurrences of a pattern inside another pattern.
Imagine a CSV file with:
- Line terminator: \n
- Field separator: ;
- Surrounder (enclosure): "
But in which:
- Surrounders are not escaped: any field containing ;, \n or " is simply enclosed in double quotes, and the double quotes inside the data are not escaped.
- Data can contain \r, \r\n, \n, ; and " characters (enjoy...)
To keep things simple, I have:
- Replaced every \r\n, \r and \n with || (to remove any notion of lines while cleaning the file)
- Replaced every double quote with a doubled double quote (" => ""), then used sed to turn ""; and ;"" back into "; and ;" (admittedly ignoring data that itself contains "; or ;", but never mind), so that all double quotes in the data are now escaped.
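The two clean-up steps above can be sketched in Python as follows. This is a minimal sketch, not the exact commands used in the question; the function names `flatten_line_breaks` and `escape_data_quotes` are hypothetical. The alternation lists `\r\n` first so that a Windows line break becomes one `||` rather than two, and the quote fix-ups mirror the two sed substitutions, with the same blind spot for data containing `";` or `;"`.

```python
import re

# Match CRLF before lone CR or LF so a Windows line break yields one marker, not two.
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def flatten_line_breaks(text: str, marker: str = "||") -> str:
    """Step 1: replace every CRLF, CR and LF with the marker."""
    return LINE_BREAK.sub(marker, text)


def escape_data_quotes(text: str) -> str:
    """Step 2: double every quote, then undo the doubling next to ';'.

    Like the sed fix-ups, this does not handle data that itself
    contains "; or ;".
    """
    doubled = text.replace('"', '""')
    # Equivalent to: sed -e 's/"";/";/g' -e 's/;""/;"/g'
    return doubled.replace('"";', '";').replace(';""', ';"')
```

When reading the file, open it with `newline=""` so Python does not translate `\r\n` into `\n` before `flatten_line_breaks` sees it.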
BUT I cannot find a way to turn the \n, \r and \r\n that were inside the data into \n. To do this I need to replace every occurrence of || (originally \n, \r or \r\n) inside ;"(.*)"; with \n. Ideally I would find every match of ;"(.*)"; in my file (now treated as one single line, since I removed the line breaks) and, within the (.*) part, replace each occurrence of || with \n.
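One way to express "replace || only inside a quoted field" is to match each quoted field as a unit and run the replacement in a callback, instead of a single greedy `;"(.*)";`, which on a one-line file would stretch from the first quoted field to the last. A minimal Python sketch, assuming the two steps above have been applied, that unquoted fields never contain `"` (as stated for this file), and that every opening quote is a single `"`. That last point also requires the quotes at the very start and end of each record (next to `||`) to be un-doubled, which the `;`-only sed fix-ups do not cover. The names `QUOTED_FIELD` and `restore_embedded_breaks` are hypothetical.

```python
import re

# One quoted field: opening quote, then non-quote characters and escaped
# (doubled) quotes, then the closing quote. This is the "unrolled loop"
# form of "(?:[^"]|"")*".
QUOTED_FIELD = re.compile(r'"[^"]*(?:""[^"]*)*"')


def restore_embedded_breaks(text: str, marker: str = "||", replacement: str = "\n") -> str:
    """Replace marker with replacement inside quoted fields only.

    Markers outside quoted fields (the real record terminators) are
    left untouched.
    """
    return QUOTED_FIELD.sub(lambda m: m.group(0).replace(marker, replacement), text)
```

For example, `restore_embedded_breaks('1;"a||b";c||2')` returns `'1;"a\nb";c||2'`: the `||` inside the quoted field becomes a newline, while the `||` between records stays and can be turned back into a record terminator afterwards.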
I have tried sed, but I cannot manage to convert only the || located between the semicolon/double-quote delimiters. Any ideas?
I hope I have been clear enough, though I suspect I may not have been. Feel free to ask for further details if needed.
Regards,