How to remove line break in between column of csv file

saurabh29may · June 25, 2020, 2:40pm

below is the csv file which has a line break in between one column

File Format ; comma delimited
test enclosed with double quotes

"-658924041","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09a","DD3582AF" 
"-658924042","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09b
**","DD3582AF"**
"-658924043","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF" 
"-658924044","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09d","DD3582AF"

Above second rows got line break , which is actually a complete row, next line start with this is 658924043

but I need the output as below in order to load the data,

loading data using SQL loader

"-658924041","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09a","DD3582AF" 
**"-658924042","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09b","DD3582AF"**
"-658924043","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF" 
"-658924044","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09d","DD3582AF"

I have tried as below

sed -e "s/\r//g" file_name.csv. but nothing happening.

file outpput
$ cat -vet myfile.csv "-658924041","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF" $ "-658924042","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C$ ","DD3582AF" $ "-658924043","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF" $ "-658924044","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF" $

vgersh99 · June 25, 2020, 2:45pm

As in the other forum please try to format your question(s) using proper markdown tags. Otherwise you'll be getting the same/similar followup questions at best or get no traction at all.
Also provide a couple of lines of output of cat -vet myFile.csv.

MadeInGermany · June 25, 2020, 6:15pm

A simple concatenation of the following line if the " are not in pairs:

awk -F'"' '(NF % 2)==0 { getline x; $0=($0 x) } 1' myfile.csv

saurabh29may · June 25, 2020, 6:19pm

Thanks a lot. its working

vgersh99 · June 25, 2020, 6:34pm

This is a very specific solution where the lines get broken down by the double quotes AND have only TWO lines to be concatenated.
It might serve the purpose of the OP's scenario, but it's not going to be sufficient for all the potential cases.

vgersh99 · June 25, 2020, 8:38pm

a bit of a different approach inspired by this post - can definitely be improved.
It has no built-in assumptions (other than you have gawk version 4.0++) and/or figures out what the complete line's number of fields should be and can have multiple lines to be joined.
Also CSV fields can have embedded commas.

gawk -f sau.awk myFile.csv

where sau.awk is:

BEGIN {
   FPAT="[^,]*|(\"[^\"]*\")+"
   ARGV[ARGC++]=ARGV[1]
}
FNR==NR {
   nf=(NF>nf)?NF:nf
   next
}

NF==nf{print;next}
{
  if (n) {
    buff=buff $0
    n+=NF-1
  }
  else {
     buff=$0
     n=NF
  }
}
n>=nf {
   print buff
   buff="";n=0
}

given modified sample csv myFile.csv file:

"-658924041","08/06/2020,11:46:06","kjljjlkjjl","3","EF1B09a","DD3582AF"
"-658924042","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09b
","DD3582AF"
"-658924043","08/06/2020,11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF"
"-658924044",
"08/06/2020 11:46:06","kjljjlkjjl","3
","EF1B09d","DD3582AF"

produces:

"-658924041","08/06/2020,11:46:06","kjljjlkjjl","3","EF1B09a","DD3582AF"
"-658924042","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09b","DD3582AF"
"-658924043","08/06/2020,11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF"
"-658924044","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09d","DD3582AF"

MadeInGermany · June 26, 2020, 2:07pm

Based on my first post, I change

the if to a while, to allow to concatenate more than one line
add a NF<pnf condition, to detect a line break outside a "field"

awk -F'"' '{ while (((NF % 2)==0 || NF<pnf) && getline x) { $0=($0 x) } print; pnf=NF }' myFile.csv

Also have added the getline to the condition, to safely avoid an endless loop.
Now it should digest almost everything.

system · August 25, 2020, 2:17pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.