How to remove line break in between column of csv file

below is the csv file which has a line break in between one column

File Format ; comma delimited
test enclosed with double quotes

"-658924041","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09a","DD3582AF" 
"-658924042","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09b
**","DD3582AF"**
"-658924043","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF" 
"-658924044","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09d","DD3582AF"

Above second rows got line break , which is actually a complete row, next line start with this is 658924043

but I need the output as below in order to load the data,

loading data using SQL loader

"-658924041","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09a","DD3582AF" 
**"-658924042","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09b","DD3582AF"**
"-658924043","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF" 
"-658924044","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09d","DD3582AF"

I have tried as below

sed -e "s/\r//g" file_name.csv. but nothing happening.

file outpput
$ cat -vet myfile.csv "-658924041","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF" $ "-658924042","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C$ ","DD3582AF" $ "-658924043","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF" $ "-658924044","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF" $

As in the other forum please try to format your question(s) using proper markdown tags. Otherwise you'll be getting the same/similar followup questions at best or get no traction at all.
Also provide a couple of lines of output of cat -vet myFile.csv.

1 Like

A simple concatenation of the following line if the " are not in pairs:

awk -F'"' '(NF % 2)==0 { getline x; $0=($0 x) } 1' myfile.csv
2 Likes

Thanks a lot. its working

This is a very specific solution where the lines get broken down by the double quotes AND have only TWO lines to be concatenated.
It might serve the purpose of the OP's scenario, but it's not going to be sufficient for all the potential cases.

2 Likes

a bit of a different approach inspired by this post - can definitely be improved.
It has no built-in assumptions (other than you have gawk version 4.0++) and/or figures out what the complete line's number of fields should be and can have multiple lines to be joined.
Also CSV fields can have embedded commas.

gawk -f sau.awk myFile.csv

where sau.awk is:

BEGIN {
   FPAT="[^,]*|(\"[^\"]*\")+"
   ARGV[ARGC++]=ARGV[1]
}
FNR==NR {
   nf=(NF>nf)?NF:nf
   next
}

NF==nf{print;next}
{
  if (n) {
    buff=buff $0
    n+=NF-1
  }
  else {
     buff=$0
     n=NF
  }
}
n>=nf {
   print buff
   buff="";n=0
}

given modified sample csv myFile.csv file:

"-658924041","08/06/2020,11:46:06","kjljjlkjjl","3","EF1B09a","DD3582AF"
"-658924042","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09b
","DD3582AF"
"-658924043","08/06/2020,11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF"
"-658924044",
"08/06/2020 11:46:06","kjljjlkjjl","3
","EF1B09d","DD3582AF"

produces:

"-658924041","08/06/2020,11:46:06","kjljjlkjjl","3","EF1B09a","DD3582AF"
"-658924042","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09b","DD3582AF"
"-658924043","08/06/2020,11:46:06","kjljjlkjjl","3","EF1B09C","DD3582AF"
"-658924044","08/06/2020 11:46:06","kjljjlkjjl","3","EF1B09d","DD3582AF"
3 Likes

Based on my first post, I change

  • the if to a while, to allow to concatenate more than one line
  • add a NF<pnf condition, to detect a line break outside a "field"
awk -F'"' '{ while (((NF % 2)==0 || NF<pnf) && getline x) { $0=($0 x) } print; pnf=NF }' myFile.csv

Also have added the getline to the condition, to safely avoid an endless loop.
Now it should digest almost everything.

1 Like

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.