Removing line breaks inside a field

Bobby_2000 · August 1, 2017, 12:08pm

Hi all,

I have a CSV input file with 60 fields in total, and the fields are not enclosed in double quotes. One of the fields (the 50th) contains line breaks, which splits the row across multiple lines. This is causing my load (into a table) to fail. I tried to enclose this field in double quotes using a regular expression. This worked for most of the rows but not for some of them, and I am unable to find the reason. The command I used is:

cat input.csv | tr -d '\r' | tr '\n' '�' | sed -E 's/(�([^,]*,){49})([^",]+),/\1"\3",/g' | tr '�' '\n' > output.csv

Can someone please give me the command to remove all the line breaks in this field?

RudiC · August 1, 2017, 1:10pm

Please get into the habit of providing decent context info for your problem.
It is always helpful to support a request with system info like OS and shell, related environment (variables, options), preferred tools, adequate (representative) sample input and desired output data and the logic connecting the two, and, if any, system (error) messages verbatim, to avoid ambiguities and keep people from guessing.

This is one of the most common problems in these forums - did you try searching for solutions? One approach would be to keep reading and appending lines until the record has the correct number of fields.
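A minimal sketch of that read/append approach, assuming (as the question states) a fixed number of fields per record, that no field contains a literal comma, and that joining the broken pieces with a single space is acceptable. The function name `join_records` is hypothetical. It strips a trailing carriage return, appends each physical line to the pending record, and prints the record as soon as `split()` counts at least the expected number of comma-separated fields:

```shell
# join_records WANT [FILE...]
# Hypothetical helper: rejoin physical lines until each record has at least
# WANT comma-separated fields. Assumes no field contains a literal comma.
join_records() {
    want=$1
    shift
    awk -v want="$want" '
    {
        sub(/\r$/, "")                        # drop a trailing CR (DOS line ending)
        rec = (rec == "" ? $0 : rec " " $0)   # append this piece, joined by a space
        if (split(rec, parts, ",") >= want) { # enough fields: record is complete
            print rec
            rec = ""
        }
    }
    END { if (rec != "") print rec }          # flush an incomplete trailing record
    ' "$@"
}
```

For the file in the question this would be run as `join_records 60 input.csv > output.csv`. Using `>=` rather than `==` means an over-long record is flushed instead of swallowing the lines that follow it.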

Don_Cragun · August 1, 2017, 1:26pm

Hi Bobby_2000,
Expanding a little bit on what RudiC has already said...

How big are the files you're trying to process?

What operating system are you using?

What output do you get from the following command?

getconf LINE_MAX

(Note that sed is only specified to work on text files, and you are turning your input files into a single, partial line to be processed by sed. By definition, a text file can't have any lines with more bytes than the number printed by the above command, and each line has to have a <newline> character line terminator. Some versions of sed will let you get by with some input files that have long lines, missing line terminators, or both; others won't.)
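Since the pipeline in the first post replaces every newline with another character, everything reaching sed is one line roughly as long as the whole file. A minimal sketch for comparing a file's longest line with that limit; `longest_line` is a hypothetical name, and `LC_ALL=C` is set so that awk's `length()` counts bytes rather than multibyte characters:

```shell
# longest_line [FILE...]
# Hypothetical helper: print the length in bytes of the longest line
# (not counting the terminating newline).
longest_line() {
    LC_ALL=C awk '
    length($0) > max { max = length($0) }   # remember the longest line so far
    END { print max + 0 }                   # "+ 0" prints 0 for empty input
    ' "$@"
}
```

Usage: `printf 'longest: %s bytes, LINE_MAX: %s\n' "$(longest_line input.csv)" "$(getconf LINE_MAX)"`. POSIX counts the trailing newline in `LINE_MAX`, so a line fits only if its length is less than that value.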

Please show us some sample input that produces output that doesn't match what you want (in CODE tags), show us the output you get with the pipeline you showed us in post #1 (in CODE tags) with that sample input, and show us the output you want (also in CODE tags) from that sample input.

And, please show us any diagnostics produced by your pipeline exactly as they are printed (also in CODE tags) if there are any.

rdrtx1 · August 1, 2017, 1:42pm

awk '/,/ {if (e) print e; if (NR>1 && !e) print ""; printf "%s", $0; e=""} ! /,/ {e=e $0} END {print e}' infile
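The same logic spelled out with comments, as a sketch; the wrapper name `join_comma_less_lines` is hypothetical. It treats any line containing a comma as the start of a new record and glues comma-free lines onto the record before them with no separator, so it assumes the text after each line break inside the field contains no comma. It uses `printf "%s", $0` so that a `%` in the data is not taken as a format directive:

```shell
# join_comma_less_lines [FILE...]
# Hypothetical wrapper: lines with a comma start a record; comma-free lines
# are treated as continuations and appended to the previous record.
join_comma_less_lines() {
    awk '
    /,/ {                            # a line with a comma starts a new record
        if (e) print e               # finish the previous record with stashed text
        if (NR > 1 && !e) print ""   # ...or just end the previous record
        printf "%s", $0              # emit this line, newline comes later
        e = ""
    }
    !/,/ { e = e $0 }                # comma-free line: stash it as a continuation
    END { print e }                  # flush stashed text and end the last record
    ' "$@"
}
```

Usage: `join_comma_less_lines input.csv > output.csv`.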