Help with removing embedded linefeeds

Greetings all,

I have a CSV file with pipe-separated columns:

SSN|NAME|ADDRESS|FILLER
123|abc|myaddress|xxx
234|BBB|my
add
ress
broken up|yyy

In the example above, the second record is broken into multiple lines. I need to keep going until I find a "|" since this issue is with the non-last column and therefore there definitely will be a pipe at the end of that column text.

Is there any way I can remove \n (newline) from the address column?

Thanks for the help.

Try:

awk -F"|" 'NF<4{ORS=" ";p=1}NF==1{p=1}NF==4&&p{printf "\n";ORS="\n";p=0}1' file

Meh. bartus11 beat me to it and his is more elegant. :)

Thanks guys. I guess as long as I know the number of columns in the record, I can tweak this awk code to strip out linefeeds embedded in columns.

If my number of columns is in a shell variable num_cols, how do I reference that from this awk code?

I guess this should work:

awk -F"|" -vn=$num_cols 'NF<n{ORS=" ";p=1}NF==1{p=1}NF==n&&p{printf "\n";ORS="\n";p=0}1' file

I get

nawk: can't open file NF<n{ORS=" ";p=1}NF==1{p=1}NF==n&&p{printf "\n";ORS="\n";p=0}1

---------- Post updated at 03:23 PM ---------- Previous update was at 03:22 PM ----------

It works on the command line, though.
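
A minimal sketch of passing the column count in from the shell, assuming a POSIX awk. The portable spelling puts a space between -v and the assignment and quotes the shell variable. Whether the attached -vn=... form is what caused the "can't open file" error is only a guess, since the failing script itself was not posted. The data is the four-column sample from the first post; the names sample and joined are made up for this illustration.

```shell
# Column count held in a shell variable, as in the question.
num_cols=4

# Hypothetical temp file holding the sample from the first post.
sample=$(mktemp)
printf '%s\n' \
    'SSN|NAME|ADDRESS|FILLER' \
    '123|abc|myaddress|xxx' \
    '234|BBB|my' \
    'add' \
    'ress' \
    'broken up|yyy' > "$sample"

# "-v n=..." is two separate arguments; the awk variable n replaces the
# hard-coded column count in the one-liner from this thread.
joined=$(awk -F'|' -v n="$num_cols" \
    'NF<n{ORS=" ";p=1}NF==1{p=1}NF==n&&p{printf "\n";ORS="\n";p=0}1' "$sample")

printf '%s\n' "$joined"
rm -f "$sample"
```

With this input the broken record comes out on one line as 234|BBB|my add ress broken up|yyy, followed by a trailing space, because the one-liner replaces every embedded newline with a space.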

bartus11 - any idea why this code does not remove linefeeds from the very first line?

Can you post sample data that you are using, the output that you are getting and what is desired output?

This is my sample file:

$ cat test3
123|aaa|a1|a2|a5|a6
222|bbb|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

I add a line feed to the 2nd line:
$ cat test3
123|aaa|a1|a2|a5|a6
222|bb
b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

And the awk works fine:
$ nawk -F"|" 'NF<6{ORS=" ";p=1}NF==1{p=1}NF==6&&p{printf "\n";ORS="\n";p=0}1' test3
123|aaa|a1|a2|a5|a6
222|bb b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

Now I add a line feed to the 1st line as well:

$ cat test3
123|aa
a|a1|a2|a5|a6
222|bb
b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

Now awk does not remove linefeeds from 1st record:

$ nawk -F"|" 'NF<6{ORS=" ";p=1}NF==1{p=1}NF==6&&p{printf "\n";ORS="\n";p=0}1' test3
123|aa a|a1|a2|a5|a6 222|bb b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

$

Thanks for your help.
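
A quick diagnostic sketch, assuming the same six-line test3 content shown above: printing NF next to each physical line shows what the one-liner sees. It only emits the record-ending newline when a line with NF==6 arrives while p is set, and here the two broken records sit back to back with no complete line between them, so they run together. The names sample and field_counts are made up for this illustration.

```shell
# Hypothetical temp file with the contents of test3 (records 1 and 2 split).
sample=$(mktemp)
printf '%s\n' \
    '123|aa' \
    'a|a1|a2|a5|a6' \
    '222|bb' \
    'b|b1|b2|b3|b4' \
    '333|ccc|c1|c2|c3|c4' \
    '444|ddd|d1|d2|d3|d4' > "$sample"

# Prefix every physical line with its field count.
field_counts=$(awk -F'|' '{print NF ": " $0}' "$sample")

printf '%s\n' "$field_counts"
rm -f "$sample"
```

The first four lines report 2, 5, 2 and 5 fields; the first line with 6 fields is the 333 record, which is exactly where the output above finally breaks.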

Try:

awk -F"|" 'NF<6{ORS=" ";n+=NF;print}n==7{printf "\n";ORS="\n";n=0}NF==6' file

That did not help.

$ cat test3
123|aaa|a1|a2|a5|a6
222|b
b
b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

$ awk -F"|" 'NF<6{ORS=" ";n+=NF;print}n==7{printf "\n";ORS="\n";n=0}NF==6' test3
123|aaa|a1|a2|a5|a6
222|b b b|b1|b2|b3|b4 333|ccc|c1|c2|c3|c4 444|ddd|d1|d2|d3|d4 $

OK, try this:

awk -F"|" 'NF<6{ORS=" ";n+=(NF-1);print}n==5{printf "\n";ORS="\n";n=0}NF==6' file

Yet another awk script to try:

awk -F\| '{l=l?l""$0:$0;if(split(l,a,"|")==6){print l;l=""}}' file
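
For readability, here is a spelled-out sketch of the same buffering idea, with the column count taken from a shell variable. It is a variation, not the script above: it joins the pieces with a space (the one-liner joins them with nothing), it uses >= rather than == so a line with too many delimiters still gets flushed, and it prints any incomplete record at end of input. The names sample, rejoined, buf, pending and parts are made up for this illustration. As in the original question, it assumes the line break is never in the last column.

```shell
num_cols=6

# Hypothetical temp file: records 1 and 2 are split across physical lines.
sample=$(mktemp)
printf '%s\n' \
    '123|aa' \
    'a|a1|a2|a5|a6' \
    '222|b' \
    'b' \
    'b|b1|b2|b3|b4' \
    '333|ccc|c1|c2|c3|c4' > "$sample"

rejoined=$(awk -v n="$num_cols" '
{
    # Append this physical line to the buffered record, space-separated.
    buf = (pending ? buf " " $0 : $0)
    pending = 1

    # A single-character separator string is taken literally by split().
    if (split(buf, parts, "|") >= n) {
        print buf        # the buffer now holds a full record
        buf = ""
        pending = 0
    }
}
END {
    # Print an incomplete trailing record instead of dropping it.
    if (pending) print buf
}' "$sample")

printf '%s\n' "$rejoined"
rm -f "$sample"
```

On this input the result is three records: 123|aa a|a1|a2|a5|a6, 222|b b b|b1|b2|b3|b4 and 333|ccc|c1|c2|c3|c4.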

Thanks guys. I will try those options!