Unix Linux Community

Help with removing embedded linefeeds

stayalive July 15, 2011, 12:39pm 1

Greetings all,

I have a CSV file with pipe-separated columns:

SSN|NAME|ADDRESS|FILLER
123|abc|myaddress|xxx
234|BBB|my
add
ress
broken up|yyy

In the example above, the second record is broken across multiple lines. The break is never in the last column, so the broken text is always followed by a "|". I need to keep joining lines until I reach that pipe.

Is there any way I can remove \n (newline) from the address column?

Thanks for the help.

bartus11 July 15, 2011, 12:47pm 2

Try:

awk -F"|" 'NF<4{ORS=" ";p=1}NF==1{p=1}NF==4&&p{printf "\n";ORS="\n";p=0}1' file

neutronscott July 15, 2011, 12:56pm 3

meh. bartus11 beat me to it and his is more elegant.

stayalive July 15, 2011, 1:30pm 4

Thanks guys. I guess as long as I know the number of columns in the record, I can tweak this awk code to strip out linefeeds embedded in columns.

stayalive July 25, 2011, 2:26pm 5

If my number of columns is in a shell variable num_cols, how do I reference that from this awk code?

bartus11 July 25, 2011, 2:59pm 6

I guess this should work:

awk -F"|" -v n="$num_cols" 'NF<n{ORS=" ";p=1}NF==1{p=1}NF==n&&p{printf "\n";ORS="\n";p=0}1' file
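
For anyone wanting to see the `-v` mechanism in isolation, here is a minimal sketch. The helper name `check_width` and the value `num_cols=6` are assumptions made up for the illustration. It only labels each line as complete or partial and does not join anything. Writing `-v` and the assignment as separate words is the form shown in the POSIX synopsis, and quoting the expansion protects against word splitting.

```shell
num_cols=6   # assumed value for the illustration

# check_width is a hypothetical helper: it reads pipe-separated lines and
# reports each line's field count plus whether it matches num_cols.
check_width() {
    # -v n="$num_cols" copies the shell variable into the awk variable n
    # before the awk program starts running.
    awk -F'|' -v n="$num_cols" '{ print NF, (NF == n ? "complete" : "partial") }' "$@"
}

printf '%s\n' '1|2|3|4|5|6' '1|2' | check_width
# prints:
# 6 complete
# 2 partial
```

Inside the single-quoted awk program the shell never expands `$num_cols`, which is why the value has to be handed over with `-v` rather than written into the program text.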

stayalive July 25, 2011, 3:23pm 7

I get

nawk: can't open file NF<n{ORS=" ";p=1}NF==1{p=1}NF==n&&p{printf "\n";ORS="\n";p=0}1

works on command line though

stayalive July 27, 2011, 11:20am 8

bartus11 - any idea why this code does not remove linefeeds from the very first line?

bartus11 July 27, 2011, 11:35am 9

Can you post sample data that you are using, the output that you are getting and what is desired output?

stayalive July 27, 2011, 11:51am 10

This is my sample file:

$ cat test3
123|aaa|a1|a2|a5|a6
222|bbb|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

I add a line feed to the 2nd line:
$ cat test3
123|aaa|a1|a2|a5|a6
222|bb
b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

And the awk works fine:
$ nawk -F"|" 'NF<6{ORS=" ";p=1}NF==1{p=1}NF==6&&p{printf "\n";ORS="\n";p=0}1' test3
123|aaa|a1|a2|a5|a6
222|bb b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

Now I add a line feed to the 1st line as well:

$ cat test3
123|aa
a|a1|a2|a5|a6
222|bb
b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

Now awk does not remove linefeeds from 1st record:

$ nawk -F"|" 'NF<6{ORS=" ";p=1}NF==1{p=1}NF==6&&p{printf "\n";ORS="\n";p=0}1' test3
123|aa a|a1|a2|a5|a6 222|bb b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

$

Thanks for your help.
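
A diagnostic sketch that may help when reading the output above: print the field count awk sees for each physical line. The helper name `show_nf` is made up for this illustration, and the input is the first five lines of the test3 sample.

```shell
# show_nf is a hypothetical helper: it prints NF followed by the raw line.
show_nf() {
    awk -F'|' '{ printf "%d  %s\n", NF, $0 }' "$@"
}

show_nf <<'EOF'
123|aa
a|a1|a2|a5|a6
222|bb
b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
EOF
# prints:
# 2  123|aa
# 5  a|a1|a2|a5|a6
# 2  222|bb
# 5  b|b1|b2|b3|b4
# 6  333|ccc|c1|c2|c3|c4
```

Tracing the one-liner against these counts, the newline is only printed when a line with NF==6 arrives after a broken one. No such line sits between the first and second records, so both end up on one output line. The position in the file does not seem to be the trigger. Two broken records in a row are.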

bartus11 July 27, 2011, 2:02pm 11

Try:

awk -F"|" 'NF<6{ORS=" ";n+=NF;print}n==7{printf "\n";ORS="\n";n=0}NF==6' file

stayalive July 27, 2011, 2:37pm 12

That did not help.

$ cat test3
123|aaa|a1|a2|a5|a6
222|b
b
b|b1|b2|b3|b4
333|ccc|c1|c2|c3|c4
444|ddd|d1|d2|d3|d4

$ awk -F"|" 'NF<6{ORS=" ";n+=NF;print}n==7{printf "\n";ORS="\n";n=0}NF==6' test3
123|aaa|a1|a2|a5|a6
222|b b b|b1|b2|b3|b4 333|ccc|c1|c2|c3|c4 444|ddd|d1|d2|d3|d4 $

bartus11 July 27, 2011, 2:50pm 13

OK, try this:

awk -F"|" 'NF<6{ORS=" ";n+=(NF-1);print}n==5{printf "\n";ORS="\n";n=0}NF==6' file

shamrock July 27, 2011, 3:34pm 14

yet another awk script to try...

awk -F\| '{l=l?l""$0:$0;if(split(l,a,"|")==6){print l;l=""}}' file
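
A commented sketch of the same accumulate-until-complete idea, with the column count passed in as a parameter. The function name `join_records` is hypothetical. Joining fragments with a single space and flushing a leftover partial record at end of input are both assumptions, not something the one-liner above does. Like the original, it assumes the last column never contains a line feed.

```shell
# join_records NUM_COLS [FILE...]
# Hypothetical helper: glue physical lines together until the buffer holds
# NUM_COLS pipe-separated fields, then print the buffer as one record.
join_records() {
    num_cols=$1
    shift
    awk -v n="$num_cols" '
        {
            # Append the current line to the pending record. The space
            # separator is an assumption; use "" to join fragments directly.
            buf = (have ? buf " " : "") $0
            have = 1

            # split() returns the number of fields currently in the buffer.
            # ">=" rather than "==" so an over-long record is still flushed
            # instead of swallowing the rest of the file.
            if (split(buf, parts, "|") >= n) {
                print buf
                buf = ""
                have = 0
            }
        }
        END {
            # Print an incomplete trailing record rather than dropping it.
            if (have) print buf
        }
    ' "$@"
}

# Usage, assuming the six-column sample file from earlier in the thread:
#   join_records 6 test3
```

Because the buffer is only flushed once it holds enough fields, consecutive broken records are handled independently of each other.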

stayalive July 27, 2011, 3:39pm 15

Thanks guys. I will try those options!