Remove newline in middle of string

kinkichin · March 7, 2013, 11:47am

My input file is tab-delimited, and lines in it get split in two by an unexpected newline. I'd like to remove this \n and put the information back on the same line.

INPUT
a1 b1b2 c1
c2 d1
a2 b3 c3 d4

OUTPUT
a1 b1b2 c1c2 d1
a2 b3 c3 d4

Best regards

Thanks a lot for your help

Corona688 · March 7, 2013, 11:50am

awk '{ L=$0 ; if ((getline) > 0) L = L $0 ; print L }' FS="\t" OFS="\t" inputfile
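The one-liner above joins lines strictly in pairs. If only some records are broken, as in the sample where the last line is already complete, one possible alternative is to keep appending lines until a record has the expected number of tab-separated fields. The sketch below assumes that a complete record has 4 fields and that the stray newline never falls inside the last field; `join_broken_lines` and `want` are hypothetical names chosen for illustration.

```shell
# join_broken_lines: hypothetical helper, reads stdin.
# $1 = assumed number of tab-separated fields in a complete record.
join_broken_lines() {
    awk -F '\t' -v want="$1" '
    {
        # Append following lines until the record has enough fields
        # or the input runs out.
        while (NF < want && (getline nextline) > 0)
            $0 = $0 nextline
        print
    }'
}

# Sample data from the question (4 fields per complete record).
printf 'a1\tb1b2\tc1\nc2\td1\na2\tb3\tc3\td4\n' | join_broken_lines 4
```

Assigning to `$0` makes awk re-split the record, so `NF` is re-evaluated on every pass through the loop.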

risham · March 7, 2013, 11:53am

I have a similar question.
My input file looks like this:
>some lines of text
actgtg
aaactgtg
acgtcg
>some lines of text
acgtgc
agtcgt
ttgcgt
etc..etc

I want the output as:
>some lines of text
actgtgaaactgtgacgtcg
>some lines of text
acgtgcagtcgtttgcgt

Basically, I want to remove the newline characters at the end of lines that do not start with '>'. I tried sed '!/>/s/\n//' but to no avail. Any help would be highly appreciated!

Corona688 · March 7, 2013, 12:31pm

sed does not match across lines that way. It reads one line at a time and strips the trailing newline before running the commands, so s/\n// never sees a newline to remove.

awk '/^>/ { if (L != "") print L; print; L = ""; next } { L = L $0 } END { if (L != "") print L }' inputfile
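To see the sed behaviour described above, a small experiment may help. It is only a sketch on made-up two-line input: the first command shows that a plain `s/\n//` changes nothing, the second uses `N` to pull the next line into the pattern space so that the embedded newline can be matched. As a side note, in sed the `!` that negates an address goes after it (`/>/!`), not before.

```shell
# Line by line, the pattern space never contains "\n", so nothing changes.
printf 'ab\ncd\n' | sed 's/\n//'

# N appends the next input line to the pattern space, separated by a
# newline, which the substitution can then remove.
printf 'ab\ncd\n' | sed 'N;s/\n//'
```

The first command prints the two lines unchanged; the second prints them joined as `abcd`.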

alister · March 7, 2013, 3:49pm

Regarding post #3, what follows assumes that there are no blank lines in the original data.

A different tack, which uses AWK to massage the format so that a second AWK can leverage its multiline record handling capability (which simplifies the logic):

awk '/^>/{print ""}1' file | awk '{print $1; $1=""; print}' FS='\n' OFS= RS=
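The two stages of this approach can be looked at separately. The following is a sketch on the sample data from post #3; the function names are hypothetical, and FS is set to a newline here on the assumption that header lines may contain spaces, so that each line of a paragraph becomes exactly one field.

```shell
# Sample input from the question (hypothetical helper names throughout).
sample_input() {
    printf '>some lines of text\nactgtg\naaactgtg\nacgtcg\n'
    printf '>some lines of text\nacgtgc\nagtcgt\nttgcgt\n'
}

# Stage 1: print an empty line before every header, so that each record
# becomes its own blank-line-separated paragraph.
add_blank_before_headers() { awk '/^>/ { print "" } 1'; }

# Stage 2: an empty RS switches awk to paragraph mode. With FS set to a
# newline, $1 is the header and the remaining fields are the sequence
# lines; emptying $1 rebuilds $0 with the empty OFS, gluing them together.
join_paragraphs() {
    awk -v RS= -v FS='\n' -v OFS= '{ print $1; $1 = ""; print }'
}

sample_input | add_blank_before_headers                    # intermediate form
sample_input | add_blank_before_headers | join_paragraphs  # final form
```

Because blank lines delimit the records in the second stage, blank lines already present in the data would split a record early, which is why the approach assumes there are none.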

Regards,
Alister

Yoda · March 7, 2013, 5:15pm

Another approach:

awk '/^>/{$0=(NR>1)?RS $0:$0;ORS=RS}!/>/{ORS=""}END{printf "\n"}1' file

risham · March 8, 2013, 12:20am

Thanks guys for the code, and yeah, for correcting my understanding of sed! Somehow the third answer seems to work for my requirements. I haven't really used awk in my work before, so explanations of the code above would really help me learn something!

Thanks again!!

Yoda · March 8, 2013, 12:36am

Explanation:

awk '
/^>/ {                                  # If current record starts with > ( /^>/ )
        $0 = (NR > 1 ? RS $0 : $0)      # If current record number is greater than 1 (NR > 1), set it to a newline followed by the current record (RS $0)
        ORS = RS                        # Set Output Record Separator to Record Separator (ORS = RS) [ RS is newline by default ]
}

! />/ {                                 # If current record does not contain the pattern > ( !/>/ )
        ORS = ""                        # Set Output Record Separator to "" (ORS = "")
}

END {                                   # END Block
        printf "\n"                     # Print newline
} 1                                     # 1 == true, so print current record
' file

Note: ORS, RS and NR are special variables in awk. Please check the awk manual pages for further reference. I hope this helps.
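For readers who find the ORS switching hard to follow, the same idea can be written with printf instead. This is a sketch of an equivalent formulation, not the code from the post above; `unwrap_records` is a hypothetical name.

```shell
# unwrap_records: hypothetical helper, reads stdin.
unwrap_records() {
    awk '
    /^>/ {
        if (NR > 1) printf "\n"   # finish the previous joined line
        print                     # header goes on its own line
        next
    }
    { printf "%s", $0 }           # other lines: print without a newline
    END { if (NR) printf "\n" }   # terminate the last joined line
    '
}

# Sample data from post #3.
printf '>some lines of text\nactgtg\naaactgtg\nacgtcg\n>some lines of text\nacgtgc\nagtcgt\nttgcgt\n' |
    unwrap_records
```

Here the newline that ends each joined line is printed explicitly when the next header (or the end of input) is reached, so no special variable has to be changed while the program runs.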

risham · March 8, 2013, 12:40am

Bravo!! Thanks a ton again!!

anbu23 · March 8, 2013, 1:33am

$ sed -n "/^>/{x;s/\n//2gp;}; /^>/! {H;}; $ {x;s/\n//2gp;};" file
>some lines of text
actgtgaaactgtgacgtcg
>some lines of text
acgtgcagtcgtttgcgt