Hi,
Does anybody know how to concatenate multiple lines into one line using an awk or perl command?
Input file:
>set1
QAWEQRQ@EWQEASED
ASDAEQW
QAWEQRQTQ
ASRFQWRGWQ
The input file above has 5 lines.
Desired output file:
>set1
QAWEQRQ@EWQEASEDASDAEQWQAWEQRQTQASRFQWRGWQ
I want to concatenate all the lines except the ">" line into a single line.
That is, the desired output file should contain only 2 lines: the first is the ">" line, and the second is all the remaining lines joined into one long line.
Thanks for any advice.
An awk:
awk '{ORS=/^>/?"\n":"";print}' infile
If you have a file with multiple sets (and adding the final newline as @RudiC proposes):
awk '{ORS=sub(/^>/,(NR>1?"\n":"")">")?"\n":"";print}END{print "\n"}' infile
RudiC
May 12, 2014, 11:45am
3
To make the output a correct *nix text file, add a newline character at the end:
awk '{ORS=/^>/?"\n":"";print} END{printf "\n"}' file
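For anyone who wants to try this on the sample data from the question, here is a minimal sketch. The temporary file and the `result` variable are only there for the demonstration and are not part of the suggestion above.

```shell
# Recreate the sample input from the question in a temporary file.
infile=$(mktemp)
printf '%s\n' '>set1' 'QAWEQRQ@EWQEASED' 'ASDAEQW' 'QAWEQRQTQ' 'ASRFQWRGWQ' > "$infile"

# ORS is chosen before each print: "\n" after a ">" header line,
# "" after a sequence line, so the sequence lines run together.
# The END block adds the single newline that terminates the last line.
result=$(awk '{ORS=/^>/?"\n":"";print} END{printf "\n"}' "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

This should print the ">set1" header on the first line and the joined sequence on the second.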
Try deleting the newlines with tr:
tr -d "\n" < infile > outfile
It works for me, but of course there is no newline at the end, so the prompt follows the output directly:
RBATTE1> cat outfile
EWQEASEDASDAEQWQAWEQRQTQASRFQWRGWQRBATTE1>
Robin
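Note that tr -d "\n" removes every newline, including the one after the ">" header, so the header ends up on the same line as the sequence. A minimal sketch of one way around that, assuming the file holds a single set whose header is the first line (the temporary file names are just for the demonstration):

```shell
# Recreate the sample input from the question in a temporary file.
infile=$(mktemp)
outfile=$(mktemp)
printf '%s\n' '>set1' 'QAWEQRQ@EWQEASED' 'ASDAEQW' 'QAWEQRQTQ' 'ASRFQWRGWQ' > "$infile"

{
  head -n 1 "$infile"                 # header line, kept on its own line
  tail -n +2 "$infile" | tr -d '\n'   # all remaining lines joined together
  echo                                # terminating newline for the joined line
} > "$outfile"

cat "$outfile"
```

This only handles a single set; for files with several ">" headers the awk approaches in this thread are a better fit.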
Yoda
May 12, 2014, 11:50am
5
Another approach:
awk '
        />/ {
                $0 = ( NR == 1 ? $0 : RS $0 RS )
                print
        }
        !/>/ {
                ORS = ""
                print
        }
        END {
                print "\n"
        }
' file
Nice & clean, @Yoda. Just one point: I would use next to skip the second pattern match, like this:
awk '
        />/ {
                $0 = ( NR == 1 ? $0 : RS $0 RS )
                print
                next
        }
        {
                ORS = ""
                print
        }
        END {
                print "\n"
        }
' infile
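To see how this behaves with more than one set, here is a small sketch. The two-set input (">set1", ">set2" and the short sequences) is made up for illustration and is not from the original question.

```shell
# Hypothetical two-set input, written to a temporary file.
infile=$(mktemp)
printf '%s\n' '>set1' 'AAA' 'BBB' '>set2' 'CCC' 'DDD' > "$infile"

result=$(awk '
        />/ {
                # The first header is printed as is; later headers are wrapped
                # in newlines so they end the previous joined sequence line.
                $0 = ( NR == 1 ? $0 : RS $0 RS )
                print
                next
        }
        {
                # Sequence lines are printed without a trailing newline.
                ORS = ""
                print
        }
        END {
                # Terminate the last joined sequence line.
                print "\n"
        }
' "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

Each header should come out on its own line, followed by one line holding that set's joined sequence.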
Another one:
awk '{$1=RS $1 ORS}NR>1' OFS= RS=\> file
(As long as there are no extra ">" characters in the ">" headers.)
Otherwise:
awk '/^>/{if(NR>1)print RS; $1=$1 RS}1 END{print RS}' ORS= file