I have a text file in which the text is divided into paragraphs (either two line breaks or a tab marks a new paragraph). I want to write a script that removes the line breaks within each paragraph and separates the paragraphs from each other with two line breaks.
So, if my input file is:
The first line.
Second line.

First line of the second paragraph.
Second line of the second paragraph.
I want the output to be something like:
The first line. Second line.

First line of the second paragraph. Second line of the second paragraph.
I have been trying for some hours now to come up with something reasonable, but I seem to be heading the wrong way. I would be really pleased if someone could share an idea of how to solve this.
# nawk 'NR%3 {printf "%s ", $0;next}1(NR+1)%3{print"\n"}' infile
The first line. Second line.

First line of the second paragraph. Second line of the second paragraph.
The output of Tytalus' code is pretty much what I need, but I also want it to work with paragraphs longer than two lines. I don't understand the code well enough to change it properly, though. Could someone explain it a bit or give me some ideas on how to change it?
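Tytalus' command counts lines rather than looking at them: NR%3 is non-zero for the first two lines of every group of three (two text lines plus the blank line), and those lines are printed with a trailing space instead of a newline; the remaining rule in effect prints the paragraph break on every third line. That is why it only handles two-line paragraphs. A minimal sketch of an alternative that keys on blank lines instead of line numbers, so paragraph length no longer matters (the function name `join_lines_per_paragraph` is made up for illustration):

```shell
# Hypothetical helper: joins the lines of each blank-line-separated paragraph.
# Reads the files given as arguments, or stdin if there are none.
join_lines_per_paragraph() {
  awk '
    NF == 0 {                                   # blank or whitespace-only line ends a paragraph
      if (para != "") { printf "%s%s\n", sep, para; sep = "\n"; para = "" }
      next
    }
    { para = (para == "") ? $0 : (para " " $0) }   # append the line, separated by one space
    END { if (para != "") printf "%s%s\n", sep, para }   # flush the last paragraph
  ' "$@"
}
```

For example, `printf 'a1\na2\na3\n\nb1\nb2\n' | join_lines_per_paragraph` prints `a1 a2 a3`, a blank line, and `b1 b2`. Because it reads one line at a time, it works in any POSIX awk and does not depend on RS.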
$ nawk 'BEGIN {FS=RS=""; ORS="\n\n\n"} $1=$1' file
T h e f i r s t l i n e .
S e c o n d l i n e .
F i r s t l i n e o f t h e s e c o n d p a r a g r a p h .
S e c o n d l i n e o f t h e s e c o n d p a r a g r a p h .
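The spaced-out letters come from FS="": in GNU awk (and several other awks) an empty FS splits the record into individual characters, so $1=$1 rejoins every character with OFS, a single space. ORS="\n\n\n" would also leave two blank lines between paragraphs instead of one. Dropping FS= and keeping RS="" gives POSIX paragraph mode, where each block of non-blank lines is one record and a newline always separates fields. A sketch, wrapped in a hypothetical helper `join_paragraphs`:

```shell
# POSIX paragraph mode: RS = "" makes each run of non-blank lines one record.
# FS is left at its default, so fields are whole words, not single characters.
join_paragraphs() {
  awk 'BEGIN { RS = ""; ORS = "\n\n" } { $1 = $1; print }' "$@"
}
```

Unlike RS="\n\n", an empty RS is specified by POSIX, and it also treats several consecutive blank lines as a single separator.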
Set the record separators to 2 newlines:
$ awk 'BEGIN {RS=ORS="\n\n"} $1=$1' file
The first line. Second line.

First line of the second paragraph. Second line of the second paragraph.
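The $1=$1 part is what removes the line breaks: assigning to any field makes awk rebuild $0 from its fields joined by OFS (a single space by default), and with the default FS every run of blanks, tabs and newlines counts as one separator. A small demonstration on a single line, using a hypothetical helper name `squeeze_fields`:

```shell
# Assigning to a field forces awk to rebuild $0 as "$1 OFS $2 OFS ...",
# so any run of blanks, tabs (and, within a multi-line record, newlines) becomes one space.
squeeze_fields() {
  awk '{ $1 = $1; print }' "$@"
}
```

For example, `printf 'one\ttwo   three\n' | squeeze_fields` prints `one two three`. One side effect worth knowing: used as a pattern, $1=$1 is also a truth test on the assigned value, so a paragraph whose first word is 0 (or a record with no fields) can be silently skipped. Putting the assignment inside an action, { $1 = $1; print }, avoids that.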
Thanks, Franklin52, the code you gave works just fine, except for one problem: I also want a tab to start a new paragraph.
So if the input file is (with the third line starting with a tab):
First line of the second paragraph.
Second line of the second paragraph.
Third as well.
The output is:
First line of the second paragraph. Second line of the second paragraph.
Third as well.
I tried adding "\t" to RS in addition to the "\n\n". Is that the right approach at all? I guess it isn't, because if I replaced the "\n\n" with "\t", it should, to my mind, have made a new paragraph, but it didn't. Any further assistance is greatly appreciated.
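On RS: in awks that accept a multi-character RS (GNU awk, for example), the value is treated as a regular expression, so RS="\n\n\t" only matches that exact three-character sequence, not "either of these". A regex alternation can express "one or more blank lines, or a newline followed by a tab". This is a gawk-specific sketch (the helper name `join_paragraphs_gawk` is hypothetical) and will not work in awks without multi-character RS support:

```shell
# GNU awk only: a multi-character RS is a regular expression here.
# "\n\n+" matches one or more blank lines; "\n\t" matches a line that starts with a tab.
join_paragraphs_gawk() {
  gawk 'BEGIN { RS = "\n\n+|\n\t"; ORS = "\n\n" } { $1 = $1; print }' "$@"
}
```

With the sample input from the question, "Third as well." becomes its own paragraph, separated from the previous one by a blank line.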
$ cat file
The first line.
Second line.

First line of the second paragraph.
Second line of the second paragraph.
Third as well.
$
$ tr '\t' '\n' < file | awk 'BEGIN {RS=ORS="\n\n"} $1=$1'
The first line. Second line.

First line of the second paragraph. Second line of the second paragraph.

Third as well.
You already have it entirely in AWK.
Just remove the tr part:
awk 'BEGIN{RS=ORS="\n\n"}$1=$1' infile
Edit: Note that most AWK implementations do not support a multi-character RS.
If the AWK code provided by Franklin52 works for you, you are most likely using GNU AWK or tawk.
I have a question. My script currently looks like this:
#!/usr/bin/awk -f
BEGIN {RS=ORS="\n\n"}
//{
gsub("\t", "\n")
}
$1=$1
END {}
Why won't it make a new line? It recognizes the tabs: if I replace them with, say, the letter "a", it works. So why won't it make a new line in this case?
As I understand it, it somehow conflicts with the $1=$1 part, because when I ran gsub on the input file and just printed the result, it worked.
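What seems to happen: RS has already cut the input into records before the action runs, so a "\n" inserted by gsub() cannot start a new record. In addition, gsub() changes $0, which makes awk split the fields again; with the default FS the new newline is just whitespace, so $1=$1 rebuilds the record with a space in its place. A letter such as "a" survives because it is not a field separator, and printing without $1=$1 keeps the newline. The first helper below (a hypothetical name) reproduces the effect. The second is one possible fix, sketched under the assumption that spaces and newlines are the only word separators: leave the tab out of FS so it survives the rebuild, and only then turn it into a paragraph break. It uses the portable RS="" instead of RS="\n\n":

```shell
# Reproduces the problem: after gsub(), awk re-splits $0 with the default FS,
# which treats the new "\n" as whitespace, and $1 = $1 rejoins with a space.
show_problem() {
  awk '{ gsub(/\t/, "\n"); $1 = $1; print }' "$@"
}

# Possible fix (a sketch): tab is deliberately NOT in FS, so it survives $1 = $1.
tabs_to_paragraphs() {
  awk '
    BEGIN { RS = ""; FS = "[ \n]+"; ORS = "\n\n" }
    {
      sub(/^[ \t]+/, "")          # drop leading blanks/tabs so no empty paragraph is printed
      $1 = $1                     # join the lines of the record with single spaces
      gsub(/ *\t[ \t]*/, "\n\n")  # each remaining tab becomes a paragraph break
      print
    }
  ' "$@"
}
```

For example, `printf 'a\tb\n' | show_problem` prints `a b`, while `tabs_to_paragraphs` applied to the tab-indented sample puts "Third as well." in its own paragraph.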