AWK scripting

I have a text file in which the text is divided into paragraphs (two line breaks or a tab mark a new paragraph). I want to write a script whose output removes the line breaks within each paragraph and separates the paragraphs with two line breaks.

So, if my input file is:

     The first line.
Second line.

First line of the second paragraph.
Second line of the second paragraph.

I want the output to be something like:

The first line. Second line.

First line of the second paragraph. Second line of the second paragraph.

I have been trying for some hours to come up with something reasonable, but I seem to be heading the wrong way. I would be grateful if someone could suggest how to solve the problem.

Thanks!

#  paste - - - <infile | sed G
     The first line.    Second line.

First line of the second paragraph.     Second line of the second paragraph.

or in nawk:

# nawk 'NR%3 {printf "%s ", $0;next}1(NR+1)%3{print"\n"}' infile
     The first line. Second line.

First line of the second paragraph. Second line of the second paragraph.

And what if the paragraph is more than 2 lines long?

nawk 'BEGIN {FS=RS=""; ORS="\n\n\n"} $1=$1' infile

The output of Tytalus' code is pretty much what I need, but yes, I want it to work with paragraphs longer than 2 lines as well. I didn't understand the code well enough to change it properly, though. Could someone explain it a bit or suggest how to change it?

Have you tried my suggestion?

vgersh99, with your solution I get this:

$ nawk 'BEGIN {FS=RS=""; ORS="\n\n\n"} $1=$1' file
          T h e   f i r s t   l i n e .
 S e c o n d   l i n e .


F i r s t   l i n e   o f   t h e   s e c o n d   p a r a g r a p h .
 S e c o n d   l i n e   o f   t h e   s e c o n d   p a r a g r a p h .
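
A likely cause of the spaced-out letters, shown as a small sketch: with FS set to the empty string, GNU awk and mawk treat every character as its own field, and $1=$1 then joins those fields with a space. POSIX leaves an empty FS unspecified, so other awks may behave differently. The function name spell_out is made up for this illustration.

```shell
# Hypothetical helper: show what an empty FS does to a record.
spell_out() {
    # FS = "" makes each character a separate field (gawk/mawk behaviour,
    # unspecified by POSIX); $1 = $1 rebuilds $0 with OFS (one space).
    awk 'BEGIN { FS = "" } { $1 = $1; print }'
}

printf 'abc\n' | spell_out
```

On gawk or mawk this should print the three letters separated by single spaces, the same effect as in the output above.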


Set the record separators to 2 newlines:

$ awk 'BEGIN {RS=ORS="\n\n"} $1=$1' file
The first line. Second line.

First line of the second paragraph. Second line of the second paragraph.

Regards

Thanks Franklin52, the code you gave works just fine, except for one problem. I want it to start a new paragraph when a tab is used as well.

So if the input file is:

First line of the second paragraph.
Second line of the second paragraph.
     Third as well.

The output is:

First line of the second paragraph. Second line of the second paragraph.
     
Third as well.

I tried adding "\t" to RS in addition to the "\n\n". Is that the right thing to do anyway? I guess it isn't, because when I replaced the "\n\n" with "\t" I expected it to start a new paragraph, but it didn't. So any further assistance is greatly appreciated.

Thanks in advance!

You can leave the awk code unaltered. Translate the tabs to newlines with tr and pipe the output to the awk command:

tr '\t' '\n' < file | awk 'BEGIN {RS=ORS="\n\n"} $1=$1'

This is what I get:

$ cat file
    The first line.
Second line.

First line of the second paragraph.
Second line of the second paragraph.
    Third as well.
$
$ tr '\t' '\n' < file | awk 'BEGIN {RS=ORS="\n\n"} $1=$1'
The first line. Second line.

First line of the second paragraph. Second line of the second paragraph.

Third as well.

Regards

Can someone explain the '$1=$1' part of the script? What exactly is it doing?

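It is a trick: assigning to a field forces awk to rebuild the record in the buffer ($0) from its fields, joined by the output field separator OFS (a single space by default). That squeezes runs of whitespace, including the newlines inside the record, into single spaces. The assignment also acts as the pattern: its value is normally non-empty and non-zero, so the record is printed, followed by the new ORS.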
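
A minimal sketch of the effect, using default FS and OFS; the function name squeeze is made up for this illustration:

```shell
# Hypothetical helper: normalise the whitespace of each input line.
squeeze() {
    # Assigning to $1 makes awk rebuild $0 from its fields, joined by OFS
    # (a single space by default). The assignment doubles as the pattern:
    # its value is the new $1, which is true when non-empty and non-zero,
    # so the default action (print) runs.
    awk '$1=$1'
}

printf '   The   first \t line.  \n' | squeeze
```

Keep in mind that the pattern form silently drops records whose first field is empty or 0; writing { $1 = $1; print } avoids that.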

Regards

Hi, I think Perl is a little bit easier than awk:

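$/ = "\n\n";                      # read one paragraph per record
open my $fh, '<', 'a.txt' or die "a.txt: $!";
while (<$fh>) {
	s/\n+\z//;                # drop the paragraph separator
	tr/\n/ /;                 # join the remaining lines with spaces
	print $_, "\n\n";         # blank line between paragraphs
}
close $fh;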

Thanks a lot Franklin52! Works just fine :)

But I was wondering whether it's possible to do it entirely in AWK?

You already have it entirely in AWK :)
Just remove the tr part:

awk 'BEGIN{RS=ORS="\n\n"}$1=$1' infile

Edit: Note that POSIX leaves the behaviour of a multi-character RS unspecified, and many traditional AWK implementations use only its first character.
If the AWK code provided by Franklin52 works for you, you are probably using GNU AWK or tawk.
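
A more portable variant, as a sketch: an empty RS switches awk into the paragraph mode defined by POSIX, where records are separated by one or more blank lines and a newline always counts as a field separator. The function name join_paragraphs is made up for this illustration.

```shell
# Hypothetical helper: join the lines of each paragraph into one line.
join_paragraphs() {
    # RS = ""     : paragraph mode, records are split on blank lines
    # ORS = "\n\n": print a blank line after every paragraph
    # $1 = $1     : rebuild the record with single spaces between fields
    awk 'BEGIN { RS = ""; ORS = "\n\n" } { $1 = $1; print }' "$@"
}

printf 'The first line.\nSecond line.\n\nFirst line of the second paragraph.\nSecond line of the second paragraph.\nThird line.\n' |
    join_paragraphs
```

This works for paragraphs of any length and does not rely on a multi-character RS. It does not treat tabs as paragraph breaks.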

Or, if Perl is acceptable:

perl -00ple'tr/\t//d;tr/\n/ /' infile

I have a question. My script is like this at the moment.

#!/usr/bin/awk -f

BEGIN {RS=ORS="\n\n"}


//{
      gsub("\t", "\n")
}

$1=$1


END {}

Why won't it make a new line? It recognizes the tabs, and if I replace them with, let's say, the letter "a", then it works. But why won't it make a new line in this case?
As I understand it, this somehow conflicts with the $1=$1 part, because when I applied gsub to the input file and just printed it out, it worked.

Any suggestions of how to fix it?

Thanks!
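
A possible explanation, offered as a sketch rather than a definitive answer: with the default FS, a newline inside the record is just whitespace, so the newline inserted by gsub is squeezed back into a single space when $1=$1 rebuilds the record. One way around this is to split the paragraph at the tabs first and normalise each piece separately. The function name split_paragraphs is made up for this illustration, and the code assumes the paragraph mode (RS = "") described by POSIX.

```shell
# Hypothetical helper: blank lines and tabs both start a new paragraph.
split_paragraphs() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" }
    {
        n = split($0, part, "\t")   # cut the record at every tab
        for (i = 1; i <= n; i++) {
            $0 = part[i]            # re-split this piece into fields
            $1 = $1                 # rebuild it with single spaces
            if ($0 != "") print     # skip pieces that were empty
        }
    }' "$@"
}

printf '\tThe first line.\nSecond line.\n\nFirst line of the second paragraph.\nSecond line of the second paragraph.\n\tThird as well.\n' |
    split_paragraphs
```

For this input the sketch should print three paragraphs separated by blank lines, with "Third as well." on its own.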