Help with text processing

bikerboy · February 25, 2015, 4:15pm

I have an input file that has a series of lines (the count can vary), followed by two blank lines, then another series of lines (again, any number), followed by two blank lines, and so on. I need to use filters to convert the following example input file to the output file shown below. In the output, the first three lines of each group should be joined with a comma or "|" character, and each remaining line in the group should be appended to that prefix on its own output line.

Input file:

a
b
c
d
e
f


p
q
r
s
t
u
v
w
x

Output :

a|b|c|d
a|b|c|e
a|b|c|f
p|q|r|s
p|q|r|t
p|q|r|u
p|q|r|v
p|q|r|w
p|q|r|x

Appreciate your time and interest. Thank you,
BikerBoy

Don_Cragun · February 25, 2015, 7:05pm

Is this a homework assignment? If so, please refile in the Homework & Coursework Questions forum following the directions specified here.

If not, what have you tried?

Can there be single blank lines? If so, how are they to be handled?

Can there be more than two adjacent blank lines? If so, how are they to be handled?

What is supposed to happen if there are fewer than four lines between sets of blank lines?

bikerboy · February 25, 2015, 7:21pm

Thank you for the reply Don.

This is not a homework assignment.

I have tried using a couple of awk and sed filters but the effort went in vain.

There are always two blank lines. Not less or more.

There are always four or more lines between each pair of blank lines.

Don_Cragun · February 25, 2015, 7:37pm

Try:

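awk '
# A blank line ends a group: reset the line counter and the prefix.
NF == 0 {
	c = 0
	out = ""
	next
}
# The first three lines of a group are collected into the prefix.
c++ < 3 {
	out = out $1 "|"
	next
}
# Every remaining line is printed after the prefix.
{	print out $1
}' file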

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
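
To see the script above in action, here is a minimal sketch that feeds it the sample input from the first post through a here-document. The function name join_first_three is only an illustrative wrapper and does not come from this thread; the sketch assumes a POSIX shell and an awk on the PATH.

```shell
#!/bin/sh
# join_first_three: hypothetical wrapper name, used only for this illustration.
# Reads groups of lines separated by blank lines on standard input.
join_first_three() {
    awk '
    NF == 0 {           # blank line: start a new group
        c = 0
        out = ""
        next
    }
    c++ < 3 {           # first three lines of a group: build the prefix
        out = out $1 "|"
        next
    }
    {   print out $1    # every later line: prefix plus this line
    }'
}

# Sample input from the first post (two blank lines between groups).
join_first_three <<'EOF'
a
b
c
d
e
f


p
q
r
s
t
u
v
w
x
EOF
```

Run as is, this should print the nine lines listed under "Output" in the first post, from a|b|c|d through p|q|r|x.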

bikerboy · February 25, 2015, 8:19pm

Super! Big thanks to you, Sir!

One last question before I close this thread. What if the input is like this:

hi
this
is
biker boy
i like bikes
i love bikes


this
is
a
great forum
with
helpful people around
having immense knowledge
in unix commands
and scripting.

Output :

this|is|a|great forum
this|is|a|with
this|is|a|helpful people around
this|is|a|having immense knowledge
this|is|a|in unix commands
this|is|a|and scripting.

I really appreciate your effort.
Thank you.

---------- Post updated at 08:19 PM ---------- Previous update was at 08:17 PM ----------

Actually never mind. I figured it out.

Thank you so much!!!

Don_Cragun · February 25, 2015, 8:54pm

I'm glad that you figured it out.

Note that if the sample input you provide is more representative of the data you'll be processing, you'll get better suggestions for ways to get the output you want.

To help others who may read this thread in the future, it would be nice if you would post the script that you used to solve your expanded requirements.

bikerboy · February 25, 2015, 9:10pm

Noted! This is my first post, so I am learning to follow the rules and regulations of this forum.

The command I ran to handle the second input format is:

awk '
NF == 0 {
c = 0
out = ""
next
}
c++ < 3 {
out = out $n "|"
next
}
{print out $n}' file.txt

Don_Cragun · February 25, 2015, 9:39pm

Thank you for sharing your code. (In the future, please use CODE tags to surround sample input, output, and code.)

Note that the normal way to refer to the contents of the current input line in an awk script is $0. $n happens to work here because n hasn't been defined, and in awk a reference to an undefined variable yields 0 (when a number is needed) or an empty string (when a string is needed), so $n is evaluated as $0. Since other people may read your code, it is usually better to use $0 so they don't have to search your script to find out how n is defined.
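
The difference is easy to check on a multi-word line. The sketch below is only an illustration, and the function names show_fields and join_groups are hypothetical, chosen for this example. It prints $1, $0, and $n (with n left undefined) for one line, then reruns the script with $0 so that whole lines are kept.

```shell
#!/bin/sh
# show_fields: hypothetical helper for this illustration only.
# n is never assigned, so awk uses 0 where a number is needed and $n is $0.
show_fields() {
    awk '{ printf "first=[%s] whole=[%s] dollar_n=[%s]\n", $1, $0, $n }'
}

echo "biker boy" | show_fields
# prints: first=[biker] whole=[biker boy] dollar_n=[biker boy]

# join_groups: the same logic as the script in this thread, written with $0.
join_groups() {
    awk '
    NF == 0 { c = 0; out = ""; next }     # blank line: reset for the next group
    c++ < 3 { out = out $0 "|"; next }    # keep the whole line, not just field 1
    { print out $0 }
    '
}

printf 'this\nis\na\ngreat forum\nwith\n' | join_groups
# prints:
# this|is|a|great forum
# this|is|a|with
```

With $1 in place of $0, the fourth line would come out as this|is|a|great, because $1 holds only the first whitespace-separated field.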