Merging lines in a file

mohan_tuty · July 14, 2008, 7:36am

Hi,

I want to merge the lines starting with a comma symbol with the previous line of the file.

Input :

cat file.txt

name1,name2
,name3,name4
emp1,emp2,emp3
,emp4
,emp5
user1,user2
,user3

Output

name1,name2,name3,name4
emp1,emp2,emp3,emp4,emp5
user1,user2,user3

Is there any command in Unix to do this conversion for the entire file?

Mohan

radoulov · July 14, 2008, 9:22am

awk 'END { print r }
r && !/^,/ { print r; r = "" }
{ r = r ? r $0 : $0 }
' file

Use nawk or /usr/xpg4/bin/awk on Solaris.
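
For comparison, the same merge can be sketched with sed instead of awk. This is only an illustration: the function name merge_comma_lines is made up here, and it assumes a POSIX sed that accepts the script split over several -e options.

```shell
# Hypothetical helper (not from the thread): reads the named files, or stdin.
merge_comma_lines() {
  # :a       loop label
  # $!N      unless on the last line, append the next line to the buffer
  # s/\n,/,/ if the appended line starts with a comma, drop the newline
  # ta       and loop to pull in the following line as well
  # P;D      otherwise print the finished first line and restart with the rest
  sed -e :a -e '$!N;s/\n,/,/;ta' -e 'P;D' "$@"
}
```

Called as merge_comma_lines file.txt, it should give the three merged lines shown in the first post.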

mohan_tuty · July 14, 2008, 11:36pm

Thanks for the reply. This script works fine.

Is there any way to merge lines based on the number of occurrences of a specific character?

I have a flat file which contains multiple records.

Each row in the file should contain a specified number of delimiters.
If the delimiter count of a row does not match the specified count, the next row should be merged into it, and the same check should then be repeated.

The script should accept the number of occurrences of the delimiter as a parameter.

For example, if the number of occurrences of a comma in every line of the flat file is 5:

Sample Input

   1,2,3,
   4,5,6
   a,b,c,d,e,f
   10,20,30
   ,40,50,60
   11,22
   ,33
    ,44,
    55,
    66

Output (Each row should contain 5 commas)

     1,2,3,4,5,6
     a,b,c,d,e,f
     10,20,30,40,50,60
     11,22,33,44,55,66

Mohan

radoulov · July 15, 2008, 4:41am

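Set the desired number of columns on the command line (cols= ...; 5 commas per row means 6 columns),
use nawk or /usr/xpg4/bin/awk on Solaris:
(I feel I'm reinventing the wheel ...)

awk -F, '{
  for (i = 1; i <= NF; i++)
    if ($i != "")    # skip the empty fields left by leading or trailing commas
      printf "%s%s", $i, (++c % cols ? FS : RS)
}
END {
  if (c % cols)      # last row is incomplete: terminate it
    print ""
}' cols=6 input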

chella · July 15, 2008, 5:33am

Hi,

Try this for the above scenario

paste -sd# inp.txt | sed 's/#,/,/g' | tr '#' '\n'

Regards,
Chella

chella · July 15, 2008, 5:39am

Hi,

Try this for the above scenario,

paste -sd# inp.txt | sed 's/#,/,/g;s/,#/,/g' | tr '#' '\n'

Regards,
Chella

mohan_tuty · July 16, 2008, 1:12am

Thanks for the reply.

This works fine if a word is not split across two lines.

But if a word is split across two lines, it does not work.

Sample input

one,two,three,
four,five
six,se
ven,eight,nine,ten
eleven,twelve,thirteen,fourteen,fif
teen
1,2,3,4,5

output (Contains 4 commas in each line)

one,two,three,four,five
six,seven,eight,nine,ten
eleven,twelve,thirteen,fourteen,fifteen
1,2,3,4,5

radoulov · July 16, 2008, 3:26am

Well,
for me there's no way to distinguish between:

ten
eleven

and:

fif
teen

It should be possible with some kind of dictionary lookup ...

But ..., is this a real professional or personal need, or is it just another homework assignment?
The requirement keeps changing: first it was about characters and digits, now about English words ...
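
One possible way around the dictionary lookup is to count delimiters instead of looking at the words. The sketch below is not from the thread; the name merge_by_count and its argument order are made up for illustration. It assumes that every complete record holds exactly N commas, and that a line belongs to the current record as long as appending it does not push the comma count past N. Lines are joined without any separator, so "fif" and "teen" become "fifteen".

```shell
# merge_by_count N [file...]: hypothetical helper, reads stdin if no file is given.
merge_by_count() {
  count=$1
  shift
  awk -v n="$count" '
    {
      line = $0
      k = gsub(/,/, ",", line)            # k = number of commas on this line
      if (buf != "" && have + k > n) {    # this line would overflow the record:
        print buf                         # flush it and start a new one
        buf = ""
        have = 0
      }
      buf = buf line                      # join without inserting a separator
      have += k
    }
    END { if (buf != "") print buf }
  ' "$@"
}
```

Called as merge_by_count 4 file on the split-word sample, or merge_by_count 5 file on the earlier numeric sample (without its indentation), it should reproduce the outputs requested above. The ambiguity does not go away entirely: a record that is broken right after its first field, before any comma, would be glued onto the previous record, so "ten" followed by a line containing only "eleven" would still come out as "teneleven".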

ghostdog74 · July 16, 2008, 4:23am

I would think the easier way is to arrange them into one line and do the necessary

tr -s "\n" ',' < file | ...........

summer_cherry · July 16, 2008, 5:28am

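awk -F, '
t == ""  { t = $0; next }      # nothing buffered yet: start with this line
$1 == "" { t = t $0; next }    # line starts with a comma: append it to the buffer
{ print t; t = $0 }            # otherwise print the buffer and start a new one
END { print t }' file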