Merge column file

aav1307 · January 12, 2015, 3:35pm

Hi All,

I have two files, file1 and file2:

$cat file1
aaa     
bbb     
ccc
ddd    
eee 
fff
     
ggg    
hhh    
iii
jjj

with a blank line,

and

$cat file2
111
222

333
444

with a blank line,

I need to merge them column by column, like this:

$paste file1 file2 > file3

and get exactly

$cat file3
aaa    111    
bbb    222    
ccc
ddd   
eee 
fff 
     
ggg    333   
hhh    444   
iii 
jjj

Thank you,

blackrageous · January 12, 2015, 4:22pm

Confused. What exactly are you asking? It looks like you have resolved your own issue.
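
For reference, plain paste pairs lines strictly by position and ignores the blank-line grouping, so on its own it would not give the file3 listing above (assuming that listing is the desired result rather than what paste printed). A minimal sketch using the sample data without the trailing spaces; the temporary directory and the pasted variable are illustrative only:

```shell
tmp=$(mktemp -d)
printf '%s\n' aaa bbb ccc ddd eee fff '' ggg hhh iii jjj > "$tmp/file1"
printf '%s\n' 111 222 '' 333 444 > "$tmp/file2"

# paste joins line N of file1 with line N of file2, separated by a tab.
pasted=$(paste "$tmp/file1" "$tmp/file2")
printf '%s\n' "$pasted"

rm -rf "$tmp"
```

With this input, 333 lands next to ddd (the fourth line of each file) instead of next to ggg, which is presumably why a group-aware merge is needed.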

RudiC · January 12, 2015, 5:52pm

Try this (an ad hoc quick and dirty solution for exactly your samples, no error checking etc.):

awk     '/^$/           {do {ST=getline S0 < F1; print S0} while ((ST==1) && S0!=""); next}
         END            {do {ST=getline S0 < F1; if (ST!=1) exit; print S0} while (S0!="")}
                        {getline S0 < F1; print S0, $0}
        ' F1=file1 file2
aaa 111
bbb 222
ccc
ddd
eee
fff

ggg 333
hhh 444
iii
jjj
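
The script above relies on awk's getline var < file form to pull lines from file1 while file2 is the main input. A stripped-down sketch of just that idiom, without any blank-line handling; the file names f1 and f2 and the variable names are made up for illustration:

```shell
tmp=$(mktemp -d)
printf '%s\n' a b c > "$tmp/f1"
printf '%s\n' 1 2 > "$tmp/f2"

# For each line of f2 (the main input), pull the next line of f1 via getline.
# getline returns 1 on success, 0 at end of file, and -1 on error.
lockstep=$(awk '{
    if ((getline other < F1) > 0)
        print other "\t" $0
    else
        print "\t" $0
}' F1="$tmp/f1" "$tmp/f2")
printf '%s\n' "$lockstep"

rm -rf "$tmp"
```

Parenthesizing the getline expression before comparing it with 0 avoids the redirection swallowing the comparison.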

rdrtx1 · January 12, 2015, 6:11pm

try also:

awk '
FNR==1 {l=0}
! /./  {l=1; if (FNR!=NR) print; next}
FNR==NR { if (! l) { a[a1++]=$0; } else { b[b1++]=$0; } ; next }
{if (! l) {c=a[a2++]} else {c=b[b2++]}; print $0, c}
' file2 file1

Don_Cragun · January 12, 2015, 6:15pm

For something that can take any number of groups separated by blank lines in both files, you could try:

awk '
function print_rest() {
	while(c1[g2] > c2)
		printf("%s\t\n", d[g2, ++c2])
	if(++g2 <= g1)
		print ""
	c2 = 0
}
BEGIN {	g1 = g2 = 1
}
FNR == NR {
	if(!NF) {
		g1++
		next
	}
	d[g1, ++c1[g1]] = $1
	next
}
!NF {	print_rest()
	next
}
{	printf("%s\t%s\n", d[g2, ++c2], $1)
}
END {	print_rest()
}' file1 file2

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

If you want to keep the spaces that were present in some of your sample file1 and file2 input lines, change both occurrences of $1 in the above script to $0. (Using $1 seemed to produce output closer to what you said you wanted.)

When run with your sample input files, the above produces the output:

aaa	111
bbb	222
ccc	
ddd	
eee	
fff	

ggg	333
hhh	444
iii	
jjj

and if you reverse the order of the input files, it produces:

111	aaa
222	bbb
	ccc
	ddd
	eee
	fff

333	ggg
444	hhh
	iii
	jjj
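
The difference between $1 and $0 mentioned in this post, and the reason !NF also catches lines that contain only spaces, can be seen with a small sketch (the two sample lines mimic the padded lines in file1; the fields variable is illustrative):

```shell
# Line 1 is "aaa" plus trailing spaces; line 2 holds only spaces.
fields=$(printf 'aaa     \n     \n' |
    awk '{ printf "NF=%d first=[%s] whole=[%s]\n", NF, $1, $0 }')
printf '%s\n' "$fields"
```

With default field splitting, $1 drops the surrounding blanks while $0 keeps the line as read, and a whitespace-only line has NF equal to 0.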

aav1307 · January 13, 2015, 10:22am

Very nice.

Thank you.

---------- Post updated at 07:20 AM ---------- Previous update was at 05:49 AM ----------

Thanks so much, Don Cragun.

That is exactly what I wanted, excellent. One final question: how can I generalize your code for files with several columns, like this?

$cat file1
aaa     
bbb     
ccc
ddd    
eee 
fff
     
ggg    
hhh    
iii
jjj

$cat file2
111  555  888
222  666  999

333  777 1010
444  777 1111

Desired output:

$cat file3

aaa 111 555 888    
bbb 222 666 999   
ccc 
ddd    
eee 
fff
     
ggg  333 777 1010 
hhh  444 777 1111 
iii
jjj

---------- Post updated at 07:22 AM ---------- Previous update was at 07:20 AM ----------

Thank you,

Akshay_Hegde · January 13, 2015, 11:10am

Another way to do the same...

akshay@Aix:/tmp$ cat f1
aaa     
bbb     
ccc
ddd    
eee 
fff
     
ggg    
hhh    
iii
jjj

akshay@Aix:/tmp$ cat f2
111
222

333
444

akshay@Aix:/tmp$ cat test.sh
#!/bin/bash 

awk '
	FNR==NR{ 
		k    += !NF
		s[k]  = k in s ? s[k] SUBSEP $0 : $0 
		next
	}
	FNR==1{
		k = 0
	}
	{ 
		if(FNR==1 || !NF){
			if(z<n)
			{
				while(z != n){  printf("%s%s%s\n", "", OFS, t[z++]) }
			}

			n = split(s[k++],t,SUBSEP)	
			z = !n
		}
		z++
		printf("%s%s%s\n", $0, OFS, ( (NF && z in t) ?  t[z] : "" ) )
        }
    ' OFS='\t'  "$1" "$2"

akshay@Aix:/tmp$ bash test.sh f1 f2
111	aaa     
222	bbb     
	ccc
	ddd    
	eee 
	fff
	
333	ggg    
444	hhh

akshay@Aix:/tmp$ bash test.sh f2 f1
aaa     111
bbb     222
ccc	
ddd    	
eee 	
fff	
     	
ggg    	333
hhh    	444
iii	
jjj

Don_Cragun · January 13, 2015, 10:09pm

Did you try changing $1 in both places in my script to $0 as I suggested in my post with that script if you wanted to preserve spaces in your input files?

If that didn't work, please show us the output my script produced after that change (in CODE tags) and clearly specify what needs to be done differently.
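
As a sketch of that change: this is the earlier script with $1 replaced by $0 in both places, run on a shortened multi-column sample. The sample data, the temporary directory and the merged variable are assumptions for illustration; the input here has no trailing spaces, so none show up in the output.

```shell
tmp=$(mktemp -d)
printf '%s\n' aaa bbb ccc '' ggg hhh iii > "$tmp/file1"
printf '%s\n' '111  555  888' '222  666  999' '' '333  777 1010' > "$tmp/file2"

merged=$(awk '
function print_rest() {
    # Flush lines of the current file1 group that had no partner in file2.
    while (c1[g2] > c2)
        printf("%s\t\n", d[g2, ++c2])
    if (++g2 <= g1)
        print ""
    c2 = 0
}
BEGIN { g1 = g2 = 1 }
FNR == NR {                       # first file: store whole lines per group
    if (!NF) { g1++; next }
    d[g1, ++c1[g1]] = $0
    next
}
!NF { print_rest(); next }        # blank line in second file: close the group
{ printf("%s\t%s\n", d[g2, ++c2], $0) }
END { print_rest() }' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

Because $0 is the whole input line, every column of file2 is carried over unchanged, including the original spacing between columns.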

MadeInGermany · January 15, 2015, 11:29am

Here is a general merge:

#!/bin/sh
while     
 read line2 <&4
 e2=$?
 read line1 <&3 || [ $e2 -eq 0 ]
do
 if [ -z "$line1" ]
 then
  while read line2 <&4 && [ -n "$line2" ]
  do
   printf "%s\n" "$line2"
  done 
 elif [ -z "$line2" ]
 then
  while read line1 <&3 && [ -n "$line1" ]
  do
   printf "%s\n" "$line1"
  done 
 fi
 printf "%s\n" "$line1 $line2"
done 3<file1 4<file2
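
The core of the script above is reading two files in step through file descriptors 3 and 4. A reduced sketch of only that mechanism, with made-up file names f1 and f2; unlike the script, this simple loop stops as soon as either file runs out:

```shell
tmp=$(mktemp -d)
printf '%s\n' aaa bbb ccc > "$tmp/f1"
printf '%s\n' 111 222 > "$tmp/f2"

# Descriptors 3 and 4 stay open for the whole loop, so each read
# continues where the previous one stopped in its own file.
pairs=$(while read -r left <&3 && read -r right <&4
do
    printf '%s\t%s\n' "$left" "$right"
done 3<"$tmp/f1" 4<"$tmp/f2")
printf '%s\n' "$pairs"

# With the default IFS, read strips leading and trailing blanks, so a
# line holding only spaces arrives as an empty string.
blank=$(printf '     \n' | { read -r x; [ -z "$x" ] && echo empty; })
printf '%s\n' "$blank"

rm -rf "$tmp"
```

The second part shows why a test such as [ -z "$line1" ] also treats the whitespace-only line in file1 as a group separator.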

RudiC · January 15, 2015, 12:56pm

Nice!

But two lines are missing: the ones that should be the first to be printed alone (ccc and iii).

And, if you reverse the order of the files, you get

111 aaa
222 bbb
ddd
eee
fff
 
333 ggg
444 hhh
jjj

If you printed $line2 prefixed by a <TAB>, like printf "\t%s\n" "$line2" and printf "%s\t%s\n" "$line1" "$line2", the result would be

111    aaa
222    bbb
       ddd
       eee
       fff
    
333    ggg
444    hhh
       jjj

Try this corrected version of MadeInGermany's proposal:

while   read line2 <&4
        e2=$?
        read line1 <&3 || [ $e2 -eq 0 ]
   do   if [ -z "$line1" ]
          then  printf "\t%s\n" "$line2" 
                while read line2 <&4 && [ -n "$line2" ]
                   do printf "\t%s\n" "$line2"   
                   done
        elif [ -z "$line2" ]  
          then  printf "%s\n" "$line1"
                while read line1 <&3 && [ -n "$line1" ]
                   do   printf "%s\n" "$line1"
                   done
        fi
        printf "%s\t%s\n" "$line1" "$line2"
   done 3<file1 4<file2
aaa    111
bbb    222
ccc
ddd
eee
fff
    
ggg    333
hhh    444
iii
jjj

Still open: a correction for the reversed file order, i.e. done 3<file2 4<file1

MadeInGermany · January 15, 2015, 1:51pm

Indeed the output was not exact, and one line was even left out.
Here is another fix:

while     
 read line2 <&4
 e2=$?
 read line1 <&3 || [ $e2 -eq 0 ]
do
 if [ -z "$line1" ] && [ -n "$line2" ]
 then
  while
   printf "\t%s\n" "$line2"
   read line2 <&4 && [ -n "$line2" ]
  do :
  done 
 elif [ -z "$line2" ] && [ -n "$line1" ]
 then
  while
   printf "%s\n" "$line1"
   read line1 <&3 && [ -n "$line1" ]
  do :
  done
 fi
 printf "%s\t%s\n" "$line1" "$line2"
done 3<file1 4<file2

Here the printf is moved into the while condition, again using the "list" feature of while (the loop takes the exit status of the last command in the list).
An empty loop body is a syntax error, hence the : (the null command).
In effect it emulates a repeat-until (or do-until) loop.
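
A minimal sketch of that repeat-until pattern on its own, with a hypothetical counter instead of file reads:

```shell
i=0
trace=$(while
    i=$((i + 1))
    echo "pass $i"
    [ "$i" -lt 3 ]      # last command in the list decides whether to loop again
do :                    # the work is done in the condition list; ":" is a no-op
done)
printf '%s\n' "$trace"
```

The commands in the condition list run before the test is evaluated, so they execute at least once, which is what distinguishes this from an ordinary while loop.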

RudiC · January 15, 2015, 3:02pm

Appreciate the repeat-until!

disedorgue · January 15, 2015, 7:59pm

Hello,
Just for fun, here is another way in bash, using paste:

unset B C
F() { [ ${#2} -eq 1 ] && ((X++)) || B[$X]=${B[$X]}$2 ; }; X=1 ; mapfile -c 1 -C 'F' <file2
F() { [ ${#2} -eq 1 ] && ((X++)) || C[$X]=${C[$X]}$2 ; }; X=1 ; mapfile -c 1 -C 'F' <file1
[ ${#B[@]} -le ${#C[@]} ] && Y=${#C[@]} || Y=${#B[@]}
X=1;while [ $X -le $Y ] ; do paste <(echo -n "${B[$X]}") <(echo -n "${C[$X]}") ; ((X++)) ; echo ;done

Beware: this method loads the files into memory.
Regards.