Merging columns from multiple files

Hello,

I have a number of tab delimited data files consists of two columns. Like that:

File1
800.000000	0.002744
799.000000	0.002517
798.000000	0.002836
797.000000	0.002553

FIle2
800.000000	0.000261
799.000000	0.000001
798.000000	0.000551
797.000000	0.000275

File3
800.000000	-0.060552
799.000000	-0.060650
798.000000	-0.060206
797.000000	-0.060329

...

FileN

I would like to display the data in one file:

                         File1		FIle2		File3     ... FileN
800.000000	0.002744	0.000261	-0.060552
799.000000	0.002517	0.000001	-0.060650
798.000000	0.002836	0.000551	-0.060206
797.000000	0.002553	0.000275	-0.060329

What is the best way to do that?

Thanks a lot!
Erden

PS - I tried `paste *` this is what I get:

800.0000800.0000000027440.0033342
799.0000799.0000000025170.0045670
798.0000798.0000000028360.0048366
797.0000797.0000000025530.0025539

Also I searched similar topics, but nothing worked out for me.

try the join command, man join

join only works for two files. I have a lot of files.

Hello

paste -d" " File1 File2 File3

Regards

Didn't work. This is the output

paste -d" " File1 File2 File3

 800.000000     0.0033342
 799.000000     0.0045670
 798.000000     0.0048366
 797.000000     0.0025539

Somehow it takes first column of the first file and second column of the last file. Not the others Thanks anyway.

But it works for me.

localhost:/home/gaurav# cat >t1
800.000000 0.002744
799.000000 0.002517
798.000000 0.002836
797.000000 0.002553
localhost:/home/gaurav# cat > t2
800.000000 0.000261
799.000000 0.000001
798.000000 0.000551
797.000000 0.000275
localhost:/home/gaurav# cat > t3
800.000000 -0.060552
799.000000 -0.060650
798.000000 -0.060206
797.000000 -0.060329
localhost:/home/gaurav# paste -d" " t1 t2 t3
800.000000 0.002744 800.000000 0.000261 800.000000 -0.060552
799.000000 0.002517 799.000000 0.000001 799.000000 -0.060650
798.000000 0.002836 798.000000 0.000551 798.000000 -0.060206
797.000000 0.002553 797.000000 0.000275 797.000000 -0.060329

When I copy and paste from here it worked. Forum changes tab chars with space.

But I did copy and paste data from my original long files and this time I get really weird results. Now it worked for just last column. It is crazy!

 800.000000     -0.054032
 799.000000     -0.053876
 798.000000     -0.053918
797.000000	-0.055687 797.000000	-0.052766 797.000000	-0.054031

I attached my files, if anyone wants to check.

I am using ubuntu 9.10 and bash

Thanks.

paste -d '\t' t[0-3]

Please, start using code tags when posting code/data samples.

Thanks a lot. I tried and this is what I get, the result did not change. It doesn't make any sense.

erden@erden-linux:~/Data/try$ paste -d '\t' t[0-3]
800.0000800.000000.05562-0.054032
799.0000799.000000.05573-0.053876
798.0000798.000000.05547-0.053918
797.000000	-0.055687	797.000000	-0.052766	797.000000	-0.054031
awk ' { I[$1] = I[$1]?I[$1] " " $2:$0 }
 END {
   for( i in I )
     print I
 }
' file*

$ ./T   
800.000000	0.002744 0.000261 -0.060552
799.000000	0.002517 0.000001 -0.060650
798.000000	0.002836 0.000551 -0.060206
797.000000	0.002553 0.000275 -0.060329

Thanks a lot scottn. There is something wrong with my data files I guess. I did not work with my data. Here is the output:

./T

 -0.053876	-0.055739
 -0.053918	-0.055475
797.000000	-0.055687 -0.052766 -0.054031
 -0.054032	-0.055621

I don't know what is wrong? I uploaded my files before, if anyone is willing to try.

Thanks a lot.

Hi erden.

The problem is that you have ^M characters in your input files.

When they're removed it works.

Quick fix:

cat t* | tr -d '\015' | awk '
 {
   I[$1] = I[$1]?I[$1] " " $2:$0
 }

 END {
   for( i in I)
     print I
 }
'

Hi.

Using the options for cat to see invisible characters for (your file t1) data1:

% cat -A data1
800.000000^I-0.055621^M$
799.000000^I-0.055739^M$
798.000000^I-0.055475^M$
797.000000^I-0.055687$
$

The ^M typically comes from DOS-style files, or using a DOS-style editor. There are a number of utilities to remove the ^M character; so afterwards:

% cat -A data1
800.000000^I-0.055621$
799.000000^I-0.055739$
798.000000^I-0.055475$
797.000000^I-0.055687$
$

Note that file ends with an empty line:

% wc -l data1
5 data1

I suspect that once you clean up your files, things will look better (pun intended).

Utilities to consider (a sample from a GNU/Linux system):

tofrodos - Converts DOS <-> Unix text files, alias tofromdos
flip - convert text file line endings between Unix and DOS formats
tr -d '\015' <your-filename >some-other-filename

Best wishes ... cheers, drl

Thanks scottn, thanks drl,

I got these files from a windows program. When I remove ^M chars it worked!

Thanks a lot, I really appreciate it.
All the best,
Erden

gawk 'BEGIN{
     for(i=1;i<=ARGC;i++){
        printf "%s\t", ARGV
     }
}
FNR==1{print ""}
{
    gsub(/\r\n/,"")
    printf "%s\t%s " ,$1,$2
}' file*


Thanks ichigo,
There is a small problem with this script; it switches columns and rows. Here is the output:

800.000000	-0.055621 799.000000	-0.055739 798.000000	-0.055475 797.000000	-0.055687 	 
800.000000	-0.052849 799.000000	-0.053105 798.000000	-0.052550 797.000000	-0.052766 	 
800.000000	-0.054032 799.000000	-0.053876 798.000000	-0.053918 797.000000	-0.054031
gawk '
{
    o=$1; $1=""
    a[o]=a[o]"\t"$0
}END{ for (i in a) print i,a } ' file*

In this example the correct printing order is always maintained:

awk '{A[FNR]=(NR==FNR)?$0:A[FNR]"\t"$2} END{for(i=1;i<=FNR;i++) print A}' file*

if you have Python and no restriction using it.

import glob
import os
import operator
hash={}
d=0
for file in glob.glob("file*"):
    for n,line in enumerate(open(file)):
        line=line.rstrip()
        sl=line.split()
        if not sl[0] in hash: d+=1
        hash.setdefault(sl[0],[d])
        hash[sl[0]].append(sl[-1])

for i in sorted(hash.iteritems(), key=operator.itemgetter(1)):
    print "%s\t%s" % ( i[0],'\t'.join(i[1][1:]))

output

$ ./python.py
800.000000      -0.060552       0.000261        0.002744
799.000000      -0.060650       0.000001        0.002517
798.000000      -0.060206       0.000551        0.002836
797.000000      -0.060329       0.000275        0.002553


Thanks guys. I really appreciate that. awk works. I will try pyton when I have time.