Merging columns from multiple files

erden · December 18, 2009, 3:27pm

Hello,

I have a number of tab delimited data files consists of two columns. Like that:

File1
800.000000	0.002744
799.000000	0.002517
798.000000	0.002836
797.000000	0.002553

FIle2
800.000000	0.000261
799.000000	0.000001
798.000000	0.000551
797.000000	0.000275

File3
800.000000	-0.060552
799.000000	-0.060650
798.000000	-0.060206
797.000000	-0.060329

...

FileN

I would like to display the data in one file:

                         File1		FIle2		File3     ... FileN
800.000000	0.002744	0.000261	-0.060552
799.000000	0.002517	0.000001	-0.060650
798.000000	0.002836	0.000551	-0.060206
797.000000	0.002553	0.000275	-0.060329

What is the best way to do that?

Thanks a lot!
Erden

PS - I tried `paste *` this is what I get:

800.0000800.0000000027440.0033342
799.0000799.0000000025170.0045670
798.0000798.0000000028360.0048366
797.0000797.0000000025530.0025539

Also I searched similar topics, but nothing worked out for me.

njysprez · December 18, 2009, 3:38pm

try the join command, man join

erden · December 18, 2009, 3:48pm

join only works for two files. I have a lot of files.

gaurav1086 · December 18, 2009, 4:43pm

erden:

Hello,

I have a number of tab delimited data files consists of two columns. Like that:

File1
800.000000 0.002744
799.000000 0.002517
798.000000 0.002836
797.000000 0.002553

FIle2
800.000000 0.000261
799.000000 0.000001
798.000000 0.000551
797.000000 0.000275

File3
800.000000 -0.060552
799.000000 -0.060650
798.000000 -0.060206
797.000000 -0.060329

...

FileN

I would like to display the data in one file:

File1 FIle2 File3 ... FileN
800.000000 0.002744 0.000261 -0.060552
799.000000 0.002517 0.000001 -0.060650
798.000000 0.002836 0.000551 -0.060206
797.000000 0.002553 0.000275 -0.060329

What is the best way to do that?

Thanks a lot!
Erden

PS - I tried `paste *` this is what I get:

800.0000800.0000000027440.0033342
799.0000799.0000000025170.0045670
798.0000798.0000000028360.0048366
797.0000797.0000000025530.0025539

Also I searched similar topics, but nothing worked out for me.

Hello

paste -d" " File1 File2 File3

Regards

erden · December 18, 2009, 4:52pm

Didn't work. This is the output

paste -d" " File1 File2 File3

 800.000000     0.0033342
 799.000000     0.0045670
 798.000000     0.0048366
 797.000000     0.0025539

Somehow it takes first column of the first file and second column of the last file. Not the others Thanks anyway.

gaurav1086 · December 18, 2009, 4:54pm

But it works for me.

localhost:/home/gaurav# cat >t1
800.000000 0.002744
799.000000 0.002517
798.000000 0.002836
797.000000 0.002553
localhost:/home/gaurav# cat > t2
800.000000 0.000261
799.000000 0.000001
798.000000 0.000551
797.000000 0.000275
localhost:/home/gaurav# cat > t3
800.000000 -0.060552
799.000000 -0.060650
798.000000 -0.060206
797.000000 -0.060329
localhost:/home/gaurav# paste -d" " t1 t2 t3
800.000000 0.002744 800.000000 0.000261 800.000000 -0.060552
799.000000 0.002517 799.000000 0.000001 799.000000 -0.060650
798.000000 0.002836 798.000000 0.000551 798.000000 -0.060206
797.000000 0.002553 797.000000 0.000275 797.000000 -0.060329

erden · December 18, 2009, 5:42pm

When I copy and paste from here it worked. Forum changes tab chars with space.

But I did copy and paste data from my original long files and this time I get really weird results. Now it worked for just last column. It is crazy!

 800.000000     -0.054032
 799.000000     -0.053876
 798.000000     -0.053918
797.000000	-0.055687 797.000000	-0.052766 797.000000	-0.054031

I attached my files, if anyone wants to check.

I am using ubuntu 9.10 and bash

Thanks.

vgersh99 · December 18, 2009, 5:59pm

paste -d '\t' t[0-3]

Please, start using code tags when posting code/data samples.

erden · December 18, 2009, 6:09pm

Thanks a lot. I tried and this is what I get, the result did not change. It doesn't make any sense.

erden@erden-linux:~/Data/try$ paste -d '\t' t[0-3]
800.0000800.000000.05562-0.054032
799.0000799.000000.05573-0.053876
798.0000798.000000.05547-0.053918
797.000000	-0.055687	797.000000	-0.052766	797.000000	-0.054031

Scott · December 18, 2009, 6:24pm

awk ' { I[$1] = I[$1]?I[$1] " " $2:$0 }
 END {
   for( i in I )
     print I
 }
' file*

$ ./T   
800.000000	0.002744 0.000261 -0.060552
799.000000	0.002517 0.000001 -0.060650
798.000000	0.002836 0.000551 -0.060206
797.000000	0.002553 0.000275 -0.060329

erden · December 18, 2009, 6:58pm

scottn:

awk ' { I[$1] = I[$1]?I[$1] " " $2:$0 }
 END {
   for( i in I )
   print I
 }
' file*

$ ./T   
800.000000	0.002744 0.000261 -0.060552
799.000000	0.002517 0.000001 -0.060650
798.000000	0.002836 0.000551 -0.060206
797.000000	0.002553 0.000275 -0.060329

Thanks a lot scottn. There is something wrong with my data files I guess. I did not work with my data. Here is the output:

./T

 -0.053876	-0.055739
 -0.053918	-0.055475
797.000000	-0.055687 -0.052766 -0.054031
 -0.054032	-0.055621

I don't know what is wrong? I uploaded my files before, if anyone is willing to try.

Thanks a lot.

Scott · December 18, 2009, 7:08pm

Hi erden.

The problem is that you have ^M characters in your input files.

When they're removed it works.

Quick fix:

cat t* | tr -d '\015' | awk '
 {
   I[$1] = I[$1]?I[$1] " " $2:$0
 }

 END {
   for( i in I)
     print I
 }
'

drl · December 18, 2009, 7:22pm

Hi.

Using the options for cat to see invisible characters for (your file t1) data1:

% cat -A data1
800.000000^I-0.055621^M$
799.000000^I-0.055739^M$
798.000000^I-0.055475^M$
797.000000^I-0.055687$
$

The ^M typically comes from DOS-style files, or using a DOS-style editor. There are a number of utilities to remove the ^M character; so afterwards:

% cat -A data1
800.000000^I-0.055621$
799.000000^I-0.055739$
798.000000^I-0.055475$
797.000000^I-0.055687$
$

Note that file ends with an empty line:

% wc -l data1
5 data1

I suspect that once you clean up your files, things will look better (pun intended).

Utilities to consider (a sample from a GNU/Linux system):

tofrodos - Converts DOS <-> Unix text files, alias tofromdos
flip - convert text file line endings between Unix and DOS formats
tr -d '\015' <your-filename >some-other-filename

Best wishes ... cheers, drl

erden · December 18, 2009, 7:31pm

Thanks scottn, thanks drl,

I got these files from a windows program. When I remove ^M chars it worked!

Thanks a lot, I really appreciate it.
All the best,
Erden

ichigo · December 18, 2009, 7:43pm

gawk 'BEGIN{
     for(i=1;i<=ARGC;i++){
        printf "%s\t", ARGV
     }
}
FNR==1{print ""}
{
    gsub(/\r\n/,"")
    printf "%s\t%s " ,$1,$2
}' file*

erden · December 18, 2009, 9:05pm

Thanks ichigo,
There is a small problem with this script; it switches columns and rows. Here is the output:

800.000000	-0.055621 799.000000	-0.055739 798.000000	-0.055475 797.000000	-0.055687 	 
800.000000	-0.052849 799.000000	-0.053105 798.000000	-0.052550 797.000000	-0.052766 	 
800.000000	-0.054032 799.000000	-0.053876 798.000000	-0.053918 797.000000	-0.054031

ichigo · December 18, 2009, 10:13pm

gawk '
{
    o=$1; $1=""
    a[o]=a[o]"\t"$0
}END{ for (i in a) print i,a } ' file*

Scrutinizer · December 19, 2009, 3:54am

In this example the correct printing order is always maintained:

awk '{A[FNR]=(NR==FNR)?$0:A[FNR]"\t"$2} END{for(i=1;i<=FNR;i++) print A}' file*

gh0std0g74 · December 19, 2009, 6:46am

if you have Python and no restriction using it.

import glob
import os
import operator
hash={}
d=0
for file in glob.glob("file*"):
    for n,line in enumerate(open(file)):
        line=line.rstrip()
        sl=line.split()
        if not sl[0] in hash: d+=1
        hash.setdefault(sl[0],[d])
        hash[sl[0]].append(sl[-1])

for i in sorted(hash.iteritems(), key=operator.itemgetter(1)):
    print "%s\t%s" % ( i[0],'\t'.join(i[1][1:]))

output

$ ./python.py
800.000000      -0.060552       0.000261        0.002744
799.000000      -0.060650       0.000001        0.002517
798.000000      -0.060206       0.000551        0.002836
797.000000      -0.060329       0.000275        0.002553

erden · December 19, 2009, 8:42am

Thanks guys. I really appreciate that. awk works. I will try pyton when I have time.