erden
December 18, 2009, 3:27pm
1
Hello,
I have a number of tab delimited data files consists of two columns. Like that:
File1
800.000000 0.002744
799.000000 0.002517
798.000000 0.002836
797.000000 0.002553
FIle2
800.000000 0.000261
799.000000 0.000001
798.000000 0.000551
797.000000 0.000275
File3
800.000000 -0.060552
799.000000 -0.060650
798.000000 -0.060206
797.000000 -0.060329
...
FileN
I would like to display the data in one file:
File1 FIle2 File3 ... FileN
800.000000 0.002744 0.000261 -0.060552
799.000000 0.002517 0.000001 -0.060650
798.000000 0.002836 0.000551 -0.060206
797.000000 0.002553 0.000275 -0.060329
What is the best way to do that?
Thanks a lot!
Erden
PS - I tried `paste *` this is what I get:
800.0000800.0000000027440.0033342
799.0000799.0000000025170.0045670
798.0000798.0000000028360.0048366
797.0000797.0000000025530.0025539
Also I searched similar topics, but nothing worked out for me.
try the join command, man join
erden
December 18, 2009, 3:48pm
3
join only works for two files. I have a lot of files.
erden:
Hello,
I have a number of tab delimited data files consists of two columns. Like that:
File1
800.000000 0.002744
799.000000 0.002517
798.000000 0.002836
797.000000 0.002553
FIle2
800.000000 0.000261
799.000000 0.000001
798.000000 0.000551
797.000000 0.000275
File3
800.000000 -0.060552
799.000000 -0.060650
798.000000 -0.060206
797.000000 -0.060329
...
FileN
I would like to display the data in one file:
File1 FIle2 File3 ... FileN
800.000000 0.002744 0.000261 -0.060552
799.000000 0.002517 0.000001 -0.060650
798.000000 0.002836 0.000551 -0.060206
797.000000 0.002553 0.000275 -0.060329
What is the best way to do that?
Thanks a lot!
Erden
PS - I tried `paste *` this is what I get:
800.0000800.0000000027440.0033342
799.0000799.0000000025170.0045670
798.0000798.0000000028360.0048366
797.0000797.0000000025530.0025539
Also I searched similar topics, but nothing worked out for me.
Hello
paste -d" " File1 File2 File3
Regards
erden
December 18, 2009, 4:52pm
5
Didn't work. This is the output
paste -d" " File1 File2 File3
800.000000 0.0033342
799.000000 0.0045670
798.000000 0.0048366
797.000000 0.0025539
Somehow it takes first column of the first file and second column of the last file. Not the others Thanks anyway.
erden:
Didn't work. This is the output of `paste -d" " File1 File2 File3`:
800.000000 0.0033342
799.000000 0.0045670
798.000000 0.0048366
797.000000 0.0025539
Somehow it takes first column of the first file and second column of the last file. Not the others Thanks anyway.
But it works for me.
localhost:/home/gaurav# cat >t1
800.000000 0.002744
799.000000 0.002517
798.000000 0.002836
797.000000 0.002553
localhost:/home/gaurav# cat > t2
800.000000 0.000261
799.000000 0.000001
798.000000 0.000551
797.000000 0.000275
localhost:/home/gaurav# cat > t3
800.000000 -0.060552
799.000000 -0.060650
798.000000 -0.060206
797.000000 -0.060329
localhost:/home/gaurav# paste -d" " t1 t2 t3
800.000000 0.002744 800.000000 0.000261 800.000000 -0.060552
799.000000 0.002517 799.000000 0.000001 799.000000 -0.060650
798.000000 0.002836 798.000000 0.000551 798.000000 -0.060206
797.000000 0.002553 797.000000 0.000275 797.000000 -0.060329
erden
December 18, 2009, 5:42pm
7
When I copy and paste from here it worked. Forum changes tab chars with space.
But I did copy and paste data from my original long files and this time I get really weird results. Now it worked for just last column. It is crazy!
800.000000 -0.054032
799.000000 -0.053876
798.000000 -0.053918
797.000000 -0.055687 797.000000 -0.052766 797.000000 -0.054031
I attached my files, if anyone wants to check.
I am using ubuntu 9.10 and bash
Thanks.
paste -d '\t' t[0-3]
Please, start using code tags when posting code/data samples.
erden
December 18, 2009, 6:09pm
9
Thanks a lot. I tried and this is what I get, the result did not change. It doesn't make any sense.
erden@erden-linux:~/Data/try$ paste -d '\t' t[0-3]
800.0000800.000000.05562-0.054032
799.0000799.000000.05573-0.053876
798.0000798.000000.05547-0.053918
797.000000 -0.055687 797.000000 -0.052766 797.000000 -0.054031
Scott
December 18, 2009, 6:24pm
10
awk ' { I[$1] = I[$1]?I[$1] " " $2:$0 }
END {
for( i in I )
print I
}
' file*
$ ./T
800.000000 0.002744 0.000261 -0.060552
799.000000 0.002517 0.000001 -0.060650
798.000000 0.002836 0.000551 -0.060206
797.000000 0.002553 0.000275 -0.060329
erden
December 18, 2009, 6:58pm
11
Thanks a lot scottn. There is something wrong with my data files I guess. I did not work with my data. Here is the output:
./T
-0.053876 -0.055739
-0.053918 -0.055475
797.000000 -0.055687 -0.052766 -0.054031
-0.054032 -0.055621
I don't know what is wrong? I uploaded my files before, if anyone is willing to try.
Thanks a lot.
Scott
December 18, 2009, 7:08pm
12
Hi erden.
The problem is that you have ^M characters in your input files.
When they're removed it works.
Quick fix:
cat t* | tr -d '\015' | awk '
{
I[$1] = I[$1]?I[$1] " " $2:$0
}
END {
for( i in I)
print I
}
'
drl
December 18, 2009, 7:22pm
13
Hi.
Using the options for cat to see invisible characters for (your file t1) data1:
% cat -A data1
800.000000^I-0.055621^M$
799.000000^I-0.055739^M$
798.000000^I-0.055475^M$
797.000000^I-0.055687$
$
The ^M typically comes from DOS-style files, or using a DOS-style editor. There are a number of utilities to remove the ^M character; so afterwards:
% cat -A data1
800.000000^I-0.055621$
799.000000^I-0.055739$
798.000000^I-0.055475$
797.000000^I-0.055687$
$
Note that file ends with an empty line:
% wc -l data1
5 data1
I suspect that once you clean up your files, things will look better (pun intended).
Utilities to consider (a sample from a GNU/Linux system):
tofrodos - Converts DOS <-> Unix text files, alias tofromdos
flip - convert text file line endings between Unix and DOS formats
tr -d '\015' <your-filename >some-other-filename
Best wishes ... cheers, drl
erden
December 18, 2009, 7:31pm
14
Thanks scottn, thanks drl,
I got these files from a windows program. When I remove ^M chars it worked!
Thanks a lot, I really appreciate it.
All the best,
Erden
ichigo
December 18, 2009, 7:43pm
15
gawk 'BEGIN{
for(i=1;i<=ARGC;i++){
printf "%s\t", ARGV
}
}
FNR==1{print ""}
{
gsub(/\r\n/,"")
printf "%s\t%s " ,$1,$2
}' file*
erden
December 18, 2009, 9:05pm
16
Thanks ichigo,
There is a small problem with this script; it switches columns and rows. Here is the output:
800.000000 -0.055621 799.000000 -0.055739 798.000000 -0.055475 797.000000 -0.055687
800.000000 -0.052849 799.000000 -0.053105 798.000000 -0.052550 797.000000 -0.052766
800.000000 -0.054032 799.000000 -0.053876 798.000000 -0.053918 797.000000 -0.054031
ichigo
December 18, 2009, 10:13pm
17
gawk '
{
o=$1; $1=""
a[o]=a[o]"\t"$0
}END{ for (i in a) print i,a } ' file*
In this example the correct printing order is always maintained:
awk '{A[FNR]=(NR==FNR)?$0:A[FNR]"\t"$2} END{for(i=1;i<=FNR;i++) print A}' file*
if you have Python and no restriction using it.
import glob
import os
import operator
hash={}
d=0
for file in glob.glob("file*"):
for n,line in enumerate(open(file)):
line=line.rstrip()
sl=line.split()
if not sl[0] in hash: d+=1
hash.setdefault(sl[0],[d])
hash[sl[0]].append(sl[-1])
for i in sorted(hash.iteritems(), key=operator.itemgetter(1)):
print "%s\t%s" % ( i[0],'\t'.join(i[1][1:]))
output
$ ./python.py
800.000000 -0.060552 0.000261 0.002744
799.000000 -0.060650 0.000001 0.002517
798.000000 -0.060206 0.000551 0.002836
797.000000 -0.060329 0.000275 0.002553
erden
December 19, 2009, 8:42am
20
Thanks guys. I really appreciate that. awk works. I will try pyton when I have time.