Input file 1
S1 S2 S3
comp95_c1 1.00 comp95_c1 1.00 3.00
comp4_c0 6.00 comp4_c0 6.00 6.00
comp3_c0 0.00 comp3_c0 0.00 4.00
comp15_c1 3.00 comp15_c1 3.00 3.00
comp28_c0 33.00 comp28_c0 33.00 2.00
comp23_c0 4.00 comp23_c0 4.00 3.00
Desired output file 1
S1 S2 S3
comp95_c1 1.00 1.00 3.00
comp4_c0 6.00 6.00 6.00
comp3_c0 0.00 0.00 4.00
comp15_c1 3.00 3.00 3.00
comp28_c0 33.00 33.00 2.00
comp23_c0 4.00 4.00 3.00
Input file 2
S1 S2 S3
comp5_c1 1.00 1.00 comp5_c1 3.00
comp40_c0 6.00 6.00 comp40_c0 6.00
comp31_c0 0.00 0.00 comp31_c0 4.00
comp51_c1 3.00 3.00 comp51_c1 3.00
comp82_c0 33.00 33.00 comp82_c0 2.00
comp3_c0 4.00 4.00 comp3_c0 3.00
Desired output file 2
S1 S2 S3
comp5_c1 1.00 1.00 3.00
comp40_c0 6.00 6.00 6.00
comp31_c0 0.00 0.00 4.00
comp51_c1 3.00 3.00 3.00
comp82_c0 33.00 33.00 2.00
comp3_c0 4.00 4.00 3.00
I would like to remove the column (compXXX) that appears twice in each row.
All the files are tab delimited.
Thanks for any advice.
pamu
September 5, 2013, 12:23am
2
Try this, assuming you only want to compare against column 1:
awk '{S=$1;for(i=2;i<=NF;i++){if($i != $1){S=S OFS $i}}print S;}' OFS="\t" file
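For readers who want to follow the one-liner step by step, here is the same logic spread over several lines with comments. This is only a sketch: the function name dedup_col1 and the file sample_input.txt are made up for illustration, and the sample holds just the header plus two rows of Input file 1, assuming the header starts with a tab.

```shell
# Hypothetical helper wrapping the one-liner from the post above.
dedup_col1() {
    awk '{
        S = $1                        # start the output line with column 1
        for (i = 2; i <= NF; i++) {   # walk the remaining fields
            if ($i != $1) {           # skip any field that repeats column 1
                S = S OFS $i
            }
        }
        print S
    }' OFS="\t" "$1"
}

# Hypothetical sample: tab-delimited, header line begins with a tab.
printf '\tS1\tS2\tS3\n' > sample_input.txt
printf 'comp95_c1\t1.00\tcomp95_c1\t1.00\t3.00\n' >> sample_input.txt
printf 'comp4_c0\t6.00\tcomp4_c0\t6.00\t6.00\n' >> sample_input.txt

dedup_col1 sample_input.txt
```

Because no -F is given, awk splits on runs of whitespace, so a leading tab on the header line is dropped and the header comes out as S1, S2, S3 starting in the first column.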
Hi pamu,
I tried your awk command on Input file 1.
It returns the following result:
S1 S3
comp95_c1 1.00 1.00 3.00
comp4_c0 6.00 6.00 6.00
comp3_c0 0.00 0.00 4.00
comp15_c1 3.00 3.00 3.00
comp28_c0 33.00 33.00 2.00
comp23_c0 4.00 4.00 3.00
It is slightly different from the desired output.
The header line above the "compXXXX" rows is also "\t" delimited, and the values below "S1", "S2", "S3" are numbers.
Sorry for troubling you again.
pamu
September 5, 2013, 1:21am
4
Is this what you want?
awk '{T=NR==1?"\t":"";S=T $1;for(i=2;i<=NF;i++){if($i != $1){S=S OFS $i}}print S;}' OFS="\t" file
S1 S2 S3
comp95_c1 1.00 1.00 3.00
comp4_c0 6.00 6.00 6.00
comp3_c0 0.00 0.00 4.00
comp15_c1 3.00 3.00 3.00
comp28_c0 33.00 33.00 2.00
comp23_c0 4.00 4.00 3.00
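The extra "\t" on the first line is needed because of how awk splits fields. A quick way to see this, assuming the header really does begin with a tab, is to count the fields of such a line with and without -F'\t' (the header string here is a made-up sample):

```shell
# Default splitting: leading whitespace is ignored, so the header has 3 fields.
printf '\tS1\tS2\tS3\n' | awk '{ print NF ": first field = [" $1 "]" }'
# → 3: first field = [S1]

# Tab as separator: the empty field before the first tab is kept, giving 4 fields.
printf '\tS1\tS2\tS3\n' | awk -F'\t' '{ print NF ": first field = [" $1 "]" }'
# → 4: first field = []
```

With the default separator the empty first column disappears, which is why the command has to put the tab back by hand on line 1.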
Hi pamu,
It is almost there.
But I am curious what happens if my S1, S2, S3 header becomes something like S1, S1, S3.
Is it possible to make it still print the following result?
S1 S1 S3
comp95_c1 1.00 1.00 3.00
comp4_c0 6.00 6.00 6.00
comp3_c0 0.00 0.00 4.00
comp15_c1 3.00 3.00 3.00
comp28_c0 33.00 33.00 2.00
comp23_c0 4.00 4.00 3.00
Sorry again.
I just noticed that some cases work fine, but others don't work perfectly when the S1, S2, S3 header becomes something like S1, S1, S3.
pamu
September 5, 2013, 2:20am
6
What about this?
awk 'NR==1{$1=OFS OFS $1}1 NR>1{S=$1;for(i=2;i<=NF;i++){if($i != $1){S=S OFS $i}}print S;}' OFS="\t" file
S1 S1 S3
comp95_c1 1.00 1.00 3.00
comp4_c0 6.00 6.00 6.00
comp3_c0 0.00 0.00 4.00
comp15_c1 3.00 3.00 3.00
comp28_c0 33.00 33.00 2.00
comp23_c0 4.00 4.00 3.00
Hi pamu,
When I run the following commands:
awk 'NR==1{$1=OFS OFS $1}1 NR>1{S=$1;for(i=2;i<=NF;i++){if($i != $1){S=S OFS $i}}print S;}' OFS="\t" file > file.out
awk -F"\t" '{print $1"\t"}' file.out
comp95_c1
comp4_c0
comp3_c0
comp15_c1
comp28_c0
comp23_c0
awk -F"\t" '{print $2"\t"}' file.out
1.00
6.00
0.00
3.00
33.00
4.00
awk -F"\t" '{print $3"\t"}' file.out
S1
1.00
6.00
0.00
3.00
33.00
4.00
awk -F"\t" '{print $4"\t"}' file.out
S1
3.00
6.00
4.00
3.00
2.00
3.00
awk -F"\t" '{print $5"\t"}' file.out
S3
I would expect the following result:
awk -F"\t" '{print $1"\t"}' file.out
comp95_c1
comp4_c0
comp3_c0
comp15_c1
comp28_c0
comp23_c0
awk -F"\t" '{print $2"\t"}' file.out
S1
1.00
6.00
0.00
3.00
33.00
4.00
awk -F"\t" '{print $3"\t"}' file.out
S1
1.00
6.00
0.00
3.00
33.00
4.00
awk -F"\t" '{print $4"\t"}' file.out
S3
3.00
6.00
4.00
3.00
2.00
3.00
awk -F"\t" '{print $5"\t"}' file.out
Thanks for any advice on keeping "S1, S1, S3" aligned with their corresponding records for further analysis.
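Instead of printing one column at a time, the same check can be done in a single pass by listing every field of every line in brackets. This is only a sketch: the function name show_fields and the file sample.out are made up, and the sample mimics an output whose header carries two leading tabs.

```shell
# Hypothetical helper: print the field count and each tab-separated field per line.
show_fields() {
    awk -F'\t' '{
        printf "line %d has %d fields:", NR, NF
        for (i = 1; i <= NF; i++) printf " [%s]", $i
        print ""
    }' "$1"
}

# Hypothetical sample: header with two leading tabs, then one data row.
printf '\t\tS1\tS1\tS3\n' > sample.out
printf 'comp95_c1\t1.00\t1.00\t3.00\n' >> sample.out

show_fields sample.out
# → line 1 has 5 fields: [] [] [S1] [S1] [S3]
# → line 2 has 4 fields: [comp95_c1] [1.00] [1.00] [3.00]
```

A header with one field more than the data rows means the labels sit one column too far to the right, which matches the shifted S1, S1, S3 shown above.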
pamu
September 5, 2013, 2:41am
8
Then remove one OFS from the awk code:
awk 'NR==1{$1=OFS $1}1 NR>1{S=$1;for(i=2;i<=NF;i++){if($i != $1){S=S OFS $i}}print S;}' OFS="\t" file > file.out
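A note on the command above: as far as I can tell, awk does not read "1 NR>1" as two separate rules. It concatenates 1 and NR and compares the resulting string with 1, which is always true, so the second block also runs on the header line. That happens to give the right output here. A possibly easier-to-read alternative is sketched below. It assumes the input header already starts with a tab and should be passed through unchanged; the function name dedup_keep_header and the file sample_dup_header.txt are made up for illustration.

```shell
# Hypothetical helper: keep the header as is, drop repeats of column 1 in data rows.
dedup_keep_header() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 { print; next }       # header: pass through untouched
        {
            S = $1                    # start with column 1
            for (i = 2; i <= NF; i++)
                if ($i != $1)         # skip fields that repeat column 1
                    S = S OFS $i
            print S
        }' "$1"
}

# Hypothetical sample with a repeated header label (S1, S1, S3).
printf '\tS1\tS1\tS3\n' > sample_dup_header.txt
printf 'comp95_c1\t1.00\tcomp95_c1\t1.00\t3.00\n' >> sample_dup_header.txt

dedup_keep_header sample_dup_header.txt
```

Because the header is never compared against its own first field, repeated labels such as S1, S1 survive, and with -F'\t' the empty first column is kept, so the labels stay aligned with the data rows.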
Perfect, pamu.
Thanks a lot, I really appreciate your talent.
Thumb up