Hello everyone,
I searched the forum looking for answers to this but I could not pinpoint exactly what I need as I keep having trouble.
I have many files each having two columns and hundreds of rows.
first column is a string (can have many words) and the second column is a number.The files are tab separated.
Example of some rows of three files:
file1.txt
Column1 Column2
S1 31.37
S1 31.49
S1 31.74
S10 31.13
S10 31.10
S10 31.29
S11 29.49
S11 29.54
S11 29.25
S12 31.24
S12 31.05
S12 31.50
S13 32.48
S13 31.07
...
file2.txt
column 1 Column 2
1 CVS IN 23.14
1 CVS IN 23.24
1 CVS IN 23.24
1 CVS IP 21.53
1 CVS IP 21.40
1 CVS IP 21.36
1 WB IN 22.89
1 WB IN 23.02
1 WB IN 23.23
1 WB IP 26.70
1 WB IP 26.50
1 WB IP 26.67
2 CVS IN 23.34
2 CVS IN 23.34
2 CVS IN 23.14
2 CVS IP 21.38
2 CVS IP 21.49
2 CVS IP 21.54
file3.txt
Column 1 Column2
1 WB IN 23.45
1 WB IN 23.53
1 WB IP 24.55
1 CVS IN 23.62
1 CVS IN 23.46
I CVS IP 21.74
1 WB IN 23.33
1 WB IN 23.42
1 WB IP 26.24
1 CVS IN 23.71
1 CVS IN 23.44
I CVS IP 22.35
1 WB IN 24.75
1 WB IN 24.88
1 WB IP 25.50
I want to to combine the files into one.
I use the following code for the time being:
paste $folder/*.txt | column -s $'\t' -t > folder/output_biorad.gff
The problem is that the first file in the file is smaller than the other ones and the output, when the rows of the first file end gets misaligned.
The columns of the second file shift to the left, and all the others follow etc.
I have many files and this is getting very confusing for the users, who are biologists and do not want to manipulate datasets. When there is a shorter left column the right columns become missaligned.
This is an example output that shows the problem: the problem starts at the line that i show as <----problem. the column I CVS IP should be the third column but it shifts to second.
.......
8 WB IP 25.53 8 WB IN 22.98 7 WB IN 25.7 3 CVS IN
8 CVS IN 23.44 8 WB IP 28.75 7 WB IP 26.59 3 CVS IP
8 CVS IP 22.25 8 WB IP 28.83 7 WB IP 26.16 3 CVS IP
8 CVS IP 22.37 8 WB IP 29.05 7 WB IP 26.28 3 CVS IP
8 WB IN 24.47 I CVS IP 8 CVS IN 25.7 3 WB IN
8 WB IP 26.05 I CVS IP 8 CVS IN 25.31 3 WB IN
8 WB IP 26 I CVS IP 8 CVS IN 26.11 3 WB IN
8 CVS IN 25.11 I CVS IP 8 CVS IP 25.95 3 WB IN
8 CVS IP 23.19 I CVS IP 8 CVS IP 25.48 3 WB IN
8 CVS IP 23.19 I CVS IP 8 WB IN 25.83 3 WB IN
8 WB IN 24.18 I CVS IP 8 WB IN 26.1 3 WB IP
8 WB IP 33.73 I CVS IP 8 WB IN 25.81 3 WB IP
8 WB IP 33.75 I CVS IP 8 WB IP 26.74 3 WB IP
8 CVS IN 24.24 I CVS IP 8 WB IP 26.73 3 CVS IN
8 CVS IP 23.22 I CVS IP 8 WB IP 26.75 3 CVS IN
8 CVS IP 23.83 I CVS IP I CVS IP 3 CVS IN
I CVS IP I CVS IP 3 CVS IN <--problem
I CVS IP I CVS IP 3 CVS IN
I CVS IP I CVS IP 3 CVS IN
I CVS IP I CVS IP 3 CVS IP
I CVS IP I CVS IP 3 CVS IP
I CVS IP I CVS IP 3 CVS IP
I CVS IP I CVS IP 3 WB IN
I CVS IP I CVS IP 3 WB IN
I CVS IP I CVS IP 3 WB IN
I CVS IP I CVS IP 3 WB IN
I CVS IP I CVS IP 3 WB IN
I CVS IP I CVS IP 3 WB IN
I CVS IP I CVS IP 3 WB IP
I CVS IP I CVS IP 3 WB IP
I CVS IP I CVS IP 3 WB IP
I CVS IP I CVS IP 3 CVS IN
I CVS IP I CVS IP 3 CVS IN
I CVS IP I CVS IP 3 CVS IN
I CVS IP I CVS IP 3 CVS IN
I CVS IP I CVS IP 3 CVS IN
I CVS IP I CVS IP 3 CVS IN
I CVS IP I CVS IP 3 CVS IP
I CVS IP I CVS IP 3 CVS IP
I CVS IP I CVS IP 3 CVS IP
4V CVS IN I CVS IP 3 WB IN
4V CVS IN I CVS IP 3 WB IN
4V CVS IN I CVS IP 3 WB IN
4V CVS IN I CVS IP 3 WB IP
When the lines from the firs file end all utput is shifted to the left. How can I keep them aligned?
i looked everywhere and I have not been able to find out how to solve it.
I am not entirely familiar with awk, i use perl for my other scripting and i am a beginner in shell scripting.
Many thanks for your help in advance