Help with joining files and adding headers to files

Hi,

I have about 20 tab-delimited text files with non-sequential numbering, such as:

UCD2.summary.txt
UCD45.summary.txt
UCD56.summary.txt

The first column has the same number of lines and the same content in every file. The next two columns hold data points:

e.g. UCD2.summary.txt:

a   8.9   9.6
b   5.6   68
c   8.5   52

UCD45.summary.txt:

a   4.2   8.5
b   5.5   56
c   5.6   12

There are no headers in these files. I would like to join all of them together, since the first column has the same data in each. However, I need to be able to tell which file each value came from, so I need to add headers.

The output file would look like this, with a header row:

probeID   UCD2-value1   UCD2-value2   UCD45-value1   UCD45-value2
a         8.9           9.6           4.2            8.5
b         5.6           68            5.5            56
c         8.5           52            5.6            12

I am very new to Linux and Perl and would love some help producing the output above. Thanks!

Ryan

$ echo header > filename
$ join file1 file2 >> filename
$ cat filename

header
a 8.9 9.6 4.2 8.5
b 5.6 68 5.5 56
c 8.5 52 5.6 12

$

Thanks for the quick reply.
However, this won't work because it doesn't give me a header for each column. I need to know which file each column's data came from.

Thanks!

You'll have to get that information from somewhere, and nothing in your post suggests where it does come from, so I think we need more information.

Sorry for not being clear.

I would like the output to look like this:

probeID   "FILENAMEA-info1"   "FILENAMEA-info2"   "FILENAMEB-info1"   "FILENAMEB-info2"
a         value               value               value               value
b         value               value               value               value
c         value               value               value               value

The first column would have the header "probeID".
The 2nd column would have the filename+info as a header.
The 3rd column would have the filename+info as a header,
and so on...

Does that make sense?
Thanks

Yes, I see what you want now, sorry for being dense.

Working on something.

$ cat jn.awk

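BEGIN { OFS="\t";       }

# First line of each new file: add one header label per column
F!=FILENAME {
        F=FILENAME;

        for(N=1; N<=NF; N++)    COL=COL OFS FILENAME"-info"N;
}

# Every line: append the whole line to the data stored under its key ($1)
{
        D[$1]=D[$1] " " $0;
        if(!($1 in O))
        {
                # Remember keys in the order they were first seen
                O[++ORDER]=$1;
                O[$1]=1
        }
}

END {
        # substr(..., 2) drops the leading separator
        print substr(COL,2);
        for(N=1; N<=ORDER; N++)
        {
                $0=substr(D[O[N]], 2);
                $1=$1;  # force the record to be rebuilt with OFS (tabs)
                print;
        }
}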

$ awk -f jn.awk data1 data2

data1-info1     data1-info2     data1-info3     data2-info1     data2-info2    data2-info3
a       8.9     9.6     a       4.2     8.5
b       5.6     68      b       5.5     56
c       8.5     52      c       5.6     12

$

Perhaps not the most efficient but an all-in-one solution.

Thanks.

It's almost there. The column that contains

a
b
c

does not need to be repeated.

Is there a way to have that first column labeled "probe" for all the files, and then use a simple join command to join all the files together?

Thanks

$ cat header.sh

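#!/bin/sh
# Print a header line naming each value column, then the joined data.
# Note: join accepts exactly two input files, so pass two files at a time.

FILES="$*"      # assumes file names contain no whitespace
COL="probe"

for FILE in $FILES
do
        N=1
        # Read only the first line, to count this file's columns
        read LINE <"$FILE"
        # Split the line into positional parameters and drop the key column
        set -- $LINE ; shift
        while [ "$#" -gt 0 ]
        do
                COL="$COL $FILE-$N"
                N=$((N + 1))
                shift
        done
done

echo $COL

join $FILES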

$ ./header.sh data*

probe data1-1 data1-2 data2-1 data2-2
a 8.9 9.6 4.2 8.5
b 5.6 68 5.5 56
c 8.5 52 5.6 12

$
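
A note on scaling this to all 20 files: join takes exactly two input files per call, so header.sh as written only handles a pair. One way around that, as a minimal sketch, is to let awk collect the value columns from every file in a single pass. It assumes the files really are tab-delimited and that every file lists the same probe IDs in its first column, as described in the first post. The function name join_all and the "-valueN" label format are made up for illustration; the label is the file name up to its first dot.

```shell
#!/bin/sh
# join_all FILE...
# Hypothetical helper: prints a header row, then one row per probe ID,
# appending the value columns of each file in the order the files are given.
join_all() {
    awk '
    BEGIN { FS = "\t"; OFS = "\t" }   # assumption: tab-delimited input

    NF == 0 { next }                  # skip empty lines

    # First line of each file: build this file'"'"'s header labels
    FNR == 1 {
        label = FILENAME
        sub(/.*\//, "", label)        # drop any directory part
        sub(/\..*$/, "", label)       # UCD2.summary.txt -> UCD2
        for (i = 2; i <= NF; i++)
            header = header OFS label "-value" (i - 1)
    }

    # Every line: remember new keys in order, then append the value columns
    {
        if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
        for (i = 2; i <= NF; i++)
            row[$1] = row[$1] OFS $i
    }

    END {
        print "probeID" header
        for (k = 1; k <= n; k++)
            print order[k] row[order[k]]
    }' "$@"
}
```

Called as `join_all UCD*.summary.txt > combined.txt` (combined.txt is just an example name), it should write the header row followed by one line per probe ID, with the columns in the same order as the file arguments. If a probe ID were missing from one of the files, the columns for that row would shift left, so it is worth checking that all files have the same line count first.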