Creating a shortcut

kylle345 · June 28, 2012, 2:35am

Hi, I want to match a column of one file against many other files, take the average of each one, and put the results into one file (I know it sounds complicated).

So the 1st file is just a list of names that I want to match against the 2nd file, which has names along with rows of values.

awk 'NR==FNR{a[$1];next}($1 in a){print}' 1st.txt 2nd.txt

However, I also want to match the 1st file against files 3, 4, 5, 6, etc. How can I write code (or modify this) to do that? Basically, the 1st file matched against the 2nd, 3rd, 4th, etc.

After I get the matches in an output file for files 2, 3, 4, 5, etc., I then want to average them using this code:

awk '{ M=NF; for(N=4; N<=NF; N++) T[N]+=$N } END {printf("%f", T[4]/NR);for(N=5; N<=M; N++) printf("\t%f", T[N]/NR);printf("\n");}'

Again, I want to do this for all files at once. Afterwards, I want all the averaged values, along with the names of the initial files (2, 3, 4, 5, etc.), in one final output file.

Hope I did not confuse anyone..

I am currently doing this one by one and it is taking forever... I just want one file in the end with everything (along with names for each row).

Thanks

Scrutinizer · June 28, 2012, 3:34am

I think you could just specify more files after 2nd.txt:

awk 'NR==FNR{a[$1];next}($1 in a){print}' 1st.txt 2nd.txt 3rd.txt 4th.txt

If the filenames allow, you could use wildcards, e.g.:

awk 'NR==FNR{a[$1];next}($1 in a){print}' match_file.txt file*.txt
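To illustrate why adding more files works: `NR==FNR` is only true while awk reads the first file, so every file listed after it is filtered against the name list. Below is a minimal sketch; the file names (`names.txt`, `file1.txt`, `file2.txt`) and their contents are made up for the demonstration and are not the poster's data.

```shell
# Hypothetical sample files, created only for this demonstration.
dir=$(mktemp -d)
printf 'alpha\nbeta\n'      > "$dir/names.txt"
printf 'alpha 1\ngamma 2\n' > "$dir/file1.txt"
printf 'beta 3\ndelta 4\n'  > "$dir/file2.txt"

# NR==FNR holds only for the first file (names.txt), which fills array a.
# All later files are data files: a line is printed if its first field is in a.
matched=$(cd "$dir" && awk 'NR==FNR{a[$1];next}($1 in a){print}' names.txt file*.txt)
printf '%s\n' "$matched"

rm -rf "$dir"
```

With this sample data, only the `alpha` line from `file1.txt` and the `beta` line from `file2.txt` are printed.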

kylle345 · June 28, 2012, 11:15am

Hi, thanks for replying. Yes, that definitely works, and I am slowly getting to the final stage.

Here is the new code that I have. There are still problems, but I think a minor tweak from you experts can solve it.

awk 'NR==FNR{a[$1];next}($1 in a){print}' 1st.txt *filmatch.txt | awk '{ M=NF; for(N=4; N<=NF; N++) T[N]+=$N } END {printf("%f", T[4]/NR);for(N=5; N<=M; N++) printf("\t%f", T[N]/NR);printf("\n");}' > output.txt

What the above currently does is match all files against 1st.txt and put the results into ONE output file. *filmatch.txt stands for numerous files (2.txt, 3.txt, 4.txt etc.), and I want to include their names in the final file.

Right now the output looks like this (basically just the values that I want averaged):

0.068808	0.067252	0.068956	0.068141	0.068563	0.069272	0.070322	0.070029	0.069015	0.071708	0.071292	0.069931	0.071829	0.070628	0.069996	0.071036	0.070910	0.071590

But I want it to look like this:

1st.txt 0.068808	0.067252	0.068956	0.068141	0.068563	0.069272	0.070322	0.070029	0.069015	0.071708	0.071292	0.069931	0.071829	0.070628	0.069996	0.071036	0.070910	0.071590

2nd.txt 0.068808	0.067252	0.068956	0.068141	0.068563	0.069272	0.070322	0.070029	0.069015	0.071708	0.071292	0.069931	0.071829	0.070628	0.069996	0.071036	0.070910	0.071590

etc...

If you can get this to work then it would be great.

Thanks

Scrutinizer · June 28, 2012, 11:19am

You could try something like this, which would print the filenames:

awk 'NR==FNR{a[$1];next}($1 in a){print FILENAME $0}' 1st.txt *filmatch.txt

and take it from there...
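One detail to watch in the command above: in awk, `print FILENAME $0` concatenates the two strings with nothing in between, while `print FILENAME, $0` inserts the output field separator (a space by default). A small sketch of the difference, using made-up file names and data:

```shell
# Hypothetical sample files, created only for this demonstration.
dir=$(mktemp -d)
printf 'alpha\n'     > "$dir/names.txt"
printf 'alpha 1 2\n' > "$dir/file1.txt"

# No comma: the filename and the line are joined directly.
fused=$(cd "$dir" && awk 'NR==FNR{a[$1];next}($1 in a){print FILENAME $0}' names.txt file1.txt)

# With a comma: awk puts OFS (a single space by default) between them.
separated=$(cd "$dir" && awk 'NR==FNR{a[$1];next}($1 in a){print FILENAME, $0}' names.txt file1.txt)

printf '%s\n%s\n' "$fused" "$separated"
rm -rf "$dir"
```

The first line printed is `file1.txtalpha 1 2` and the second is `file1.txt alpha 1 2`.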

kylle345 · June 28, 2012, 11:25am

Hi, thanks. That partly worked (1st step). The first column becomes fused to the filename with that code.

Now I need to average the rows that have the same filename.

Thanks
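For the remaining step, one possible approach (a sketch, not something posted in the thread) is to do the matching and the averaging in a single awk pass, keeping the sums and row counts per `FILENAME` instead of piping into a second awk. It assumes, as in the averaging code above, that the values start in column 4. The file names, the array names (`names`, `sum`, `count`, `order`, `maxnf`), and the sample data are all made up for illustration.

```shell
# Hypothetical sample data; names and values are invented for this sketch.
dir=$(mktemp -d)
printf 'geneA\ngeneB\n' > "$dir/1st.txt"
printf 'geneA x y 1 2\ngeneB x y 3 4\ngeneC x y 9 9\n' > "$dir/2.txt"
printf 'geneA x y 10 20\ngeneC x y 0 0\n' > "$dir/3.txt"

result=$(cd "$dir" && awk '
  NR == FNR { names[$1]; next }              # first file: remember the names
  $1 in names {                              # matching row in a data file
    if (!(FILENAME in count)) order[++nfiles] = FILENAME   # keep input order
    count[FILENAME]++                        # matched rows per file
    if (NF > maxnf[FILENAME]) maxnf[FILENAME] = NF
    for (n = 4; n <= NF; n++) sum[FILENAME, n] += $n       # per-file column sums
  }
  END {
    for (i = 1; i <= nfiles; i++) {
      f = order[i]
      printf "%s", f                         # file name first
      for (n = 4; n <= maxnf[f]; n++) printf "\t%f", sum[f, n] / count[f]
      printf "\n"
    }
  }' 1st.txt 2.txt 3.txt)

printf '%s\n' "$result"
rm -rf "$dir"
```

With the real data, the call would presumably end in `1st.txt *filmatch.txt > output.txt`. Files with no matching rows produce no output line in this sketch.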