Multiple file merge by column

atzounis · February 11, 2013, 6:12am

Hello all,
I am quite new in linux shell scripting and I have this issue.
I ve got some files including measurements taken every 10minutes for a whole day.
File name format is:

00.00, 00.10, 00.20,....23.50

File structure is:

x   |  y  |  temperature

x and y is the same in all files (same sampling places). Only temperatures change.
What i want to do is merge all files in one file that will be like:

x |  y  |  temperature-filename 00.00 |  temerature-filename 00.10 | ... | temperature-filename 23.50

In other words, for fixed X and Y, to put all the temperatures one after the other incrementally and export everything in one file (not in shell because system may crash due to the number of files and data).
I tried awk, sed and grep scripts. What i find a bit tricky is to pass the file name in the script and create a loop that increases in [0-9][0-9].[0-9] [0-9] mode.
Any help would be highly appreciated. Thank you in advance.
Regards.
A.

pamu · February 11, 2013, 6:28am

Try sth like this..

 
awk -F "|" '{X[$1,$2]=X[$1,$2]?X[$1,$2] FS $3 "-" FILENAME : $0 "-" FILENAME}END{for(i in X){print X}}' *.txt

As *.txt give all your files as input.

atzounis · February 11, 2013, 6:49am

Hello and thanks a lot for your immediate reply.
It was my mistake to use "|" as separator because the actual separator is " " (single space) and files are .dat
Guess the code that does the work and has a file as an output is:

awk -F " " '{X[$1,$2]=X[$1,$2]?X[$1,$2] FS $3 "-" FILENAME : $0 "-" FILENAME}END{for(i in X){print X}}' *.dat >output.dat

Right?

---------- Post updated at 01:49 PM ---------- Previous update was at 01:43 PM ----------

Just tried it. It really works, apart from one thing: This code puts one file at the end of the other.
Example:
file1-00.00
x y temp
1 1 10

and file2-00.10
x y temp
1 1 12

Become file output
x y temp
file1-00.00
1 1 10
file2-00.10
1 1 12

I would prefer the output to be like
x y temp-file1-00.00 temp-file2-00.10 temp-fileXX-23.50
1 1 10 12 zz

pamu · February 11, 2013, 6:54am

just you need small modification...

tried on two files, hope it shoud work fo all...

 
$ awk 'FNR==1{X[$1,$2,FNR]=X[$1,$2,FNR]?X[$1,$2,FNR] FS $3 "-" FILENAME : $0 "-" FILENAME}
FNR>1{X[$1,$2,FNR]=X[$1,$2,FNR]?X[$1,$2,FNR] FS $3 : $0}
{Z[$1,$2]++}
END{for(j=1;j<=2;j++){for(i in Z){if(X[i,j]){print X[i,j]}}}}' file1 file2

x y temp-file1 temp-file2
1 1 10 12

atzounis · February 11, 2013, 9:06am

Nope... it is breaking somewhere...
I changed the file names in an easier pattern. Now they are all in 00.00.dat (00.00.dat, 00.10.dat, ... , 23.50.dat)
I attach a sample so that you can have a better view.
Thank you once again for your time!
Kind regards
A.

atzounis · February 13, 2013, 9:07am

There has to be something wrong about this code. I am still trying...
For the attached files. The output is:

 
1.00 163.00 292.3690
-00.10.dat 292.0391
-00.20.dat 291.8038
-00.30.dat
3.00 163.00 292.3675
 292.0370
 291.8015

In other words the script is not printing the x/y values and it only prints only the first price of each file (i took a sample of 3 files) in one column not in one row.
The output, according to these sample files should be like:

 x(m)   y(m) Pot. Temperature (K)-00.10.dat Pot. Temperature (K)-00.20.dat Pot. Temperature (K)-00.30.dat 
1.00 163.00 292.369 292.0391 291.8038
3.00 163.00 292.3675 292.037 291.8015
5.00 163.00 292.3661 292.0351 291.7992
7.00 163.00 292.3647 292.0332 291.7971
9.00 163.00 292.3634 292.0314 291.7951
11.00 163.00 292.3622 292.0297	291.7931
13.00 163.00 292.361 292.028 291.7911
15.00 163.00 292.3598 292.0264	291.7893
17.00 163.00 292.3587 292.0248	291.7874
19.00 163.00 292.3576 292.0232	291.7856
21.00 163.00 292.3565 292.0217	291.7838
23.00 163.00 292.3554 292.0201	291.782
25.00 163.00 292.3543 292.0186	291.7801
27.00 163.00 292.3532 292.0169	291.7783
29.00 163.00 292.3521 292.0153	291.7764
31.00 163.00 292.3508 292.0136	291.7744
33.00 163.00 292.3495 292.0118	291.7724

atzounis · February 13, 2013, 10:50am

I came up with this code:

 awk '{a[FNR]=a[FNR] FS $X;t=(FNR>T)?FNR:t}END {for (i=1;i<=t;i++) print a}' *.dat > output.txt

It works perfectly for extracting the first and second columns ( X=1 or =2) but for X = 3 it puts again the column one at the end of the other, not side by side. What am I missing?

pamu · February 14, 2013, 1:26am

Sorry for late reply..

But your previous sample and actual input file has lot discrepancies.

Try with this(not tested)

awk 'FNR==1{Y=Y?Y FS $0"-"FILENAME:$0"-"FILENAME;}
FNR>1{X[$1,$2]=X[$1,$2]?X[$1,$2] FS $3:$0;}
END{print Y;for(i in X){print X}}' *.dat

atzounis · February 14, 2013, 3:10am

HOORAY! we' re getting closer
This code seems to get a bit closer...
Look, I really appreciate your efforts on my problem! To make it easier, for me it would be just fine if I had an output like this:

 
00.00.dat 00.10.dat 00.20.dat 00.30.dat
292.9999 292.3690 292.0391 291.8038
292.9949 292.2965 291.9523 291.7151

This is the column i need one by the other under the file name. This code can then easily be pasted in a libreoffice calc document with the given X and Y's and that way I will be able to print graphs. I will be completely fine with this.
Thanks again. Have a nice day!

pamu · February 14, 2013, 3:34am

Try with this...

awk 'FNR==1{X[FNR]=X[FNR]?X[FNR] FS FILENAME:FILENAME;}
FNR>1{X[FNR]=X[FNR]?X[FNR] FS $3:$3;}
END{for(i=1;i<=FNR;i++){print X}}' *.dat

Hope this is what you want..

Regards,

pamu

atzounis · February 14, 2013, 3:54am

I had tried a code like this but it prints it like a new row with a whitespace charachter. Is it a separator problem?

00.00.dat 00.10.dat 00.20.dat 00.30.dat
292.9998
 292.3675
 292.0370
 291.8015
292.9998
 292.3661
 292.0351
 291.7992

---------- Post updated at 10:54 AM ---------- Previous update was at 10:39 AM ----------

This code does the trick, but only for $1 and $2...

awk '{a[FNR]=a[FNR] FS $3;t=(FNR>T)?FNR:t}END {for (i=1;i<=t;i++) print a}' *.dat

For $3 it prints all prices in a single column.

pamu · February 14, 2013, 4:30am

I am confused..

because there is an weird behavior for the values of $3 .

$3 has values like.. 292.9998 that's why it is failing.....

But i am not able to investigate why it is so

I m working on it hope i'll find some sure solution..

Regards,

pamu

atzounis · February 14, 2013, 4:33am

Thanks once again!