How to compare 2 files & get only few columns based on a condition related to both files?

Hiiiii friends

I have 2 files which contains huge data & few lines of it are as shown below

File1: b.dat(which has 21 columns)

   SSR 1976  8 12 13 10 44.00  39.0700  70.7800   7.0   0 0.00   0 2.78 0.00  0.00   0  0.00  2.78   0   NULL
   ISC 1976  8 12 22 32 37.39  36.2942  70.7338 225.0   7 0.00   0 0.00 0.00  0.00   0  4.20  4.20   0   NULL
   ISC 1976  8 12 23 26 47.09  26.6967  97.0421  31.0 326 6.20  79 0.00 5.90  6.10   0  0.00  6.20   0      7.99e+2
   SSR 1976  8 13  7 20 10.00  37.7300  69.3700   2.0   0 0.00   0 2.78 0.00  0.00   0  0.00  2.78   0   NULL
    ISC 1976 11 27 15 12 11.05  36.3106  71.3868 112.0  35 4.70   3 0.00 0.00  0.00   0  0.00  4.70   0   NULL
   PDE 1976 11 27 21 42 12.13  36.5171  71.0456 190.0 397 6.10  88 0.00 5.90  0.00   0  0.00  6.10   0     8.59e+24
    ISC 1977  1  1 21 39 43.99  38.1922  90.9651  43.0 304 5.80  73 0.00 6.00  6.30   0  0.00  6.30   0     1.43e+25
   SSR 1977  1  1 22 11 16.00  39.4000  73.0000   0.0   0 0.00   0 3.33 0.00  0.00   0  0.00  3.33   0   NULL
    ISC 1977  1 12 23 35 20.47   1.5668  99.8199 191.0 355 5.60  56 0.00 5.90  0.00   0  0.00  5.90   0     8.73e+24
   SSR 1977  1 18 20 16 33.00  36.5000  71.0000 160.0   0 0.00   0 3.33 0.00  0.00   0  0.00  3.33   0   NULL
   ISC 1977  1 19  0 46 15.76  37.0168  95.7286  15.0 377 5.80  89 0.00 5.80  5.80   0  0.00  5.80   0     7.25e+24
   SSR 1977  1 19  2 35 45.00  36.3000  70.3000 200.0   0 0.00   0 3.33 0.00  0.00   0  0.00  3.33   0   NULL
 

File2: a.dat (which has 11 columns)

1976  8 12 23 26 46.20  97.12  26.55  148  45    93
1976 11 27 21 42 12.20  70.64  36.33  294  41   139
1977  1  1 21 39 41.30  90.67  38.22  322  37   127
1977  1 12 23 35 19.10  99.50   1.08   92  46   132
1977  1 19  0 46 18.30  94.81  37.08  289  35    68

The output file say c.dat should contain few columns of a.dat & few columns of b.dat.
i.e. 9th, 10th & 11th columns from a.dat & 8th,9th & 19th column from b.dat based on a condition tat few colmns of a.dat must be equal to few columns of b.dat & only that rows which satisfies the condition these colmns should be selected
for example: Output file of a.dat & b.dat is as shown below
c.dat::confused::frowning:

26.6967  97.0421  2.78  148  45    93
36.5171  71.0456  6.10  294  41   139
38.1922  90.9651  6.30  322  37   127
 1.5668  99.8199  5.90   92  46   132
37.0168  95.7286  5.80  289  35    68

Now Tell me how to compare these 2 files such that
1)the 1st, 2nd, 3rd & 4th column of a .dat must be equal to 2nd 3rd 4th & 5th column of b.dat . AND then
2)From these rows from both files where the condition satisfies. The 8th, 9th & 19th column must be selected from b.dat & 9th,10th & 11th column must be selected from a .dat & these data should be stored in any other file say c.dat..
How is it. The example data for such condition is as shown above
Please help me out...:confused::frowning:

 awk 'NR==FNR {a[$1$2$3$4]=$9" "$10" "$11} NR>FNR&&a {print $8,$9,$19,a}' a.dat b.dat

Thanks a lot...Its working...
But if my data of b.dat is as follows

   PDE 2009  6  6  9  6 21.30  37.6300  21.0100  15.0   0 3.80   0 0.00 0.00  0.00   0  0.00  3.80   0   NULL
   PDE 2009  6  6  9 44 41.05  30.1900  86.2700  48.0   0 4.00   0 0.00 0.00  0.00   0  0.00  4.00   0   NULL
 

&
a.dat is as follows

2009  6  6  9 44 37.00  86.43  30.99  218  76    -9

Then how to get out put as

30.1900  86.2700  4.00  218  76    -9

I mean to say It must just compare the next field of a.dat only in such lines & get the result.:slight_smile:

so how do you get:

36.1900 70.2700

Sorry Tying mistak..I have corrected it now check it out

 awk 'NR==FNR {a[$1$2$3$4$5]=$9" "$10" "$11} NR>FNR&&a$6] {print $8,$9,$19,a}' a.dat b.dat

I tried to do like this already..It deletes my other data..
for example
a.dat:

2009 10 29 17 44 31.00  70.66  36.40  286  24    76
2009 11  7 20  9  4.00  86.22  29.29  172  43  -105
2009 11 10  2 48 46.00  91.86   8.08  123  78   169

b.dat:

  PDE-Q 2009 10 29 17  0 38.84  27.2600  91.3800  26.0   0 5.20   0 0.00 0.00  0.00   0  0.00  5.20   0   NULL
 PDE-Q 2009 10 29 17 44 31.75  36.4300  70.7300 205.0   0 6.20   0 0.00 0.00  0.00   0  0.00  6.20   0   NULL
 PDE-Q 2009 11  7 20  8 46.72  29.5200  86.0600   7.0   0 5.60   0 0.00 0.00  0.00   0  0.00  5.60   0   NULL
 PDE-Q 2009 11 10  2 48 46.87   8.0800  91.8900  23.0   0 6.00   0 0.00 0.00  0.00   0  0.00  6.00   0   NULL

The correct output i should get is
c.dat:

36.4300  70.7300   6.20 286 24 76
29.5200  86.0600  5.60 172  43  -105
8.0800 91.8900 6.00 123 78 169

but if i use your code i get output for such lines as

36.4300 70.7300 6.20 286 24 76
8.0800 91.8900 6.00 123 78 169

&
if i use your first code which is thier i get output as

27.2600 91.3800 5.20 286 24 76
36.4300 70.7300 6.20 286 24 76
29.5200 86.0600 5.60 172 43 -105
8.0800 91.8900 6.00 123 78 169

:slight_smile: